[ https://issues.apache.org/jira/browse/TIKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488398#comment-15488398 ]
Hudson commented on TIKA-2064:
------------------------------
SUCCESS: Integrated in Jenkins build tika-2.x #141 (See
[https://builds.apache.org/job/tika-2.x/141/])
TIKA-2064 Mime types, with magic, for mostly-xml Stata DTA files. (nick: rev
443a21e3fb564df9bb1c52f6533bd5da6f5cfcc8)
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
TIKA-2064 Test Stata DTA files from Michael Stepner, plus detection unit (nick:
rev e58ade381a3e4285eb81d55fb250611e82adbef7)
* (add) tika-parsers/src/test/resources/test-documents/testStataDTA.txt
* (add) tika-parsers/src/test/resources/test-documents/testStataDTA.dta
* (edit) tika-app/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Merge changes for TIKA-2064 to 2.x (nick: rev
9f6241161af93c9cefd4ba90342b6834a49dc4b1)
* (add) tika-test-resources/src/test/resources/test-documents/testStataDTA.dta
* (delete) tika-parsers/src/test/resources/test-documents/testStataDTA.txt
* (add) tika-test-resources/src/test/resources/test-documents/testStataDTA.txt
* (delete) tika-parsers/src/test/resources/test-documents/testStataDTA.dta
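
For readers unfamiliar with "magic" detection as mentioned in the first commit above, here is a minimal Python sketch of the idea. It is not Tika's implementation (Tika defines its rules in tika-mimetypes.xml). It assumes that the newer, mostly-XML Stata .dta formats begin with the ASCII bytes `<stata_dta><header><release>`, which would also explain why such files were mistaken for markup. The media type name `application/x-stata-dta` and the function names are assumptions made for illustration only.

```python
# Hypothetical sketch of magic-byte detection; not Tika's actual code.

# Assumption: newer Stata .dta files start with this ASCII tag sequence.
STATA_XML_MAGIC = b"<stata_dta><header><release>"

# Assumption: illustrative media type name for Stata datasets.
ASSUMED_STATA_TYPE = "application/x-stata-dta"

# Generic fallback, as suggested by the reporter of the issue.
FALLBACK_TYPE = "application/octet-stream"


def detect_type(prefix: bytes) -> str:
    """Guess a media type from the leading bytes of a file."""
    if prefix.startswith(STATA_XML_MAGIC):
        return ASSUMED_STATA_TYPE
    return FALLBACK_TYPE


def detect_file(path: str) -> str:
    """Read only as many bytes as the magic needs, then classify them."""
    with open(path, "rb") as handle:
        return detect_type(handle.read(len(STATA_XML_MAGIC)))
```

Calling `detect_file("auto.dta")` on the dataset from the report should hit the magic branch if the file is in one of the newer formats. Older, purely binary .dta files would fall through to the fallback in this sketch.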
> Document type detected incorrectly for Stata datasets (.dta extension)
> ----------------------------------------------------------------------
>
> Key: TIKA-2064
> URL: https://issues.apache.org/jira/browse/TIKA-2064
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.13
> Reporter: Michael Stepner
> Attachments: stata_test_data.dta
>
>
> The content type of Stata datasets (created using http://www.stata.com
> software) is incorrectly detected as `text/html` by Tika. I have tested this
> using the latest release of Tika, v1.13:
> ```
> $ curl -O http://www.stata-press.com/data/r14/auto.dta
> $ java -jar tika-app-1.13.jar --detect auto.dta
> text/html
> ```
> I believe that the type should instead be `application/octet-stream` (or the
> equivalent).
> I originally reported this bug downstream (at
> https://github.com/laurilehmijoki/s3_website/issues/232), and was advised to
> report upstream to Tika. In addition to the one I downloaded using `curl` in
> my example, a variety of reference Stata datasets are posted here:
> http://www.stata-press.com/data/r14/dmain.html