[ https://issues.apache.org/jira/browse/TIKA-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569079#comment-14569079 ]
Tim Allison commented on TIKA-1644:
-----------------------------------
Doh! Yes, I just finished the preliminary (buggy) switch in the eval code to
one row per "any file" (embedded files included), replacing the older one row
per input file. Thank you for catching this.
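
A minimal sketch of how the two row layouts can give different mime diffs. This is not the eval code itself: the (container, embedded path) key, the file names, the mime values, and the helpers mime_diffs and container_only are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical detection results, one row per "any file".
# Key = (container file, embedded path); "" marks the container itself.
# File names and mime values are invented for this sketch.
rows_old = {
    ("a.doc", ""): "application/msword",
    ("a.doc", "/image1.wmf"): "application/x-msmetafile",
    ("b.ppt", ""): "application/vnd.ms-powerpoint",
}
rows_new = {
    ("a.doc", ""): "application/msword",
    ("a.doc", "/image1.wmf"): "application/pdf",
    ("b.ppt", ""): "application/vnd.ms-excel",
}


def mime_diffs(old, new):
    """Return {key: (old_mime, new_mime)} for keys in both runs whose mime changed."""
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


def container_only(rows):
    """Keep only container rows, i.e. the older one-row-per-input-file view."""
    return {k: v for k, v in rows.items() if k[1] == ""}


# One row per "any file": embedded files get their own diff rows.
all_diffs = mime_diffs(rows_old, rows_new)

# One row per input file: only the containers are compared.
container_diffs = mime_diffs(container_only(rows_old), container_only(rows_new))

# Count each (old mime, new mime) transition across all rows.
transitions = Counter(all_diffs.values())
```

In this made-up data the embedded .wmf appears as a diff only in the per-"any file" layout, so a report that does not keep embedded rows separate from container rows could attribute an embedded file's mime type to its parent .doc.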
> Mime type diffs between 1.8 and 1.9-rc1
> ---------------------------------------
>
> Key: TIKA-1644
> URL: https://issues.apache.org/jira/browse/TIKA-1644
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Attachments: mime_diffs.xlsx
>
>
> When running 1.9-rc1 against govdocs1, I found a few files whose mime-types
> have changed. I'm posting this now so that others can look. Some of these
> changes are for the better, and some are not.
> For further investigation:
> * embedded pict and wmf are now sometimes identified as pdf (TIKA-1085)
> * several .doc files are now identified as application/x-msmetafile and no
> text is being extracted
> * several .doc files are now identified as jpeg or png and no text is being
> extracted
> * several .ppt files that were being identified as various types (jpeg, ppt,
> png, msoffice, word) are now being detected as excel
> Probably for the good:
> * a handful of files that were identified as text are now identified as pdf
> (TIKA-1085)