[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830954#comment-17830954 ]
Tilman Hausherr commented on TIKA-4218:
---------------------------------------
6FOMNUPGPA6IG66Z4NIUEQIVOR5ON46Q (an MP4 file) has lost metadata
(bierenbach: 2 | earlier: 2 | https://www.facebook.com/speedlinecablecam: 2 |
https://www.speedline-cablecam.com: 2 | in: 2 | of: 2 | the: 2 | this: 2 |
woods: 2 | year: 2)
EEXR753OKDGYAIXL36PZ2EGYPN477SZU and a few other files have one word in
TOP_10_MORE_IN_A that reappears in TOP_10_MORE_IN_B with "oebps" appended. For
example, "secretary" becomes "secretaryoebps". I don't know whether this is a bug.
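To find other files with the same pattern, a small script could compare the two columns. The following is only a sketch: it assumes the cells use the "token: count | token: count" layout shown above, the function names are hypothetical and not part of tika-eval, and the counts in the usage note are invented for illustration.

```python
def parse_token_counts(cell):
    """Parse a 'token: count | token: count' cell into a dict (assumed layout)."""
    counts = {}
    for item in cell.split("|"):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon, because tokens such as URLs contain colons.
        token, _, count = item.rpartition(":")
        counts[token.strip()] = int(count)
    return counts


def find_suffixed_tokens(more_in_a, more_in_b, suffix="oebps"):
    """Return tokens from A that reappear in B with the suffix appended."""
    a = parse_token_counts(more_in_a)
    b = parse_token_counts(more_in_b)
    return sorted(token for token in a if token + suffix in b)
```

With made-up cells, find_suffixed_tokens("secretary: 3 | minister: 1", "secretaryoebps: 3 | other: 2") returns ["secretary"]. An exact suffix match is a deliberate simplification; it would miss cases where the extra text is inserted elsewhere in the token.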
> Run regression tests to support 2.9.2 release
> ---------------------------------------------
>
> Key: TIKA-4218
> URL: https://issues.apache.org/jira/browse/TIKA-4218
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 2.9.1-876503.pdf.json, 2.9.2-876503.pdf.json
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)