[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923414#comment-17923414
]
Tim Allison commented on TIKA-4375:
-----------------------------------
I did notice a number of documents that are no longer detected as
comma-delimited (273 files) or tab-delimited (944 files); they are now detected
as plain text files. I took a look at these two:
{{commoncrawl3/HZ/HZ57NY54I7QJIK5CB7U7TAEVAJRR6R2N}}
{{commoncrawl3/SA/SAUXGJWDMQ2YMUW4XC7O366UAPVTCYLL}}
The problem is that they have a header line without commas, followed by the
actual csv content. This also happens in 3.1.0. I'm not sure what the best way
to handle this is, but I don't think this is a showstopper.
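
To illustrate the failure mode, here is a minimal Python sketch. It is not
Tika's actual detection code: the function {{looks_delimited}}, its
{{min_share}} parameter, and the sample text are all hypothetical. It assumes a
simple heuristic that requires non-empty lines to share the same nonzero
delimiter count, and shows how a single comma-free header line can push a file
back to plain text.

```python
from collections import Counter


def looks_delimited(text, delimiter=",", min_share=1.0):
    """Toy heuristic (an assumption, not Tika's algorithm).

    Returns True when at least `min_share` of the non-empty lines contain
    the same nonzero number of `delimiter` characters.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    counts = [ln.count(delimiter) for ln in lines]
    most_common_count, hits = Counter(counts).most_common(1)[0]
    return most_common_count > 0 and hits / len(lines) >= min_share


# Made-up sample: one header line without commas, then regular csv rows.
sample = (
    "Report generated for internal use\n"
    "id,name,score\n"
    "1,alice,90\n"
    "2,bob,85\n"
)

# Same content with the first line dropped.
without_header = sample.split("\n", 1)[1]

strict = looks_delimited(sample)            # header line breaks consistency
skipped = looks_delimited(without_header)   # remaining rows are consistent

print("with header line:", strict)
print("header line skipped:", skipped)
```

In this sketch, the strict check rejects the full sample and accepts it once the
first line is skipped. Skipping leading lines or lowering {{min_share}} would be
one conceivable way to tolerate such files, but either could also increase
false positives on ordinary text.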
> Regression tests for 2.9.3 release
> ----------------------------------
>
> Key: TIKA-4375
> URL: https://issues.apache.org/jira/browse/TIKA-4375
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 43R5U3BXJUDJXDZ25OAE33ZU47362WLV.zip,
> LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf, RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUDV,
> reports-tika-2.9.3-rc1.tgz, tika-2.9.2-v-tika-2.9.3-reports.tgz
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)