[
https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138378#comment-16138378
]
Luis Filipe Nassif edited comment on TIKA-2443 at 8/23/17 2:00 PM:
-------------------------------------------------------------------
Currently not from an arbitrary folder, but I have used Nick's approach
successfully in the past.
How are you using Tika? If you are using the Detector interface, I can send you
a custom implementation that reads the custom-mimetypes.xml path from a system
property, for example.
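A minimal sketch of the system-property idea, not the implementation offered above. The property name tika.custom-mimetypes and the class CustomMimeTypesLocator are hypothetical and are not defined by Tika. The sketch only resolves the external file to a URL; the comment at the end shows how that URL might then be handed to Tika, assuming MimeTypesFactory.create(URL...) and the bundled tika-mimetypes.xml resource are available in the Tika version in use.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class CustomMimeTypesLocator {

    // Hypothetical property name chosen for this sketch; Tika does not define it.
    static final String PROPERTY = "tika.custom-mimetypes";

    private CustomMimeTypesLocator() {
    }

    // Returns the URL of the external custom-mimetypes.xml, or empty when the
    // property is unset, blank, or does not point at a readable regular file.
    static Optional<URL> locate() throws MalformedURLException {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        Path path = Paths.get(value.trim());
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return Optional.empty();
        }
        return Optional.of(path.toUri().toURL());
    }

    // With Tika on the classpath, the result could be combined with the
    // built-in definitions roughly like this (assumed usage, not verified
    // against every Tika version):
    //
    //   URL core = MimeTypes.class.getResource("tika-mimetypes.xml");
    //   MimeTypes types = MimeTypesFactory.create(core, custom);
    //
    // MimeTypes implements Detector, so "types" can be used for detection.
}
```

The JVM would then be started with something like -Dtika.custom-mimetypes=/path/to/custom-mimetypes.xml. Returning an empty Optional rather than throwing lets the caller fall back to the default detector when no external file is configured.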
was (Author: lfcnassif):
Currently no. How are you using Tika? If you are using the Detector interface,
I can send you a custom implementation to read the custom-mimetype.xml from a
system property, for example.
> Plain text file identified as rfc822 and which can cause StackOverflowError
> ---------------------------------------------------------------------------
>
> Key: TIKA-2443
> URL: https://issues.apache.org/jira/browse/TIKA-2443
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11, 1.16
> Reporter: Viorica Visan
>
> I have a file called test.txt, containing only:
> Date: 06/25/2014 15:54:19
> And some more text I am writing. This will
> be detected as rfc822
> This file is detected and parsed as message/rfc822.
> I think the magic rule on "Date: " is too strong and the file should have been
> detected only as text/plain. It looks to me like the reverse of
> https://issues.apache.org/jira/browse/TIKA-879
> We noticed this issue because we have a large log file with many lines
> containing a date, log level and message. It is parsed as message/rfc822 (only
> because it starts with "Date:") and eventually throws a
> StackOverflowError.
> Is there some workaround to make this rule weaker, for example through
> configuration?
> We use DefaultParser with all default settings. We use Tika 1.11, but we also
> tried Tika 1.16 and saw the same StackOverflowError (probably again because
> the file was parsed as rfc822).
> The only workaround that I found was to add a
> custom-mimetypes.xml like this:
> <mime-type type="text/plain">
> <magic priority="70">
> <match value="Date:" type="string" offset="0"/>
> </magic>
> </mime-type>
> Would you recommend some other workaround to make sure the file does not get
> parsed as rfc822?
> And I have another question: can this custom-mimetypes.xml be specified from
> an external location?
> Many thanks.