[
https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138374#comment-16138374
]
Nick Burch commented on TIKA-2443:
----------------------------------
Tika doesn't care where you put the file, as long as the classloader can find
it as {{org/apache/tika/parser/external/tika-external-parsers.xml}}. As long
as your config folder is on the classpath, put the file in there as
{{tika-external-parsers.xml}} under a folder
{{org/apache/tika/parser/external}}, and it should be loaded.
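
One quick way to confirm the file is visible before involving Tika at all is to ask a classloader for the same resource path. The sketch below uses only the JDK; the class name {{ExternalParsersLocator}} is made up for illustration, and using the thread context classloader is an assumption (Tika may resolve the resource through a different loader in your deployment).

```java
import java.net.URL;

public class ExternalParsersLocator {
    // Resource path quoted in the comment above.
    static final String RESOURCE =
            "org/apache/tika/parser/external/tika-external-parsers.xml";

    /** Returns the URL the classloader resolves the resource to, or null if it is not visible. */
    static URL locate(String resource) {
        // Assumption: the context classloader sees the same classpath Tika will use.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resource);
    }

    public static void main(String[] args) {
        URL url = locate(RESOURCE);
        System.out.println(url == null
                ? "Not on the classpath: " + RESOURCE
                : "Found at: " + url);
    }
}
```

Run it with the same classpath as your application, config folder included. If it reports the file as not found, the folder layout or the classpath entry is the first thing to check.
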
> Plain text file identified as rfc822 and which can cause StackOverflowError
> ---------------------------------------------------------------------------
>
> Key: TIKA-2443
> URL: https://issues.apache.org/jira/browse/TIKA-2443
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11, 1.16
> Reporter: Viorica Visan
>
> I have a file called test.txt, containing only:
> Date: 06/25/2014 15:54:19
> And some more text I am writing. This will
> be detected as rfc822
> This file is detected and parsed as message/rfc822.
> I think the magic rule on "Date: " is too strong and the file should have been
> detected only as a text/plain file. It looks to me like the reverse of
> https://issues.apache.org/jira/browse/TIKA-879
> We noticed this issue, because we have a large log file, which has many lines
> with Date, Log level and Message which is parsed as message/rfc822 (only
> because it starts with "Date:") and which throws
> StackOverflowError in the end.
> Is there some workaround to make this rule weaker, for example through configuration?
> We use DefaultParser with all default settings. We use Tika 1.11, but
> we also tried Tika 1.16 and saw the same StackOverflowError (which
> probably happened again because the file was parsed as rfc822).
> The only workaround that I found was to add
> custom-mimetypes.xml like this:
> <mime-type type="text/plain">
>   <magic priority="70">
>     <match value="Date:" type="string" offset="0"/>
>   </magic>
> </mime-type>
> Would you recommend some other workaround to make sure the file does not get
> parsed as rfc822?
> And I have another question: can this custom-mimetypes.xml be specified from
> an external location?
> Many thanks.
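
To illustrate why the quoted workaround has an effect, here is a toy model of priority-ranked magic matching. It is not Tika's detector: the class {{MagicPrioritySketch}} and its {{Rule}} type are invented for this sketch, and the priority of 50 given to the rfc822 rule is an assumed value for illustration only. The one idea it shows is that when two rules match the same bytes, the rule with the higher priority decides the type.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MagicPrioritySketch {
    /** One simplified magic rule: a literal string expected at a byte offset. */
    static final class Rule {
        final String mimeType;
        final int priority;
        final int offset;
        final byte[] value;

        Rule(String mimeType, int priority, int offset, String value) {
            this.mimeType = mimeType;
            this.priority = priority;
            this.offset = offset;
            this.value = value.getBytes(StandardCharsets.US_ASCII);
        }

        /** True if the rule's bytes appear in data starting at the rule's offset. */
        boolean matches(byte[] data) {
            if (offset + value.length > data.length) {
                return false;
            }
            for (int i = 0; i < value.length; i++) {
                if (data[offset + i] != value[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the type of the highest-priority matching rule, or a generic fallback. */
    static String detect(List<Rule> rules, byte[] data) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt((Rule r) -> r.priority).reversed());
        for (Rule r : sorted) {
            if (r.matches(data)) {
                return r.mimeType;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] data = "Date: 06/25/2014 15:54:19\n".getBytes(StandardCharsets.US_ASCII);
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("message/rfc822", 50, 0, "Date:")); // assumed priority
        System.out.println(detect(rules, data));
        rules.add(new Rule("text/plain", 70, 0, "Date:"));     // the quoted workaround
        System.out.println(detect(rules, data));
    }
}
```

In this model the second call returns text/plain because the added rule outranks the rfc822 one. The same reasoning suggests the workaround is broad: any file starting with "Date:" would be claimed by the text/plain rule, genuine mail messages included.
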