[ 
https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138374#comment-16138374
 ] 

Nick Burch commented on TIKA-2443:
----------------------------------

Tika doesn't care where you put the file, as long as the classloader can find 
it as {{org/apache/tika/parser/external/tika-external-parsers.xml}}. As long 
as your config folder is on the classpath, pop the file as 
{{tika-external-parsers.xml}} under a folder 
{{org/apache/tika/parser/external}} in there and it should be loaded.
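
One way to check the placement before involving Tika is to ask a classloader 
for that resource path directly. The following is a minimal sketch using only 
the JDK. The class name, the two helper methods, the temporary directory 
standing in for your config folder, and the placeholder file content are all 
assumptions made for illustration; none of them are Tika API.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of Tika.
public class ExternalParsersLocationCheck {
    // The resource path named in the comment above.
    static final String RESOURCE =
            "org/apache/tika/parser/external/tika-external-parsers.xml";

    /** Returns the URL the given classloader resolves the resource to, or null if it is not visible. */
    static URL locate(ClassLoader loader) {
        return loader.getResource(RESOURCE);
    }

    /** Writes the file under configDir, creating org/apache/tika/parser/external as needed. */
    static Path placeFile(Path configDir, String xml) throws IOException {
        Path target = configDir.resolve(RESOURCE);
        Files.createDirectories(target.getParent());
        return Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for the real config folder.
        Path configDir = Files.createTempDirectory("tika-config");
        // Placeholder content only; a real file would hold the parser definitions.
        placeFile(configDir, "<!-- placeholder -->");

        // Put the config folder itself (not the nested folder) on the classpath.
        URL[] classpath = { configDir.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
            System.out.println("found: " + (locate(loader) != null));
        }
    }
}
```

In a real application you would call {{locate}} with the classloader your 
application actually runs under (for example the thread context classloader) 
instead of building a new one. A null result suggests the config folder is 
not on the classpath or the nested folder names do not match.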

> Plain text file identified as rfc822 and which can cause StackOverflowError
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2443
>                 URL: https://issues.apache.org/jira/browse/TIKA-2443
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11, 1.16
>            Reporter: Viorica Visan
>
> I have a file called test.txt, containing only:
> Date:         06/25/2014 15:54:19
> And some more text I am writing. This will
> be detected as rfc822
> This file is detected and parsed as message/rfc822. 
> I think the magic rule on "Date: " is too strong and the file should have been 
> detected only as a text/plain file. It looks to me like the reverse of  
> https://issues.apache.org/jira/browse/TIKA-879 
> We noticed this issue because we have a large log file with many lines 
> containing Date, Log level and Message. It is parsed as message/rfc822 (only 
> because it starts with "Date:") and throws a 
> StackOverflowError in the end. 
> Is there a workaround to make this rule weaker, for example through configuration? 
> We use DefaultParser and everything default. We use Tika 1.11, but 
> we also tried Tika 1.16 and saw the same StackOverflowError (which 
> probably again happened because the file was parsed as an rfc822 type).
> The only workaround that I found was to add 
> custom-mimetypes.xml like this
>   <mime-type type="text/plain">
>     <magic priority="70">
>       <match value="Date:" type="string" offset="0"/>
>     </magic>
>   </mime-type>
> Would you recommend some other workaround to make sure the file does not get 
> parsed as rfc822? 
> And I have another question: can this custom-mimetypes.xml be specified from 
> an external location? 
> Many thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
