[ https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138378#comment-16138378 ]

Luis Filipe Nassif edited comment on TIKA-2443 at 8/23/17 2:00 PM:
-------------------------------------------------------------------

Currently not in an arbitrary folder, but I have used Nick's approach in the 
past successfully.

How are you using Tika? If you are using the Detector interface, I can send you 
a custom implementation that reads the custom-mimetypes.xml path from a system 
property, for example.
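
For illustration only, here is a minimal sketch of the system-property lookup 
such a custom implementation might start from. It is not the implementation 
offered above. The property name "tika.custom-mimetypes" and the class name 
CustomMimeTypesLocator are assumptions made up for this example; Tika does not 
define them. The sketch uses only the JDK and leaves out the Tika wiring.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Resolves an external custom-mimetypes.xml from a JVM system property. */
final class CustomMimeTypesLocator {

    // Hypothetical property name chosen for this sketch; not defined by Tika.
    static final String PROPERTY = "tika.custom-mimetypes";

    private CustomMimeTypesLocator() {}

    /**
     * Returns the configured file if the property is set and points at a
     * readable regular file; otherwise returns empty so the caller can
     * fall back to the default detection.
     */
    static Optional<Path> locate() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        Path path = Paths.get(value.trim()).toAbsolutePath().normalize();
        return Files.isRegularFile(path) && Files.isReadable(path)
                ? Optional.of(path)
                : Optional.empty();
    }
}
```

The JVM would then be started with something like 
-Dtika.custom-mimetypes=/some/folder/custom-mimetypes.xml. A Detector 
implementation could presumably hand the located file, together with the 
bundled tika-mimetypes.xml, to MimeTypesFactory to build the MimeTypes it 
delegates to; that part is not shown here and has not been tested.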


was (Author: lfcnassif):
Currently no. How are you using Tika? If you are using the Detector interface, 
I can send you a custom implementation to read the custom-mimetype.xml from a 
system property, for example.

> Plain text file identified as rfc822 and which can cause StackOverflowError
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2443
>                 URL: https://issues.apache.org/jira/browse/TIKA-2443
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11, 1.16
>            Reporter: Viorica Visan
>
> I have a file called test.txt, containing only:
> Date:         06/25/2014 15:54:19
> And some more text I am writing. This will
> be detected as rfc822
> This file is detected and parsed as message/rfc822. 
> I think the magic rule on "Date: " is too strong and the file should have been 
> detected only as text/plain. It looks to me like the reverse of  
> https://issues.apache.org/jira/browse/TIKA-879 
> We noticed this issue, because we have a large log file, which has many lines 
> with Date, Log level and Message which is parsed as message/rfc822 (only 
> because it starts with "Date:") and which throws 
> StackOverflowError in the end. 
> Is there some workaround to make this rule weaker, for example through 
> configuration? 
> We use DefaultParser and everything default. We use Tika 1.11, but 
> we also tried Tika 1.16 and saw the same StackOverflowError (which 
> probably again happened because the file was parsed as rfc822).
> The only workaround that I found was to add 
> custom-mimetypes.xml like this
>  <mime-type type="text/plain">
>     <magic priority="70">
>       <match value="Date:" type="string" offset="0"/>
>     </magic>
>   </mime-type>
> Would you recommend some other workaround to make sure the file does not get 
> parsed as rfc822? 
> And I have another question: can this custom-mimetypes.xml be specified from 
> an external location? 
> Many thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
