[ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961854#comment-15961854 ]

Luis Filipe Nassif commented on TIKA-2044:
------------------------------------------

Hi Andrew,

You could do MIME type detection before parsing (using a Detector or
Tika.detect()), so that you can put a different EmbeddedDocumentExtractor
into the ParseContext depending on the detected MIME type (PST or mbox).
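A rough sketch of that dispatch, using only the JDK so it runs standalone. Assumptions: `detect` is a simplified magic-byte check standing in for `Tika.detect()` (it is not Tika's detection logic), the `EmbeddedHandler` interface and handler names are hypothetical stand-ins for `EmbeddedDocumentExtractor` implementations, and the MIME type strings are assumed to match what Tika reports for PST and mbox. In real Tika code the chosen extractor would presumably be registered with `parseContext.set(EmbeddedDocumentExtractor.class, extractor)` before calling `parse()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stdlib-only stand-in for "detect first, then choose an extractor";
// real code would use Tika's Detector/Tika.detect() and EmbeddedDocumentExtractor.
class MailboxDispatch {
    static final String PST = "application/vnd.ms-outlook-pst"; // assumed Tika type
    static final String MBOX = "application/mbox";               // assumed Tika type
    static final String UNKNOWN = "application/octet-stream";

    // Hypothetical stand-in for an EmbeddedDocumentExtractor implementation.
    interface EmbeddedHandler {
        String name();
    }

    // Simplified signature checks on the first bytes of the file.
    static String detect(byte[] head) {
        if (startsWith(head, "!BDN")) return PST;   // PST files begin with "!BDN"
        if (startsWith(head, "From ")) return MBOX; // mbox files begin with a "From " line
        return UNKNOWN;
    }

    private static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        return data.length >= p.length && Arrays.equals(Arrays.copyOf(data, p.length), p);
    }

    // Pick a different handler per detected type, the way one would put a
    // different extractor into the ParseContext before parsing.
    static EmbeddedHandler handlerFor(String mimeType) {
        switch (mimeType) {
            case PST:  return () -> "pst-handler";
            case MBOX: return () -> "mbox-handler";
            default:   return () -> "default-handler";
        }
    }

    public static void main(String[] args) {
        byte[] head = "From alice@example.com Mon Jan  1 00:00:00 2017\n"
                .getBytes(StandardCharsets.US_ASCII);
        String type = detect(head);
        System.out.println(type + " -> " + handlerFor(type).name()); // application/mbox -> mbox-handler
    }
}
```

Keeping detection separate from parsing means the PST and mbox paths can get their own extractor without either parser needing to know about the other.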

I am a bit busy these days. [~thaichat04], could you take a look at the PR,
since it changes code you added?

> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2044
>                 URL: https://issues.apache.org/jira/browse/TIKA-2044
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Tika 1.13, and 1.14 nightly build at the time of this 
> writing
>            Reporter: Vjeran Marcinko
>
> MboxParser combines multiple text lines into a single header value by
> (supposedly) using a LIFO structure (a stack, i.e. a Java Deque), but it
> actually uses a FIFO structure (a queue), so the line it fetches and extends
> with the current line is the first inserted line rather than the last one:
> Current code:
> Queue<String> multiline = new LinkedList<String>();
> // ... a few lines below ...
> String latestLine = multiline.poll();
> It should instead be:
> Deque<String> multiline = new LinkedList<String>();
> // ... a few lines below ...
> String latestLine = multiline.pollLast();
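To make the difference concrete, here is a minimal, self-contained sketch that unfolds header continuation lines (lines starting with a space or tab, as in RFC 5322 folding) once with the reported `Queue`/`poll()` and once with the proposed `Deque`/`pollLast()`. The class and method names are hypothetical, and this is a simplification, not the actual MboxParser source.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Simplified, hypothetical re-creation of the header-unfolding step;
// this is NOT the actual MboxParser code.
class HeaderUnfolding {

    // A line starting with a space or tab continues the previous header.
    static boolean isContinuation(String line) {
        return line.startsWith(" ") || line.startsWith("\t");
    }

    // Reported behaviour: Queue.poll() removes the HEAD, i.e. the OLDEST stored
    // line, so the continuation is glued onto the wrong header.
    static List<String> unfoldWithQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.poll();
                multiline.add(latestLine + line); // unfolding keeps the leading whitespace
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<>(multiline);
    }

    // Proposed fix: Deque.pollLast() removes the TAIL, i.e. the most recently
    // stored line, which is the header the continuation belongs to.
    static List<String> unfoldWithDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.addLast(latestLine + line);
            } else {
                multiline.addLast(line);
            }
        }
        return new ArrayList<>(multiline);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("From: x@y", "Subject: Hello", " world");
        System.out.println(unfoldWithQueue(lines)); // [Subject: Hello, From: x@y world]
        System.out.println(unfoldWithDeque(lines)); // [From: x@y, Subject: Hello world]
    }
}
```

With only one stored header the two variants behave the same, which is why the bug only shows up once a continuation follows the second or later header. Pairing `pollLast()` with `addLast()` also keeps the headers in their original order.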



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
