[ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960943#comment-15960943 ]
Wing-Hong Andrew Ko edited comment on TIKA-2044 at 4/7/17 3:13 PM:
-------------------------------------------------------------------
Hello Luis!
Thanks for the explanation! Is there an easy way for one to attach a different
EmbeddedDocumentExtractor for mbox files vs pst files, or am I supposed to
register a single EmbeddedDocumentExtractor instance and do branching logic
internally in the parseEmbedded method based on e.g.
metadata.get(Metadata.CONTENT_TYPE)?
Submitted the [pull request|https://github.com/apache/tika/pull/166] with a refactor and unit tests.
Cheers,
Andrew
was (Author: wko27):
Hello Luis!
Thanks for the explanation! Is there an easy way for one to attach a different
EmbeddedDocumentExtractor for mbox files vs pst files, or am I supposed to
register a single EmbeddedDocumentExtractor instance and do branching logic
internally in the parseEmbedded method based on e.g.
metadata.get(Metadata.CONTENT_TYPE)?
Submitted the [PR|https://github.com/apache/tika/pull/166] with a refactor and unit tests.
Cheers,
Andrew
> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
> Key: TIKA-2044
> URL: https://issues.apache.org/jira/browse/TIKA-2044
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Environment: Tika 1.13, and 1.14 nightly build at the time of this
> writing
> Reporter: Vjeran Marcinko
>
> MboxParser folds a multi-line header into a single header value by fetching
> the most recently inserted line and appending the current line to it. That
> calls for a LIFO structure (a stack, i.e. a Java Deque), but the code uses a
> FIFO Queue instead, so poll() returns the oldest line rather than the latest
> one and the wrong line gets extended:
> Current code:
> Queue<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.poll();
> Whereas it should be:
> Deque<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.pollLast();
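
To make the difference between poll() and pollLast() concrete, here is a minimal, self-contained sketch. It is not MboxParser's actual code: the class name HeaderFoldingDemo, the two fold methods, the isContinuation helper, and the sample header lines are all hypothetical, and the folding rule (a line starting with a space or tab continues the previous header) is a simplifying assumption. Only the Queue/Deque declarations and the poll()/pollLast() calls come from the report above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class; a simplified stand-in for the header-folding loop.
public class HeaderFoldingDemo {

    // Assumption: a continuation line starts with a space or a tab.
    static boolean isContinuation(String line) {
        return !line.isEmpty() && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
    }

    // Reported behaviour: poll() removes the HEAD of the queue (the oldest
    // line), so the continuation is appended to the wrong header.
    public static List<String> foldWithQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.poll();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    // Proposed fix: pollLast() removes the TAIL (the most recently added
    // line), so the continuation extends the header it belongs to.
    public static List<String> foldWithDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    public static void main(String[] args) {
        // Made-up sample input: the third line continues the Subject header.
        List<String> lines = Arrays.asList(
                "From: a@example.com",
                "Subject: first part",
                " second part",
                "To: b@example.com");
        System.out.println("Queue (poll):     " + foldWithQueue(lines));
        System.out.println("Deque (pollLast): " + foldWithDeque(lines));
    }
}
```

With the Queue variant the continuation text ends up attached to the From line and the header order changes; with the Deque variant it is attached to the Subject line and the order is preserved.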