[ https://issues.apache.org/jira/browse/TIKA-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392837#comment-16392837 ]
Tomasz L edited comment on TIKA-2530 at 3/9/18 12:51 PM:
---------------------------------------------------------
[[email protected]] Oh, you are right. This text file is recognized as an
executable. I hadn't noticed.
I discovered that when you add "MZ" (https://en.wikipedia.org/wiki/DOS_MZ_executable)
at the beginning of a file, it is recognized as an executable. You can check the file content:
https://issues.apache.org/jira/secure/attachment/12913621/12913621_test_file.txt
I think we can forget about my issue. Thanks.
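
To illustrate the "MZ" observation above, here is a minimal sketch of a magic-byte check. It is not Tika's detector; the class name MzMagicCheck and the method startsWithMz are hypothetical names chosen for this example. It only shows why a text file that happens to begin with the two bytes "MZ" can look like a DOS MZ executable to signature-based detection.

```java
import java.nio.charset.StandardCharsets;

public class MzMagicCheck {
    // The ASCII bytes 'M' (0x4D) and 'Z' (0x5A) that open a DOS MZ executable header.
    private static final byte[] MZ_MAGIC = {0x4D, 0x5A};

    /** Hypothetical helper: true if the content begins with the "MZ" signature. */
    public static boolean startsWithMz(byte[] content) {
        return content != null
                && content.length >= MZ_MAGIC.length
                && content[0] == MZ_MAGIC[0]
                && content[1] == MZ_MAGIC[1];
    }

    public static void main(String[] args) {
        byte[] plain = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] prefixed = "MZ hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithMz(plain));    // false
        System.out.println(startsWithMz(prefixed)); // true
    }
}
```

A check like this looks only at the leading bytes, so it cannot tell a real executable from plain text that starts with the same two characters.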
was (Author: lenczykt):
[[email protected]] Oh, you are right. This text file is recognized as an
executable. I hadn't noticed.
I discovered that when you add "MZ" at the beginning of a file, it is recognized as an executable.
You can check the file content:
https://issues.apache.org/jira/secure/attachment/12913621/12913621_test_file.txt
I think we can forget about my issue. Thanks.
> OutlookExtractor "buffer underrun" when parsing .msg with embedded .msg
> -----------------------------------------------------------------------
>
> Key: TIKA-2530
> URL: https://issues.apache.org/jira/browse/TIKA-2530
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.16, 1.17
> Environment: Reproduced with both Tika 1.16 and Tika 1.17 on Windows,
> but the problem likely occurs on all platforms.
> Reporter: Pascal Essiembre
> Assignee: Tim Allison
> Priority: Major
> Attachments: test_file.txt
>
>
> When parsing certain .msg files containing certain attachments (e.g. other
> .msg files), I get this error:
> {noformat}
> ...
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
> at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
> at org.apache.poi.hmef.CompressedRTF.decompress(CompressedRTF.java:73)
> at org.apache.poi.util.LZWDecompresser.decompress(LZWDecompresser.java:81)
> at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:42)
> at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:270)
> ...
> {noformat}
> I think the issue is that {{MAPIRtfAttribute}} fails when it receives
> an empty byte array from {{OutlookExtractor}}. I was able to eliminate the
> error at around line 269 of {{OutlookExtractor}} in the Tika 1.16 code (or
> around line 322 in Tika 1.17) with the following:
> {code:java}
> //--- START FIX ---
> ByteChunk chunk = (ByteChunk) rtfChunk;
> if (chunk != null && chunk.getValue() != null
>         && chunk.getValue().length > 0 && !doneBody) {
>     //ByteChunk chunk = (ByteChunk) rtfChunk;
> //--- END FIX ---
> {code}
> I am not sure whether this is a real fix, or whether more is needed than just
> suppressing the error to make sure everything is extracted properly from all files.
> I cannot share the sample file I used for testing because it was given to me as
> sensitive content, and I could not recreate a faulty .msg file.
> Thanks
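
The guard proposed in the issue description can be sketched in isolation as follows. This is a self-contained illustration, not the Tika or POI code: the class RtfBodyGuard, the helper hasUsableRtfBody, and the stand-in readLittleEndianInt are all hypothetical names, and the plain IOException stands in for POI's BufferUnderrunException. The sketch assumes, as the stack trace suggests, that decompression begins by reading a 4-byte little-endian integer, which cannot succeed on an empty byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RtfBodyGuard {
    /** Hypothetical helper mirroring the condition in the proposed fix. */
    public static boolean hasUsableRtfBody(byte[] value, boolean doneBody) {
        return value != null && value.length > 0 && !doneBody;
    }

    /** Stand-in for a header read: fails if fewer than 4 bytes are available. */
    public static int readLittleEndianInt(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) { // any read() returned -1 (end of stream)
            throw new IOException("buffer underrun");
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) throws IOException {
        byte[] empty = new byte[0];
        if (hasUsableRtfBody(empty, false)) {
            // Not reached: without the guard, this read would throw.
            readLittleEndianInt(new ByteArrayInputStream(empty));
        } else {
            System.out.println("skipping empty RTF body");
        }
    }
}
```

Skipping the empty chunk avoids the exception, but as the reporter notes, it does not by itself show that the rest of the message is extracted correctly.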
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)