https://issues.apache.org/bugzilla/show_bug.cgi?id=51873

             Bug #: 51873
           Summary: [BUG] Invalid chunk name Olk10SideProps_0001 (Parsing
                    MSG files - Outlook 2002 drag and dropped)
           Product: POI
           Version: 3.8-dev
          Platform: PC
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSMF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


I'm getting this error on a number of Outlook MSG files I'm trying to ingest.
Due to the sensitive nature of the task, I can't post an example here, though I
may be able to recreate one in the next few days and attach it.

After some research, it appears that the Olk10SideProps_0001 stream is only
written by Outlook 2002, and only for MSG files dragged and dropped to disk.
The stream may contain the message ID and store ID.  It is not documented in
MS-OXMSG.  See further explanation here:
http://social.msdn.microsoft.com/Forums/en-US/os_exchangeprotocols/thread/1f2848a4-3b6a-4f8f-85dd-55e6b12fdec6


If possible, please add a fix that ignores this stream and continues processing
the MSG file, provided that can be done without producing an invalid result.
I'll see if I can get anything to work on my end.
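
To illustrate the kind of check I have in mind, here is a rough standalone
sketch. It is not POI code: the class ChunkNameFilter and its methods are
hypothetical names made up for this example. It relies on the MS-OXMSG
convention that property streams are named "__substg1.0_" followed by 8 hex
digits, and it assumes that multi-valued entries add a "-XXXXXXXX" index
suffix. Any name that does not fit is skipped instead of raising an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: separates stream names that look like
// MAPI property chunks from names that should simply be skipped.
final class ChunkNameFilter {
    // "__substg1.0_" + 4 hex digits of property ID + 4 hex digits of type.
    // The optional "-XXXXXXXX" suffix for multi-valued entries is an
    // assumption of this sketch.
    private static final Pattern PROPERTY_CHUNK =
            Pattern.compile("__substg1\\.0_[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{8})?");

    // True if the name has the expected property-chunk form.
    static boolean isPropertyChunk(String entryName) {
        return entryName != null && PROPERTY_CHUNK.matcher(entryName).matches();
    }

    // Returns the names worth parsing; unknown names such as
    // Olk10SideProps_0001 are dropped rather than treated as an error.
    static List<String> parseable(List<String> entryNames) {
        List<String> kept = new ArrayList<>();
        for (String name : entryNames) {
            if (isPropertyChunk(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With something like this in front of the chunk parsing, a file containing
both __substg1.0_0037001F and Olk10SideProps_0001 would still yield the
first stream, and the second would be silently ignored.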

Stack trace from Tika (1.0-SNAPSHOT) calling POI 3.8-b4:

Caused by: java.lang.IllegalArgumentException: Invalid chunk name Olk10SideProps_0001
    at org.apache.poi.hsmf.parsers.POIFSChunkParser.process(POIFSChunkParser.java:125)
    at org.apache.poi.hsmf.parsers.POIFSChunkParser.processChunks(POIFSChunkParser.java:98)
    at org.apache.poi.hsmf.parsers.POIFSChunkParser.parse(POIFSChunkParser.java:85)
    at org.apache.poi.hsmf.MAPIMessage.<init>(MAPIMessage.java:127)
    at org.apache.tika.parser.microsoft.OutlookExtractor.<init>(OutlookExtractor.java:57)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:217)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
