Reduce memory footprint on parsing MSG attachments

Hölzl, Dominik Fri, 17 May 2019 03:25:25 -0700

Hello!

I have some suggestions for reducing the memory footprint when parsing MSG files
with very large and/or many attachments.


Currently AttachmentChunks uses ByteChunk for the attachment content data.
When parsing an MSG file (MAPIMessage ctor -> POIFSChunkParser.parse), this
causes the complete attachment data to be read into memory, because ByteChunk
simply reads the content into a plain byte array in
ByteChunk.readValue/POIFSChunkParser.process.

My suggestion: replace this with a newly introduced "ByteStreamChunk", which
does not read the data during parsing but only keeps a reference to the
underlying InputStream, so the content can later be read "directly" from the
base input stream.
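To make the difference concrete, here is a minimal, self-contained Java sketch. It does not use the POI classes: `readEagerly` stands in for what ByteChunk effectively does today, and `LazyStreamChunk` is a hypothetical stand-in for the proposed ByteStreamChunk (its name, its methods and the 8 KiB buffer size are assumptions for illustration only). The lazy variant stores only the stream reference when constructed and copies the data in fixed-size blocks on demand, so peak heap usage no longer grows with the attachment size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LazyChunkDemo {

    /** Eager style: the whole attachment is materialised as one byte[] at parse time. */
    static byte[] readEagerly(InputStream in) throws IOException {
        return in.readAllBytes(); // heap usage grows with the attachment size
    }

    /** Hypothetical lazy chunk (not a POI class): keeps only a stream reference until asked for data. */
    static final class LazyStreamChunk {
        private final InputStream source; // nothing is read in the constructor

        LazyStreamChunk(InputStream source) {
            this.source = source;
        }

        /** Exposes the underlying stream so the caller can consume it incrementally. */
        InputStream getInputStream() {
            return source;
        }

        /** Copies the content to out through a fixed 8 KiB buffer; returns the number of bytes copied. */
        long copyTo(OutputStream out) throws IOException {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo only: a real parser would hand over a stream backed by the container file,
        // not an array that is already in memory.
        byte[] attachment = new byte[100_000];
        LazyStreamChunk chunk = new LazyStreamChunk(new ByteArrayInputStream(attachment));
        System.out.println(chunk.copyTo(OutputStream.nullOutputStream())); // prints 100000
    }
}
```

The design trade-off is that the chunk no longer owns its data: it is only as valid as the stream it points to.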

This would be a breaking change: the underlying stream
(DocumentInputStream / POIFSFileSystem / ...) must then stay open until
the attachment content has been read.
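The lifecycle constraint can be illustrated the same way. `ClosableSource` below is a hypothetical stand-in for a container-backed stream (it is not DocumentInputStream, and it only assumes that reads fail once the source is closed); a lazily held reference to it becomes unusable if the container is closed before the attachment is consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClosedSourceDemo {

    /** Hypothetical container-backed stream: every read fails once it has been closed. */
    static final class ClosableSource extends FilterInputStream {
        private boolean closed;

        ClosableSource(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            ensureOpen();
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            ensureOpen();
            return super.read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }

        private void ensureOpen() throws IOException {
            if (closed) {
                throw new IOException("stream closed");
            }
        }
    }

    /** Returns true if a read succeeded, false if the source had already been closed. */
    static boolean tryRead(InputStream lazyRef) {
        try {
            lazyRef.read(new byte[16]);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        ClosableSource container = new ClosableSource(new ByteArrayInputStream(new byte[64]));
        InputStream attachmentRef = container; // what a lazy chunk would hold on to
        container.close();                     // e.g. the filesystem is closed too early
        System.out.println(tryRead(attachmentRef)); // prints false
    }
}
```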

Regards,
Dominik

<<attachment: HSMF_ByteStreamChunk.zip>>
