[jira] [Commented] (DAFFODIL-2194) buffered data output stream has a chunk limit of 2GB

Michael Beckerle (Jira) Wed, 28 Aug 2019 09:50:35 -0700


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917916#comment-16917916
 ]


Michael Beckerle commented on DAFFODIL-2194:
--------------------------------------------

I don't think that auto-splitting the DirectOrBufferedDataOutputStream whenever 
it is buffering and the buffer grows close to the 2GByte maximum is a good fix.

It will work, and allow larger blobs to be processed, but it doesn't keep the 
blob data out of the Java heap. We will be able to get past the 2GByte object 
size limit, but the Java heap size limit will be the next one we hit.

We want to be able to parse image/video files that are much larger than the 
Java heap would want to grow to. For example, a 64GByte NITF file parsed in a 
5GByte JVM.

That means that the DirectOrBufferedDataOutputStream would need a specific 
blob-indirect flavor. When you unparse a blob, rather than streaming it into a 
buffered instance and copying all the bytes into one or more Java heap objects 
of size <= 2GB, you instead just defer the blob and point at the file (and add 
its length to the accumulated length for use by the dfdl:valueLength function). 
Only when the streams collapse together would the blob file contents be 
accessed and streamed to the direct output stream.
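
The blob-indirect idea can be sketched roughly as follows. This is an 
illustration only, written in Java, and is not Daffodil's implementation: the 
class DeferredBlobSketch, the Chunk types, and the methods putBytes, putBlob, 
accumulatedLength, and collapseInto are all hypothetical names. It assumes a 
blob is available as a file on disk, and it shows only the two properties 
described above: the blob's length is counted without reading its contents, and 
the contents are streamed through a small fixed-size window only at collapse 
time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Daffodil.
final class DeferredBlobSketch {

    // A pending piece of output that has not yet reached the direct stream.
    interface Chunk {
        long length() throws IOException;
        void deliverTo(OutputStream direct) throws IOException;
    }

    // Ordinary buffered data: the bytes live in the Java heap.
    static final class HeapChunk implements Chunk {
        private final byte[] bytes;
        HeapChunk(byte[] bytes) { this.bytes = bytes; }
        public long length() { return bytes.length; }
        public void deliverTo(OutputStream direct) throws IOException {
            direct.write(bytes);
        }
    }

    // Blob-indirect data: only the file reference lives in the heap.
    static final class BlobFileChunk implements Chunk {
        private final Path file;
        BlobFileChunk(Path file) { this.file = file; }
        // Length comes from file metadata; the contents are not read.
        public long length() throws IOException { return Files.size(file); }
        public void deliverTo(OutputStream direct) throws IOException {
            try (InputStream in = Files.newInputStream(file)) {
                byte[] window = new byte[64 * 1024]; // bounded heap use
                int n;
                while ((n = in.read(window)) != -1) {
                    direct.write(window, 0, n);
                }
            }
        }
    }

    private final List<Chunk> pending = new ArrayList<>();

    void putBytes(byte[] bytes) { pending.add(new HeapChunk(bytes)); }

    // Defer the blob: remember the file instead of copying its bytes.
    void putBlob(Path file) { pending.add(new BlobFileChunk(file)); }

    // Total pending length in bytes, e.g. as input to a length calculation.
    long accumulatedLength() throws IOException {
        long total = 0;
        for (Chunk c : pending) total += c.length();
        return total;
    }

    // Collapse: deliver every pending chunk, in order, to the direct stream.
    void collapseInto(OutputStream direct) throws IOException {
        for (Chunk c : pending) c.deliverTo(direct);
        pending.clear();
    }
}
```

In this sketch, calling putBytes and then putBlob on a temporary file makes 
accumulatedLength return the sum of both sizes before any blob byte has been 
read, and collapseInto writes the heap bytes followed by the file contents. 
Heap use for the blob stays at the 64KB window regardless of the file size.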

 

> buffered data output stream has a chunk limit of 2GB
> ----------------------------------------------------
>
>                 Key: DAFFODIL-2194
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2194
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.5.0
>
>
> A buffered data output stream is backed by a growable ByteArrayOutputStream, 
> which can only grow to 2GB in size. So if we ever try to write more than 2GB 
> to a buffered output stream during unparse (very possible with large blobs), 
> we'll get an OutOfMemoryError.
> One potential solution is to be aware of the size of a ByteArrayOutputStream 
> when buffering output and automatically create a split when it gets to 2GB in 
> size. This will still require a ton of memory since we're buffering these in 
> memory, but we'll at least be able to unparse more than 2GB of continuous 
> data. 
> Note that we should still be able to unparse more than 2GB of data total, as 
> long as there is no single buffer that's more than 2GB.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
