[jira] [Commented] (DAFFODIL-2194) buffered data output stream has a chunk limit of 2GB

Michael Beckerle (Jira) Wed, 28 Aug 2019 11:56:35 -0700


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918014#comment-16918014
 ]


Michael Beckerle commented on DAFFODIL-2194:
--------------------------------------------

Agree. I recall that the Microsoft BizTalk parser had a tuning parameter for 
the amount of lookahead, which could even be set to zero, preventing any 
points of uncertainty (PoUs).

Another way for the cache-limit to work is that if the cache grows sufficiently 
past a limit, it serves as a discriminator and all enclosing PoUs are marked as 
discriminated (literally run up the stack of them, setting them all to true), 
at which point a bunch of buffers can be freed. A failure at that point would 
then be fatal as there would be no PoU to backtrack to.  But a parse could 
succeed so long as no backtracking was actually needed.
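
A minimal sketch of that cache-limit idea follows. It is only an illustration: 
PouStackSketch, Pou, cacheLimit, and every method name are hypothetical and 
are not Daffodil's actual classes. It assumes the parser keeps a stack of PoUs 
and counts the bytes it is holding for possible backtracking.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a PoU stack with a cache limit; not Daffodil's real API. */
class PouStackSketch {
    static final class Pou {
        final long startOffset;          // input position we would backtrack to
        boolean discriminated = false;   // true once this PoU is resolved
        Pou(long startOffset) { this.startOffset = startOffset; }
    }

    private final Deque<Pou> pous = new ArrayDeque<>();
    private final long cacheLimit;       // assumed tuning parameter, in bytes
    private long cachedBytes = 0;        // bytes retained for possible backtracking

    PouStackSketch(long cacheLimit) { this.cacheLimit = cacheLimit; }

    void enterPou(long offset) { pous.push(new Pou(offset)); }
    void exitPou() { pous.pop(); }

    /** True if some enclosing PoU is still unresolved, so a failure could backtrack. */
    boolean canBacktrack() {
        for (Pou p : pous) {
            if (!p.discriminated) return true;
        }
        return false;
    }

    /** Account for newly retained data; past the limit, give up on backtracking. */
    void cache(long nBytes) {
        if (!canBacktrack()) return;     // nothing can backtrack, so nothing is retained
        cachedBytes += nBytes;
        if (cachedBytes > cacheLimit) {
            // Run up the whole stack, marking every enclosing PoU as discriminated.
            for (Pou p : pous) p.discriminated = true;
            cachedBytes = 0;             // the retained buffers can now be freed
        }
    }

    /** After the limit has been hit, any parse failure has nowhere to backtrack to. */
    boolean failureIsFatal() { return !canBacktrack(); }

    long cachedBytes() { return cachedBytes; }
}
```

With a limit of 100 bytes and two nested PoUs, caching 60 bytes leaves 
backtracking available; caching 60 more crosses the limit, marks both PoUs as 
discriminated, and makes any later failure fatal.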

One challenge with the no-PoU-with-BLOB thing is that it has bad composition 
properties. Modifying a schema to replace a hexBinary with a BLOB might cause 
the need to rethink the whole way the schema works to avoid the enclosing PoUs.

> buffered data output stream has a chunk limit of 2GB
> ----------------------------------------------------
>
>                 Key: DAFFODIL-2194
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2194
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.5.0
>
>
> A buffered data output stream is backed by a growable ByteArrayOutputStream, 
> which can only grow to 2GB in size. So if we ever try to write more than 2GB 
> to a buffered output stream during unparse (very possible with large blobs), 
> we'll get an OutOfMemoryError.
> One potential solution is to be aware of the size of a ByteArrayOutputStream 
> when buffering output and automatically create a split when it gets to 2GB in 
> size. This will still require a ton of memory since we're buffering these in 
> memory, but we'll at least be able to unparse more than 2GB of continuous 
> data. 
> Note that we should still be able to unparse more than 2GB of data total, as 
> long as there is no single buffer that's more than 2GB.
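
The split described in the issue could look roughly like the sketch below. 
This is an assumption-laden illustration, not the fix that went into Daffodil: 
ChunkedBufferOutputStream, chunkLimit, and deliverTo are hypothetical names. 
The idea is to start a new ByteArrayOutputStream whenever the current one 
reaches a configured size, and to track the total in a long.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffers output in several bounded chunks instead of one array. */
class ChunkedBufferOutputStream extends OutputStream {
    private final int chunkLimit;    // maximum bytes per chunk (assumed to be tunable)
    private final List<ByteArrayOutputStream> chunks = new ArrayList<>();
    private ByteArrayOutputStream current;
    private long totalSize = 0;      // a long, so the total may exceed the int range

    ChunkedBufferOutputStream(int chunkLimit) {
        if (chunkLimit <= 0) throw new IllegalArgumentException("chunkLimit must be positive");
        this.chunkLimit = chunkLimit;
        startNewChunk();
    }

    private void startNewChunk() {
        current = new ByteArrayOutputStream();
        chunks.add(current);
    }

    @Override
    public void write(int b) {
        if (current.size() == chunkLimit) startNewChunk();
        current.write(b);
        totalSize++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        while (len > 0) {
            if (current.size() == chunkLimit) startNewChunk();
            // Fill only the space left in the current chunk, then split.
            int n = Math.min(len, chunkLimit - current.size());
            current.write(buf, off, n);
            off += n;
            len -= n;
            totalSize += n;
        }
    }

    long totalSize() { return totalSize; }
    int chunkCount() { return chunks.size(); }

    /** Copy every chunk, in order, to the real output once it becomes writable. */
    void deliverTo(OutputStream out) throws IOException {
        for (ByteArrayOutputStream c : chunks) c.writeTo(out);
    }
}
```

With a chunk limit of 4, writing 10 bytes yields three chunks (4, 4, and 2 
bytes). In practice the limit would sit somewhat below the maximum Java array 
length. Everything is still held in memory, so this lifts the per-buffer 
ceiling without reducing memory use.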



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
