[ https://issues.apache.org/jira/browse/DAFFODIL-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918005#comment-16918005 ]

Steve Lawrence commented on DAFFODIL-2194:
------------------------------------------

I like it.

It still might be a good idea to limit the size of the cache. That would allow 
small systems without much heap space to parse relatively large files, as long 
as they never backtrack too far in practice.
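
For illustration, a minimal sketch of what such a bound could look like (the 
class name, the maxCacheBytes parameter, and the pool shape are assumptions 
for the sketch, not Daffodil's actual code):

{code:scala}
import java.io.ByteArrayOutputStream
import scala.collection.mutable

// Sketch only: a pool of reusable buffers whose total retained size is
// capped. Buffers released while the pool is at capacity are dropped and
// left for the GC, so idle buffer space never exceeds maxCacheBytes.
final class BoundedBufferCache(maxCacheBytes: Long) {
  // Each entry remembers the approximate capacity of its backing array,
  // since ByteArrayOutputStream does not expose its capacity directly.
  private val pool = mutable.Queue.empty[(ByteArrayOutputStream, Int)]
  private var cachedBytes = 0L

  def acquire(): ByteArrayOutputStream = synchronized {
    if (pool.nonEmpty) {
      val (buf, cap) = pool.dequeue()
      cachedBytes -= cap
      buf // already reset on release, ready for reuse
    } else new ByteArrayOutputStream()
  }

  def release(buf: ByteArrayOutputStream): Unit = synchronized {
    val cap = buf.size() // the backing array is at least this large
    buf.reset()
    if (cachedBytes + cap <= maxCacheBytes) {
      pool.enqueue((buf, cap))
      cachedBytes += cap
    }
    // else: drop the buffer and let the GC reclaim it
  }
}
{code}

Dropping buffers once the cap is reached trades some reallocation cost for a 
hard bound on idle heap usage, which is what a small-heap system would want.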

> buffered data output stream has a chunk limit of 2GB
> ----------------------------------------------------
>
>                 Key: DAFFODIL-2194
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2194
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.5.0
>
>
> A buffered data output stream is backed by a growable ByteArrayOutputStream, 
> which can only grow to 2GB in size. So if we ever try to write more than 2GB 
> to a buffered output stream during unparse (very possible with large blobs), 
> we'll get an OutOfMemoryError.
> One potential solution is to be aware of the size of a ByteArrayOutputStream 
> when buffering output and automatically create a split when it gets to 2GB in 
> size. This will still require a lot of memory since we're buffering these 
> chunks in memory, but we'll at least be able to unparse more than 2GB of 
> continuous data. 
> Note that we should already be able to unparse more than 2GB of data total, 
> as long as no single buffer is more than 2GB.
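
A rough sketch of the splitting idea described above (the class name and the 
chunkLimit default are illustrative assumptions, not the actual fix):

{code:scala}
import java.io.{ByteArrayOutputStream, OutputStream}
import scala.collection.mutable

// Sketch only: an output stream that starts a new ByteArrayOutputStream
// whenever the current chunk reaches the limit, so no single backing array
// ever needs to grow past what a JVM array can hold (~2GB).
final class ChunkedBufferStream(chunkLimit: Int = Int.MaxValue - 8)
  extends OutputStream {

  private val chunks = mutable.ArrayBuffer(new ByteArrayOutputStream())

  // Return the active chunk, splitting to a fresh one if it is full.
  private def current: ByteArrayOutputStream = {
    if (chunks.last.size() >= chunkLimit) chunks += new ByteArrayOutputStream()
    chunks.last
  }

  override def write(b: Int): Unit = current.write(b)

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    var o = off
    var remaining = len
    while (remaining > 0) {
      val chunk = current
      val n = math.min(chunkLimit - chunk.size(), remaining)
      chunk.write(b, o, n)
      o += n
      remaining -= n
    }
  }

  // Replay all buffered chunks to the real output once it is available.
  def writeTo(out: OutputStream): Unit = chunks.foreach(_.writeTo(out))

  // The total buffered size can exceed Int.MaxValue, hence Long.
  def totalSize: Long = chunks.iterator.map(_.size().toLong).sum
}
{code}

The data is still held in memory, as the description notes, but the 2GB 
ceiling now applies per chunk rather than to the whole buffered stream.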



