stevedlawrence commented on pull request #472:
URL: 
https://github.com/apache/incubator-daffodil/pull/472#issuecomment-760313432


   > Is this fix relevant only to Daffodil 3.0.0, or is this something 
affecting code from earlier revisions such as 2.4.0 or 2.6.0 ?
   
   This bug was introduced in 2.5.0, so it's been around for about a year. But 
without the new streaming capabilities introduced in 3.0.0, I'm not sure how 
likely it is to actually hit this issue. I suspect most of the time you'll just 
run out of memory before hitting this bug. I just now tested this same schema + 
data with 2.7.0, and it looks like it's just stuck in the garbage collector 
trying to find memory to free. I suspect if I let it run long enough I'd get an 
OutOfMemoryError.
   
   It might be worth considering a 3.0.1 patch release for this issue, since 
there isn't really a good workaround. The only workaround I can think of when 
dealing with files larger than 256MB is to avoid the InputStream constructor 
when creating an InputSourceDataInputStream, so that the bucketing logic isn't 
used at all. Instead, a user can use the ByteBuffer or Array[Byte] 
constructors, but that means all the data must be read into memory, and 
there's a 2GB limit, so it's not a great workaround. And it only works for 
people using the API; there's no workaround for people using the CLI.
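   To illustrate the ByteBuffer route and its 2GB ceiling, here is a minimal 
sketch using only the JDK (no Daffodil dependency): memory-map the file into a 
ByteBuffer rather than streaming it. Because ByteBuffer positions are ints, 
this path is inherently capped at Integer.MAX_VALUE bytes, which is the 2GB 
limit mentioned above. The commented-out Daffodil call at the end is an 
assumption about how the buffer would then be handed to the API, not a tested 
invocation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteBufferWorkaround {
    // Memory-map a whole file into a ByteBuffer instead of streaming it.
    // ByteBuffer indices are ints, so this caps out at 2GB.
    static MappedByteBuffer mapFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file exceeds the 2GB ByteBuffer limit: " + size);
            }
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("daffodil-demo", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        MappedByteBuffer bb = mapFile(tmp);
        System.out.println(bb.remaining());
        // The buffer could then be passed to Daffodil's ByteBuffer-based
        // constructor, bypassing the buggy InputStream bucketing path,
        // e.g. (hypothetical usage, check the javadoc for your version):
        //   new InputSourceDataInputStream(bb)
        Files.delete(tmp);
    }
}
```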


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

