stevedlawrence commented on pull request #472: URL: https://github.com/apache/incubator-daffodil/pull/472#issuecomment-760374203
I have done that, and done so with a multi-gig file, so that part does work. The problem was that the schema I used for it was just too simple and required zero backtracking. We were throwing away buckets earlier than we should have, but it didn't matter since we never used those buckets.

This particular CSV schema, while it doesn't require much backtracking, does require a little bit of lookahead when scanning for delimiters. I think what happened in this case is that we scanned for a delimiter, were overzealous in getting rid of the previous bucket in doing so, and then when the scanner came back to read the data, the bucket containing that data was gone.

What we probably really need is a test that consumes a bunch of data but has a speculative parse that backtracks just a little less than 256MB. That huge backtrack should be allowed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
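The failure mode described in the comment above can be sketched in miniature. This is a hypothetical illustration, not Daffodil's actual buffering code: the `BucketedStream` class, its method names, and the tiny bucket size are all invented for the example. The point it demonstrates is the same, though: a discard pass that only looks at the current scan position, and ignores the mark held by a pending speculative parse, frees buckets that a later backtrack still needs.

```python
# Hypothetical sketch of the eager bucket-discard bug: all names and sizes
# here are invented for illustration; Daffodil's real buffering differs.

class BucketedStream:
    def __init__(self, data, bucket_size=4):
        self.bucket_size = bucket_size
        n = (len(data) + bucket_size - 1) // bucket_size
        # Data held in fixed-size buckets; discarded buckets are removed.
        self.buckets = {i: data[i * bucket_size:(i + 1) * bucket_size]
                        for i in range(n)}
        self.pos = 0
        self.mark = 0  # earliest position a backtrack may return to

    def read(self):
        bucket = self.buckets.get(self.pos // self.bucket_size)
        if bucket is None:
            raise RuntimeError("bucket already discarded")
        byte = bucket[self.pos % self.bucket_size]
        self.pos += 1
        return byte

    def scan_for(self, delim):
        # Lookahead: advance past bytes until the delimiter is consumed.
        while self.read() != delim:
            pass

    def discard(self, overzealous):
        # Overzealous: frees everything behind the current scan position,
        # ignoring the mark. Correct: never frees past the earlier of the
        # scan position and the mark.
        limit = self.pos if overzealous else min(self.pos, self.mark)
        for idx in list(self.buckets):
            if (idx + 1) * self.bucket_size <= limit:
                del self.buckets[idx]

    def backtrack(self):
        self.pos = self.mark


def demo(overzealous):
    s = BucketedStream(b"field1,field2," * 4)
    s.mark = 0            # speculative parse begins at offset 0
    s.scan_for(ord(","))  # delimiter scan reads ahead past several buckets
    s.scan_for(ord(","))
    s.discard(overzealous)
    s.backtrack()         # scanner comes back to re-read the data
    try:
        return bytes(s.read() for _ in range(6))
    except RuntimeError as e:
        return str(e)

print(demo(overzealous=True))   # the bug: data behind the mark is gone
print(demo(overzealous=False))  # respecting the mark keeps it readable
```

With the overzealous discard, the backtrack lands in a freed bucket and the read fails; with the mark-aware discard, the same backtrack succeeds. A real regression test along the lines suggested above would do this at scale, with a speculative parse whose backtrack distance sits just under the 256MB boundary.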
