stevedlawrence opened a new pull request #472: URL: https://github.com/apache/incubator-daffodil/pull/472
The BucketingInputSource caches small-ish buckets of data in a "buckets" array while parsing. As we determine that Daffodil will no longer need data from older buckets (e.g. when PoUs are resolved), we set those array elements to null so that Java can garbage collect the buckets. This means that although the buckets array may grow large for very large data, the majority of its elements are null, so the actual cached data in memory is quite small.

Sometimes we cannot discard buckets due to unresolved PoUs, so we impose a maximum number of buckets that can be cached. Once we reach this number of buckets in the buckets array, we simply discard the oldest bucket by setting it to null as above. If anything ever tries to use data from a discarded bucket, we throw a backtracking error. In practice, the maximum bucket limit is equivalent to about 256MB of cached data, which is high enough that nothing reasonably needs to backtrack that far.

However, we incorrectly calculate when to throw away the oldest bucket. We currently do so when the buckets array grows beyond some maximum number of elements. But this just means we've parsed some large amount of data, not that we have actually cached a large amount of data. It doesn't take into account the fact that many of the buckets in the array may already have been discarded. This can lead to a situation where we think we have reached the maximum cache size, but we really haven't, and so we start throwing away buckets that we actually need and could reasonably backtrack to. And if we do backtrack that small amount, we get an error.

So instead of throwing away the oldest bucket once the buckets array grows to some size, we really only want to do so when the number of *non-null buckets* grows to some size, which is what actually implies the cached data has grown too large. This patch fixes the calculation to do that.
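
To illustrate the corrected check, here is a minimal Java sketch of the idea. It is not the actual BucketingInputSource code: the class `BucketCacheSketch` and all of its method and field names are hypothetical, and the real implementation may track the oldest bucket and the cached count differently. The point it shows is only that eviction is driven by the count of non-null buckets rather than by the length of the buckets array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a bucket cache; not Daffodil's actual implementation. */
class BucketCacheSketch {
    private final List<byte[]> buckets = new ArrayList<>();
    private final int maxCachedBuckets;
    // Index of the oldest non-null bucket, or buckets.size() if none is cached.
    private int oldestBucketIndex = 0;
    // Number of non-null buckets, i.e. the data actually held in memory.
    private int cachedBuckets = 0;

    BucketCacheSketch(int maxCachedBuckets) {
        if (maxCachedBuckets < 1) {
            throw new IllegalArgumentException("maxCachedBuckets must be >= 1");
        }
        this.maxCachedBuckets = maxCachedBuckets;
    }

    /** Append a newly read bucket, evicting the oldest one if too much is cached. */
    void addBucket(byte[] data) {
        buckets.add(data);
        cachedBuckets++;
        // The problematic check would be: buckets.size() > maxCachedBuckets,
        // which counts already-discarded (null) entries as if they were cached.
        if (cachedBuckets > maxCachedBuckets) {
            discard(oldestBucketIndex);
        }
    }

    /** Release every bucket before the given index, e.g. once PoUs are resolved. */
    void releaseBefore(int index) {
        int end = Math.min(index, buckets.size());
        for (int i = oldestBucketIndex; i < end; i++) {
            discard(i);
        }
    }

    /** Return a cached bucket, or fail if it was discarded (a backtracking error). */
    byte[] getBucket(int index) {
        byte[] bucket = buckets.get(index);
        if (bucket == null) {
            throw new IllegalStateException(
                "bucket " + index + " was discarded; cannot backtrack that far");
        }
        return bucket;
    }

    int cachedBucketCount() { return cachedBuckets; }

    int totalBucketCount() { return buckets.size(); }

    private void discard(int index) {
        if (buckets.get(index) != null) {
            buckets.set(index, null); // lets the JVM garbage collect the data
            cachedBuckets--;
        }
        // Skip over null entries so oldestBucketIndex points at live data again.
        while (oldestBucketIndex < buckets.size()
                && buckets.get(oldestBucketIndex) == null) {
            oldestBucketIndex++;
        }
    }
}
```

With a limit of 2, adding two buckets, releasing both, and then adding two more leaves four array entries but only two cached buckets, so nothing is evicted and the third bucket can still be read. A check based on array length would already have started discarding live buckets at that point.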

DAFFODIL-2455

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
