stevedlawrence opened a new pull request #472:
URL: https://github.com/apache/incubator-daffodil/pull/472


   The BucketingInputSource caches small-ish buckets of data in a "buckets"
   array while parsing. As we determine that Daffodil will no longer need
   data from older buckets (e.g. when points of uncertainty (PoUs) are
   resolved), we set those array elements to null so that Java can garbage
   collect those buckets. This
   means that although the size of the buckets array may grow large for
   very large data, the majority of the elements in that array are null, so
   the actual cached data in memory is quite small.
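
The idea can be sketched with a toy model. This is not the Daffodil implementation: the class name, method names, and the 4-byte bucket size are hypothetical, chosen only to show how the array keeps growing while the cached data stays small.

```python
class BucketArray:
    """Toy model of a bucketed input cache (all names are illustrative)."""

    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size
        self.buckets = []  # one slot per bucket ever read; None once released

    def fill(self, data):
        # Split the input into fixed-size buckets (simplification: the
        # real source reads buckets lazily from a stream).
        for i in range(0, len(data), self.bucket_size):
            self.buckets.append(data[i:i + self.bucket_size])

    def release_before(self, byte_pos):
        # Null out every bucket that ends at or before byte_pos so the
        # garbage collector can reclaim it. The slot itself remains.
        last = min(byte_pos // self.bucket_size, len(self.buckets))
        for i in range(last):
            self.buckets[i] = None

    def cached_bytes(self):
        return sum(len(b) for b in self.buckets if b is not None)


src = BucketArray(bucket_size=4)
src.fill(b"0123456789abcdef")
src.release_before(12)
print(len(src.buckets), src.cached_bytes())
```

After the release, the array still has four slots but only the last bucket holds data.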
   
   Sometimes we cannot discard buckets due to unresolved PoUs, so we
   impose a maximum number of buckets that can be cached. Once we reach
   this number of buckets in the buckets array, we simply discard the
   oldest bucket by setting it to null like above. If anything ever tries
   to use data from a discarded bucket we throw a backtracking error. In
   practice, the maximum bucket limit is equivalent to about 256MB of
   cached data, so is high enough that nothing reasonably needs to
   backtrack that far.
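
As a rough illustration of the limit and of the error on a discarded bucket, the sketch below assumes 8 KiB buckets and a 256 MiB cap. The description above gives only the 256MB figure, so the bucket size, the exception name, and both helper functions are assumptions made for the example.

```python
class BacktrackingError(Exception):
    """Raised when data from an already-discarded bucket is requested."""


def max_buckets(max_cache_bytes, bucket_bytes):
    # Number of buckets that together hold max_cache_bytes of data.
    return max_cache_bytes // bucket_bytes


def read_bucket(buckets, index):
    # A None slot means the bucket was discarded and cannot be re-read.
    bucket = buckets[index]
    if bucket is None:
        raise BacktrackingError(f"bucket {index} has been discarded")
    return bucket


# Assumed sizes, for illustration only: 8 KiB buckets, 256 MiB cap.
LIMIT = max_buckets(256 * 1024 * 1024, 8 * 1024)
print(LIMIT)
```

Under these assumed sizes the cap works out to 32768 buckets.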
   
   However, we incorrectly calculate when to throw away the oldest bucket.
   We currently do so when the buckets array grows beyond some maximum
   number of elements. But this just means we've parsed some large amount
   of data, not that we have actually cached a large amount of data--this
   doesn't take into account the fact that many of the buckets in the array
   may have been discarded. This can lead to a situation where we think we
   have reached the maximum cache size, but we really haven't, and so we
   start throwing away buckets that we actually need and could reasonably
   backtrack to. And if we do backtrack that small amount, we get an error.
   
   So instead of just throwing away the oldest bucket once the buckets
   array grows to some size, we really only want to do so when the number of
   *non-null buckets* grows to some size, which is what actually implies
   the cached data has grown too large. This patch fixes the calculation to
   do that.
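
The difference between the two rules can be seen in a small simulation. This is a sketch under simplifying assumptions, not the patched code: `simulate`, its parameters, and the string stand-ins for buckets are hypothetical, and live buckets are assumed to be contiguous at the end of the array.

```python
def simulate(total_buckets, max_buckets, keep_last, limit_by):
    """Read total_buckets buckets and return the resulting buckets list.

    After each read, all but the newest keep_last buckets are released,
    modelling resolved PoUs. limit_by selects the cap rule:
    "length" compares the array length, "live" the non-null count.
    """
    buckets = []
    oldest = 0  # index of the oldest bucket that is still non-null
    for i in range(total_buckets):
        buckets.append(f"bucket-{i}")
        # Normal release: older buckets are no longer needed.
        while oldest < len(buckets) - keep_last:
            buckets[oldest] = None
            oldest += 1
        live = len(buckets) - oldest
        size = len(buckets) if limit_by == "length" else live
        if size > max_buckets:
            # Cap reached (or believed reached): drop the oldest bucket.
            buckets[oldest] = None
            oldest += 1
    return buckets


by_length = simulate(10, max_buckets=4, keep_last=2, limit_by="length")
by_live = simulate(10, max_buckets=4, keep_last=2, limit_by="live")
print(by_length[8], by_live[8])
```

With the length-based rule, the array length exceeds the cap even though only two buckets are live, so the previous bucket is discarded and a one-bucket backtrack would fail. With the live-count rule, both recent buckets survive, and the cap still applies when nothing can be released.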
   
   DAFFODIL-2455



