stevedlawrence opened a new pull request, #1097:
URL: https://github.com/apache/daffodil/pull/1097

   The VariableMap currently uses a Scala Map to look up variable instances, 
where the key is the QName of the variable and the value is the variable 
instance stack.
   
   One constraint on the data structure is that it must be copied every time 
a suspension is created. So although fast lookups are important, so are fast 
copies. Unfortunately, Maps are relatively expensive to copy: a large backing 
array must be created, buckets allocated, keys rehashed, and so on. This 
overhead is large enough to show up as a significant contributor when 
profiling, especially in formats with lots of suspensions.
   
   To fix this, this change replaces the Map with an Array, and each 
VariableRuntimeData is assigned an index into this array. Now, instead of 
looking up a variable by its QName, we just get its index from the VRD and 
access that index in the array. Lookup is still constant time, but array 
copies are faster and require fewer and smaller allocations.
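   
   As a rough illustration of the idea (not the actual Daffodil code), the 
sketch below uses hypothetical stand-in types `Vrd`, `VarInstance`, and 
`ArrayVariableMap`. It also simplifies by storing a single instance per slot 
rather than an instance stack.

```scala
// Hypothetical stand-ins; these are not Daffodil's actual classes.
final case class Vrd(qname: String, vmapIndex: Int)
final case class VarInstance(value: Option[String])

final class ArrayVariableMap(private val instances: Array[VarInstance]) {
  // Constant-time lookup: the VRD carries its own slot index.
  def find(vrd: Vrd): VarInstance = instances(vrd.vmapIndex)

  def set(vrd: Vrd, value: String): Unit =
    instances(vrd.vmapIndex) = VarInstance(Some(value))

  // Copying is one flat array copy: no buckets to allocate, no rehashing.
  def copyForSuspension(): ArrayVariableMap =
    new ArrayVariableMap(instances.clone())
}

object ArrayVariableMapSketch {
  // Assumption: indices are assigned once, up front (e.g. at compile time).
  val vrds: Vector[Vrd] =
    Vector("ex:a", "ex:b").zipWithIndex.map { case (q, i) => Vrd(q, i) }

  def fresh(): ArrayVariableMap =
    new ArrayVariableMap(Array.fill(vrds.length)(VarInstance(None)))
}
```

   A copy taken before a `set` keeps seeing the old value, which is the 
isolation a suspension needs.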
   
   Similarly, during suspensions we must also create an immutable copy of each 
variable instance stack inside the array so that suspensions do not see new 
variable instances. Variable instances are currently stored in an ArrayBuffer, 
which has noticeable overhead when copying. Since this ArrayBuffer is only used 
as a stack, this change replaces it with a Seq. This gives us similar 
stack-like behavior, but a Seq is immutable, so copies are free. This does mean 
that a new Seq must be allocated each time newVariableInstance is used, but 
that is rare enough, and still fairly fast, that preallocating an ArrayBuffer 
stack offers no significant benefit.
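   
   A minimal sketch of why an immutable Seq makes copies free, assuming the 
Seq is a List and the instances are treated as immutable values. The names 
`StackInstance` and `InstanceStackSketch` are hypothetical and not taken from 
the Daffodil code.

```scala
// Hypothetical stand-in for a variable instance; not Daffodil's actual class.
final case class StackInstance(value: Option[String])

object InstanceStackSketch {
  // An immutable List used as a stack: the head is the current instance.
  type Stack = List[StackInstance]

  // Push allocates one new cell and shares the rest of the stack.
  def push(stack: Stack, inst: StackInstance): Stack = inst :: stack

  // Pop drops the head; nothing is copied.
  def pop(stack: Stack): Stack = stack.tail

  // A "copy" for a suspension is just another reference to the same stack.
  def snapshot(stack: Stack): Stack = stack
}
```

   Pushing after a snapshot leaves the snapshot untouched, because the new 
stack only adds a cell in front of the shared tail.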
   
   Note that the Seq of VariableRuntimeData is added as a member of the 
VariableMap, since in some cases (e.g. debugging, setting external variables) 
we do need a way to look up the VRD by QName. This lookup is relatively slow, 
so any performance-critical sections should find the VRD during compilation 
and use that at runtime.
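   
   The slow path might look roughly like the following linear scan. The types 
`NamedVrd` and `VrdLookup` are hypothetical, and the QName is simplified to a 
String for illustration.

```scala
// Hypothetical stand-in; the real VariableRuntimeData has many more fields.
final case class NamedVrd(qname: String, vmapIndex: Int)

final class VrdLookup(vrds: Seq[NamedVrd]) {
  // Slow path: a linear scan by name. Acceptable for debugging or for
  // setting external variables, but not for hot parse/unparse code.
  def byQName(qname: String): Option[NamedVrd] = vrds.find(_.qname == qname)
}
```

   Hot code would resolve the VRD once up front and then index the array 
directly with its `vmapIndex`, rather than calling `byQName` repeatedly.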
   
   This also removes the VariableMapFactory, since it doesn't add much value 
beyond a level of indirection.
   
   In a large schema with small files and lots of suspensions, this change 
improved unparse performance by around 50%.
   
   DAFFODIL-2852

