Steve Lawrence created DAFFODIL-3065:
----------------------------------------

             Summary: Unparse support for infoset prefetching--reduce suspensions
                 Key: DAFFODIL-3065
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-3065
             Project: Daffodil
          Issue Type: Improvement
          Components: Performance, Unparsing
            Reporter: Steve Lawrence
             Fix For: 4.2.0


The current unparse implementation is written with streaming in mind, but it 
streams very conservatively, which can slow unparsing down. 

At a high level, unparsing currently works like this: an individual unparser 
asks for the next unparse event. If that event does not exist yet, we get the 
next event from the infoset inputter, update the internal infoset accordingly, 
and then return the associated event. This effectively reads only one event at 
a time.
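
The one-event-at-a-time behavior described above can be sketched roughly as 
follows. This is a simplified illustration in Python, not Daffodil's actual 
Scala code: the OneAtATimeInputter class, its inspect/advance methods, and the 
tuple event format are all hypothetical names chosen for the example.

```python
from typing import Iterator, List, Optional, Tuple

Event = Tuple[str, str]  # (kind, element name), hypothetical event format


class OneAtATimeInputter:
    """Hypothetical inputter that buffers at most one pending event."""

    def __init__(self, source: Iterator[Event]) -> None:
        self._source = source
        self._pending: Optional[Event] = None
        self.infoset: List[str] = []  # names of elements created so far
        self.reads = 0                # events pulled from the source so far

    def inspect(self) -> Optional[Event]:
        """Return the next event without consuming it, reading it if needed."""
        if self._pending is None:
            self._pending = next(self._source, None)
            if self._pending is not None:
                self.reads += 1
                kind, name = self._pending
                if kind == "startElement":
                    # The infoset only grows when an unparser asks for an event.
                    self.infoset.append(name)
        return self._pending

    def advance(self) -> Optional[Event]:
        """Return the next event and consume it."""
        event = self.inspect()
        self._pending = None
        return event


events = [("startElement", "len"), ("endElement", "len"),
          ("startElement", "data"), ("endElement", "data")]
inputter = OneAtATimeInputter(iter(events))
first = inputter.advance()
```

After the first advance only one event has been read, so the "data" element 
does not exist in the infoset yet even though it is available in the input.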

This approach minimizes memory usage since we only store a single event and the 
infoset is no larger than what the unparser is currently working on. The 
downside to this approach is it is not uncommon for an unparser to reference a 
future part of the infoset that does not exist yet, such as accessing the value 
of an element or asking how many children are in an array. When we need to 
query part of the infoset that does not exist yet, we clone the UState and 
create a Suspension that is evaluated later once the required infoset 
element(s) exist. This clone and Suspension can create additional overhead.
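
As a rough illustration of where that overhead comes from, the sketch below 
models a forward reference that has to be suspended and retried. It is a 
Python simplification under stated assumptions: the Suspension class here, the 
dict-based state and infoset, and the copy_of_trailer task are hypothetical 
and only stand in for the UState clone and Suspension mentioned above.

```python
from typing import Callable, Dict, List, Optional

Infoset = Dict[str, str]  # element name -> value, for elements read so far


class Suspension:
    """Hypothetical deferred unparse step, retried until it can complete."""

    def __init__(self, state: Dict[str, int],
                 task: Callable[[Infoset], Optional[str]]) -> None:
        self.state = dict(state)  # snapshot, standing in for the UState clone
        self._task = task
        self.output: Optional[str] = None

    def try_run(self, infoset: Infoset) -> bool:
        """Run the task; return True once it produced output."""
        self.output = self._task(infoset)
        return self.output is not None


def copy_of_trailer(infoset: Infoset) -> Optional[str]:
    # References the "trailer" element, which may not have been read yet.
    return infoset.get("trailer")


infoset: Infoset = {"header": "H"}
state = {"position": 1}
pending: List[Suspension] = []

step = Suspension(state, copy_of_trailer)
if not step.try_run(infoset):
    pending.append(step)      # forward reference: suspend and move on

state["position"] = 2         # unparsing continues past the suspended step
infoset["trailer"] = "T"      # the inputter eventually supplies the element

# Retry the suspended steps and keep only those that are still blocked.
pending = [s for s in pending if not s.try_run(infoset)]
```

Every suspended step pays for a state snapshot plus at least one retry. If 
"trailer" had already been in the infoset, the first try_run would have 
succeeded and no snapshot would have been needed.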

Instead of creating the infoset one element at a time when an unparser needs 
it, we should update the inputter logic so it can read a large number of 
infoset events and build a larger section of the infoset, within some tunable 
limit. This essentially allows something like infoset prefetching. The tunable 
limit could default to a reasonably large value such that, for small infosets, 
the infoset could be built entirely prior to any unparsing--this could 
eliminate a large number of suspensions and speed up unparsing.
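
A minimal sketch of the prefetching idea, assuming the limit is counted in 
infoset elements (the issue does not say what unit the tunable would use). 
The PrefetchingInputter class, the prefetch_limit parameter, and the event 
tuples are hypothetical names for illustration in Python, not Daffodil APIs.

```python
from typing import Dict, Iterator, Optional, Tuple

Event = Tuple[str, str, Optional[str]]  # (kind, name, value), hypothetical


class PrefetchingInputter:
    """Hypothetical inputter that builds the infoset ahead of the unparser."""

    def __init__(self, source: Iterator[Event], prefetch_limit: int) -> None:
        self._source = source
        self.prefetch_limit = prefetch_limit  # assumed unit: elements
        self.infoset: Dict[str, Optional[str]] = {}
        self.exhausted = False

    def prefetch(self) -> None:
        """Read events until the limit is reached or the input runs out."""
        while len(self.infoset) < self.prefetch_limit:
            event = next(self._source, None)
            if event is None:
                self.exhausted = True
                return
            kind, name, value = event
            if kind == "startElement":
                self.infoset[name] = value


def build(limit: int) -> PrefetchingInputter:
    events = iter([
        ("startElement", "len", "5"), ("endElement", "len", None),
        ("startElement", "data", "hello"), ("endElement", "data", None),
    ])
    inputter = PrefetchingInputter(events, limit)
    inputter.prefetch()
    return inputter


small = build(1)     # stops after the first element
large = build(1000)  # whole infoset fits under the limit
```

With the small limit, an unparser working on "len" that references "data" 
would still need a suspension. With the large limit, "data" already exists 
before unparsing starts, so the reference can be resolved directly.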

Note that part of the current logic allocates InfosetAccessor events and adds 
them to a queue for each event. This queue holds only 2 items, so it takes very 
little memory. But if we did not change anything else, we would likely need to 
increase this queue size so that it could store all the events for all parts of 
the currently built infoset. However, this could require a significant amount 
of memory.

So an additional change to avoid this memory usage could be similar to how the 
parse InfosetWalker works. Instead of maintaining a large queue of events, we 
could maintain a small piece of state containing a pointer to the current 
infoset element and what the current event is (i.e. 
startElement/endElement/startArray/endArray). The various advance/inspect 
functions would then read this current state or update the state to the next 
element/event. So instead of allocating and maintaining a queue of accessor 
events, we just query what the unparse infoset walker is currently looking at. 
Note that we can still maintain the Cursor API; it would just iterate over and 
query the actual infoset instead of storing a queue of events--essentially the 
infoset becomes its own queue.
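
The walker idea could look something like the sketch below: the only state is 
a pointer to the current node plus the current event kind, and advance/inspect 
derive events from the infoset tree itself. This is a Python illustration with 
hypothetical Node and InfosetCursor classes, not the real InfosetWalker or 
Cursor API, and it omits startArray/endArray to stay short.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so index() finds this exact node
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


class InfosetCursor:
    """Hypothetical cursor: no event queue, only (current node, event kind)."""

    START, END = "startElement", "endElement"

    def __init__(self, root: Node) -> None:
        self._node: Optional[Node] = root
        self._event = self.START

    def inspect(self) -> Optional[Tuple[str, str]]:
        """Report the current event without moving."""
        if self._node is None:
            return None
        return (self._event, self._node.name)

    def advance(self) -> Optional[Tuple[str, str]]:
        """Report the current event, then move to the next one."""
        current = self.inspect()
        if current is not None:
            self._step()
        return current

    def _step(self) -> None:
        node = self._node
        if self._event == self.START:
            if node.children:
                self._node = node.children[0]  # descend, still a start event
            else:
                self._event = self.END         # leaf: its end event is next
            return
        parent = node.parent
        if parent is None:
            self._node = None                  # root ended: walk is complete
            return
        index = parent.children.index(node)
        if index + 1 < len(parent.children):
            self._node = parent.children[index + 1]
            self._event = self.START           # next sibling starts
        else:
            self._node = parent                # last child: parent ends next


root = Node("r")
root.add(Node("a"))
root.add(Node("b"))
cursor = InfosetCursor(root)
peeked = cursor.inspect()
walked = list(iter(cursor.advance, None))  # call advance() until it is None
```

The cursor allocates no per-event objects beyond the returned tuples, so its 
memory cost stays constant no matter how much of the infoset was prefetched.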

With these changes, we can potentially avoid a large number of suspensions 
without a significant change in memory usage, aside from the additional 
prefetched infoset.

Note that this approach does not remove suspensions entirely. For example, some 
unparsers need to know the content length of a field. This requires the 
relevant element to actually be unparsed, not just the element to exist in the 
infoset, so this would still need a suspension. This also will not eliminate 
suspensions if an unparser accesses a field beyond the prefetch limit. So all 
the suspension logic must still exist as is; the goal is simply to minimize how 
often it is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
