[ 
https://issues.apache.org/jira/browse/DAFFODIL-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Lawrence reassigned DAFFODIL-3065:
----------------------------------------

    Assignee: Steve Lawrence

> Unparse support for infoset prefetching--reduce suspensions
> -----------------------------------------------------------
>
>                 Key: DAFFODIL-3065
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-3065
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Performance, Unparsing
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 4.2.0
>
>
> The current unparse is written with streaming in mind, but it does so quite 
> conservatively, which can slow unparsing down. 
> At a high level, the way unparsing currently works is that an individual 
> unparser asks for the next unparse event. If that event does not exist, we 
> get the next event from the infoset inputter, update the internal infoset 
> accordingly, and then return the associated event. This effectively reads 
> only one event at a time.
> This approach minimizes memory usage since we only store a single event and 
> the infoset is no larger than what the unparser is currently working on. The 
> downside to this approach is it is not uncommon for an unparser to reference 
> a future part of the infoset that does not exist yet, such as accessing the 
> value of an element or asking how many children are in an array. When we need 
> to query part of the infoset that does not exist yet, we clone the UState and 
> create a Suspension that is evaluated later once the required infoset 
> element(s) exist. This clone and Suspension can create additional overhead.
> Instead of creating the infoset one element at a time when an unparser needs 
> it, we should update the inputter logic so it can read a large number of 
> infoset events and build a larger section of the infoset, within some 
> tunable limit, essentially allowing something like infoset prefetching. This 
> tunable limit could default to a reasonably large value such that the infoset 
> could be entirely built prior to any unparsing for small infosets--this could 
> eliminate a large number of suspensions and speed up unparsing.
> Note that part of the current logic allocates InfosetAccessor events and adds 
> them to a queue for each event. This queue holds only 2 items, so it takes 
> very little memory. But if we did not change anything else, we would likely 
> need to increase this queue size so that it could store all the events for 
> all parts of the currently built infoset. However, this could require a 
> significant amount of memory.
> So an additional change to avoid this memory usage could be similar to how 
> the parse InfosetWalker works. Instead of maintaining a large queue of 
> events, we could maintain a single piece of state containing a pointer to 
> the current infoset element and what the current event is (i.e. 
> startElement/endElement/startArray/endArray). The various advance/inspect 
> functions would then read this current state or update the state to the next 
> element/event. So instead of allocating and maintaining a queue of accessor 
> events, we just query what the unparse infoset walker is currently looking 
> at. Note that we can still maintain the Cursor API; it would just iterate 
> over and query the actual infoset instead of storing a queue of 
> events--essentially the infoset becomes its queue.
> With these changes, we can potentially avoid a large number of suspensions 
> without a significant change in memory usage, aside from the additional 
> prefetched infoset.
> Note that this approach does not remove suspensions entirely. For example, 
> some unparsers need to know the content length of a field. This requires the 
> relevant element to actually be unparsed, not just the element to exist in 
> the infoset, so this would still need a suspension. This also will not 
> eliminate suspensions if an unparser accesses a field beyond the prefetch 
> limit. So all the suspension logic must still exist as is; the goal is 
> simply to minimize how often suspensions are needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
