[ https://issues.apache.org/jira/browse/DAFFODIL-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Lawrence reassigned DAFFODIL-3065:
----------------------------------------
Assignee: Steve Lawrence
> Unparse support for infoset prefetching--reduce suspensions
> -----------------------------------------------------------
>
> Key: DAFFODIL-3065
> URL: https://issues.apache.org/jira/browse/DAFFODIL-3065
> Project: Daffodil
> Issue Type: Improvement
> Components: Performance, Unparsing
> Reporter: Steve Lawrence
> Assignee: Steve Lawrence
> Priority: Major
> Fix For: 4.2.0
>
>
> The current unparse logic is written with streaming in mind, but it streams
> very conservatively, which can hurt performance.
>
> At a high level, the way unparsing currently works is that an individual
> unparser asks for the next unparse event. If that event does not exist yet,
> we get the next event from the infoset inputter, update the internal infoset
> accordingly, and then return the associated event. This effectively reads
> only one event at a time.
>
> This approach minimizes memory usage since we only store a single event and
> the infoset is no larger than what the unparser is currently working on. The
> downside is that unparsers commonly reference a future part of the infoset
> that does not exist yet, such as accessing the value of an element or asking
> how many children are in an array. When we need to query part of the infoset
> that does not exist yet, we clone the UState and create a Suspension that is
> evaluated later, once the required infoset element(s) exist. Both the clone
> and the Suspension add overhead.
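
As a rough illustration of the suspension mechanism described above, here is a minimal Python sketch. It is not Daffodil's actual code: `UnparseState`, `Suspension`, `write_count`, and the dictionary used as an infoset are hypothetical stand-ins, and "the array exists" is simplified to mean "the array is complete".

```python
import copy
from typing import Callable, Dict, List


class UnparseState:
    """Hypothetical stand-in for the unparser state (UState)."""

    def __init__(self) -> None:
        self.output: List[str] = []


class Suspension:
    """Deferred work: a cloned state plus a task to retry later."""

    def __init__(self, state: UnparseState,
                 task: Callable[[UnparseState, Dict[str, List[str]]], bool]) -> None:
        self.state = copy.deepcopy(state)  # the clone is part of the overhead
        self.task = task

    def try_run(self, infoset: Dict[str, List[str]]) -> bool:
        """Return True once the task could complete against the infoset."""
        return self.task(self.state, infoset)


def write_count(state: UnparseState, infoset: Dict[str, List[str]]) -> bool:
    """Write the number of items, which needs the whole 'items' array."""
    items = infoset.get("items")
    if items is None:
        return False  # the array has not been built yet
    state.output.append(str(len(items)))
    return True


infoset: Dict[str, List[str]] = {}  # nothing read from the inputter yet
state = UnparseState()
suspensions: List[Suspension] = []

# The count field is unparsed before the array it describes exists.
if not write_count(state, infoset):
    suspensions.append(Suspension(state, write_count))

# Later events build the array, and the suspension can be retried.
infoset["items"] = ["a", "b", "c"]
still_pending = [s for s in suspensions if not s.try_run(infoset)]
```

If the array had already been in the infoset when `write_count` first ran, neither the deep copy nor the retry would have been needed.
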
>
> Instead of creating the infoset one element at a time as an unparser needs
> it, we should update the inputter logic so it can read a large number of
> infoset events and build a larger section of the infoset, within some
> tunable limit, essentially allowing something like infoset prefetching. This
> tunable limit could default to a reasonably large value such that, for small
> infosets, the infoset could be entirely built prior to any unparsing. This
> could eliminate a large number of suspensions and speed up unparsing.
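
The prefetching idea can be sketched roughly as follows, again in Python and under loud assumptions: `PrefetchingBuilder`, `prefetch_limit`, the `(kind, name, value)` event tuples, and `child_count_or_none` are all hypothetical names chosen for illustration, and arrays are not modeled separately from elements.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# (kind, name, value) where kind is "start" or "end"; a stand-in for real events.
Event = Tuple[str, str, Optional[str]]


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    complete: bool = False  # True once the matching end event has been read


class PrefetchingBuilder:
    """Builds the infoset from an event stream, up to a tunable number of events."""

    def __init__(self, events: Iterator[Event], prefetch_limit: int) -> None:
        self._events = events
        self._limit = prefetch_limit
        self.root = Node("#document")
        self._open = self.root  # innermost element still being built
        self.exhausted = False

    def prefetch(self) -> int:
        """Read up to prefetch_limit more events; return how many were read."""
        read = 0
        while read < self._limit:
            ev = next(self._events, None)
            if ev is None:
                self.exhausted = True
                break
            kind, name, value = ev
            if kind == "start":
                node = Node(name, value, parent=self._open)
                self._open.children.append(node)
                self._open = node
            else:  # "end" closes the innermost open element
                self._open.complete = True
                self._open = self._open.parent
            read += 1
        return read


def child_count_or_none(node: Node) -> Optional[int]:
    """Child count if known; None means the caller would have to suspend."""
    return len(node.children) if node.complete else None


EVENTS: List[Event] = [
    ("start", "record", None),
    ("start", "count", None), ("end", "count", None),
    ("start", "item", "a"), ("end", "item", None),
    ("start", "item", "b"), ("end", "item", None),
    ("end", "record", None),
]

small = PrefetchingBuilder(iter(EVENTS), prefetch_limit=3)
small_read = small.prefetch()
large = PrefetchingBuilder(iter(EVENTS), prefetch_limit=64)
large_read = large.prefetch()
```

With the small limit, `record` is still open after the prefetch, so `child_count_or_none` returns `None` and a suspension would still be required. With the large limit, the whole infoset is built up front and the same query can be answered immediately.
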
>
> Note that part of the current logic allocates InfosetAccessor events and adds
> them to a queue for each event. This queue holds only 2 items, so it takes
> very little memory. But if we did not change anything else, we would likely
> need to increase this queue size so that it could store all the events for
> all parts of the currently built infoset. This could require a significant
> amount of memory.
>
> So an additional change to avoid this memory usage could be similar to how
> the parse InfosetWalker works. Instead of maintaining a large queue of
> events, we could maintain a single piece of state containing a pointer to the
> current infoset element and what the current event is (i.e.
> startElement/endElement/startArray/endArray). The various advance/inspect
> functions would then read this current state or update the state to the next
> element/event. So instead of allocating and maintaining a queue of accessor
> events, we just query what the unparse infoset walker is currently looking at.
>
> Note that we can still maintain the Cursor API. It would just iterate over
> and query the actual infoset instead of storing a queue of
> events--essentially the infoset itself becomes the queue.
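
A minimal sketch of such a walker, assuming a simplified tree: `Node`, `InfosetCursor`, and the method names `inspect`/`advance` are illustrative and not the real Cursor API, only element events are modeled (startArray/endArray are left out), and the subtree being walked is assumed to be fully built already.

```python
from typing import List, Optional, Tuple

START, END = "startElement", "endElement"


class Node:
    """Simplified infoset element that knows its parent and sibling position."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.index = 0  # position within parent.children
        if parent is not None:
            self.index = len(parent.children)
            parent.children.append(self)


class InfosetCursor:
    """Cursor backed by the infoset: one node pointer plus one event kind."""

    def __init__(self, root: Node) -> None:
        self._node: Optional[Node] = root
        self._event = START

    def inspect(self) -> Optional[Tuple[str, str]]:
        """Peek at the current event without moving; None when finished."""
        if self._node is None:
            return None
        return (self._event, self._node.name)

    def advance(self) -> Optional[Tuple[str, str]]:
        """Return the current event and step the state to the next one."""
        current = self.inspect()
        if current is None:
            return None
        node = self._node
        if self._event == START and node.children:
            self._node = node.children[0]  # descend into the first child
        elif self._event == START:
            self._event = END  # leaf: its start is followed by its own end
        elif node.parent is not None and node.index + 1 < len(node.parent.children):
            self._node = node.parent.children[node.index + 1]  # next sibling
            self._event = START
        else:
            self._node = node.parent  # last child: end the parent (None at root)
        return current


record = Node("record")
Node("count", record)
Node("item", record)

cursor = InfosetCursor(record)
walked = []
while cursor.inspect() is not None:
    walked.append(cursor.advance())
```

The cursor holds two fields regardless of how much infoset has been prefetched, which is the memory property the description is after.
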
>
> With these changes, we can potentially avoid a large number of suspensions
> without a significant change in memory usage, aside from the additional
> prefetched infoset.
>
> Note that this approach does not remove suspensions entirely. For example,
> some unparsers need to know the content length of a field. This requires the
> relevant element to actually be unparsed, not just to exist in the infoset,
> so this would still need a suspension. This also will not eliminate
> suspensions if an unparser accesses a field beyond the prefetch limit. So all
> the suspension logic must still exist as is; the goal is simply to minimize
> how often it is needed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)