[ 
https://issues.apache.org/jira/browse/DAFFODIL-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050634#comment-18050634
 ] 

Steve Lawrence commented on DAFFODIL-3065:
------------------------------------------

I dug into the InfosetInputter a bit, and I found two additional complications.

# Unparsers currently push various TermRuntimeData to a stack maintained by the 
InfosetInputter, and that stack is used for resolving next element ERDs. But 
with the above suggestion, we must be able to completely build the infoset 
(including resolving next element ERDs) without any information from unparsers. 
This means how we create next element resolvers needs to change; they cannot 
rely on information from unparsers.
# The InfosetInputter does not currently receive any events related to hidden 
group ref elements, because those elements do not appear in the infoset. 
Instead, the unparsers for those hidden elements are responsible for handling 
them. But if we change InfosetInputters to build the infosets, it means they 
will not rebuild hidden elements. I don't think it would be easy to allow 
unparsers to augment the infoset after parts of it have already been built.
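To make the first complication concrete, here is a minimal Python sketch (Daffodil itself is Scala, and none of these names are its real classes) of next-element resolution that depends only on the schema. `ElementDecl`, `SchemaResolvingInputter`, and the sequence-only content model with `min_occurs`/`max_occurs` are hypothetical stand-ins for ERDs and next element resolvers. The inputter keeps its own stack of (parent declaration, current child particle, occurrences seen), so nothing has to be pushed by unparsers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementDecl:
    """Hypothetical stand-in for an ERD: a name, occurrence bounds, ordered children."""
    name: str
    min_occurs: int = 1
    max_occurs: float = 1  # float("inf") means unbounded
    children: List["ElementDecl"] = field(default_factory=list)


class SchemaResolvingInputter:
    """Resolves each start-element event to a declaration using only the schema.

    Each stack frame is [parent decl, index of current child particle,
    occurrences of that particle seen so far]. Unparsers never touch it.
    """

    def __init__(self, root: ElementDecl):
        self.root = root
        self.stack: List[list] = []

    def start_element(self, name: str) -> ElementDecl:
        if not self.stack:
            if name != self.root.name:
                raise ValueError(f"unexpected root element {name}")
            self.stack.append([self.root, 0, 0])
            return self.root
        frame = self.stack[-1]
        parent, idx, count = frame
        while idx < len(parent.children):
            decl = parent.children[idx]
            if decl.name == name and count < decl.max_occurs:
                frame[1], frame[2] = idx, count + 1
                self.stack.append([decl, 0, 0])
                return decl
            # This particle cannot take the event; skipping it is only legal
            # if it has already occurred min_occurs times.
            if count < decl.min_occurs:
                raise ValueError(f"expected {decl.name}, got {name}")
            idx, count = idx + 1, 0
        raise ValueError(f"{name} is not allowed here in {parent.name}")

    def end_element(self) -> None:
        decl, idx, count = self.stack.pop()
        # Every particle not yet satisfied must be optional for decl to be complete.
        for i in range(idx, len(decl.children)):
            seen = count if i == idx else 0
            if seen < decl.children[i].min_occurs:
                raise ValueError(f"{decl.name} is missing {decl.children[i].name}")
```

For a `rec` declared as `a`, zero or more `b`, then `c`, the events a, b, b, c resolve cleanly, while starting `c` directly raises because the required `a` was skipped. Choices, hidden groups, and array events are left out; a real resolver would need all of them.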

So I think these two issues indicate that, with the above suggested change 
where the InfosetInputter becomes responsible for building the infoset, it must 
also be responsible for rebuilding the augmented infoset, including hidden group 
ref elements. One benefit of this is that it would likely make it easier to 
implement default values when unparsing, since the logic for augmenting the 
infoset with OVCs from hidden group refs is probably very similar to the logic 
for augmenting it with default values.
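A minimal Python sketch of why hidden group ref elements and default values could share one mechanism, assuming a single-occurrence sequence and hypothetical `ChildDecl`/`Node`/`augment` names (not Daffodil APIs). When the inputter finishes a complex element, it compares the children built from infoset events against the schema and inserts whatever the schema requires but the input lacks. The sketch assumes every hidden element is computed by an outputValueCalc, so it only marks those nodes; evaluating the OVC expressions is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChildDecl:
    """Hypothetical schema info for one child of a sequence (single occurrence only)."""
    name: str
    hidden: bool = False           # from a hidden group ref, so never in the input
    default: Optional[str] = None  # schema default value, if any


@dataclass
class Node:
    """Minimal infoset node."""
    name: str
    value: Optional[str] = None
    needs_ovc: bool = False  # value must be computed later by an outputValueCalc


def augment(decls: List[ChildDecl], provided: List[Node]) -> List[Node]:
    """Merge the children built from infoset events with the ones the schema adds.

    Hidden elements and defaulted elements go through the same
    'the schema says it belongs here but the input lacks it' step.
    """
    by_name: Dict[str, Node] = {n.name: n for n in provided}
    unknown = set(by_name) - {d.name for d in decls}
    if unknown:
        raise ValueError(f"unexpected elements: {sorted(unknown)}")
    out: List[Node] = []
    for decl in decls:
        if decl.hidden:
            # Hidden group ref element: the inputter creates it itself
            # (this sketch assumes it always has an outputValueCalc).
            out.append(Node(decl.name, needs_ovc=True))
        elif decl.name in by_name:
            out.append(by_name[decl.name])
        elif decl.default is not None:
            # Absent element with a default: same insertion, value from the schema.
            out.append(Node(decl.name, value=decl.default))
        else:
            raise ValueError(f"missing required element {decl.name}")
    return out
```

The hidden branch and the default branch differ only in where the value comes from, which is the similarity described above.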

This all indicates that a pretty significant change is required to how we 
implement next element resolvers and how we build the augmented infoset. We 
likely need something very similar to how XSD validators work, since they too 
implement schema-aware rebuilding of the infoset.

> Unparse support for infoset prefetching--reduce suspensions
> -----------------------------------------------------------
>
>                 Key: DAFFODIL-3065
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-3065
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Performance, Unparsing
>            Reporter: Steve Lawrence
>            Priority: Major
>             Fix For: 4.2.0
>
>
> The current unparser is written with streaming in mind, but it does so pretty 
> conservatively, which can lead to performance slowdowns. 
> At a high level, the way unparsing currently works is that an individual 
> unparser asks for the next unparse event. If that event does not exist, we 
> get the next event from the infoset inputter, update the internal infoset 
> accordingly, and then return the associated event. This effectively reads 
> only one event at a time.
>
> This approach minimizes memory usage since we only store a single event and 
> the infoset is no larger than what the unparser is currently working on. The 
> downside to this approach is that it is not uncommon for an unparser to 
> reference a future part of the infoset that does not exist yet, such as 
> accessing the value of an element or asking how many children are in an 
> array. When we need to query part of the infoset that does not exist yet, we 
> clone the UState and create a Suspension that is evaluated later once the 
> required infoset element(s) exist. This clone and Suspension can create 
> additional overhead.
>
> Instead of creating the infoset one element at a time when an unparser needs 
> it, we should update the inputter logic so it can read a large number of 
> infoset events and build a larger section of the infoset, within some 
> tunable limit, essentially allowing something like infoset prefetching. This 
> tunable limit could default to a reasonably large value such that the infoset 
> could be entirely built prior to any unparsing for small infosets; this could 
> eliminate a large number of suspensions and speed up unparsing.
>
> Note that part of the current logic allocates an InfosetAccessor for each 
> event and adds it to a queue. This queue holds only 2 items, so it takes very 
> little memory. But if we did not change anything else, we would likely need 
> to increase this queue size so that it could store all the events for all 
> parts of the currently built infoset, which could require a significant 
> amount of memory.
>
> So an additional change to avoid this memory usage could be similar to how 
> the parse InfosetWalker works. Instead of maintaining a large queue of 
> events, we could maintain a single bit of state containing a pointer to the 
> current infoset element and what the current event is (i.e. 
> startElement/endElement/startArray/endArray). The various advance/inspect 
> functions would then read this current state or update the state to the next 
> element/event. So instead of allocating and maintaining a queue of accessor 
> events, we just query what the unparse infoset walker is currently looking 
> at. Note that we can still maintain the Cursor API; it would just iterate 
> over and query the actual infoset instead of storing a queue of events. 
> Essentially, the infoset becomes its queue.
>
> With these changes, we can potentially avoid a large number of suspensions 
> without a significant change in memory usage, aside from the additional 
> prefetched infoset.
>
> Note that this approach does not remove suspensions entirely. For example, 
> some unparsers need to know the content length of a field. This requires the 
> relevant element to actually be unparsed, not just the element to exist in 
> the infoset, so this would still need a suspension. This also will not 
> eliminate suspensions if an unparser accesses a field beyond the prefetch 
> limit. So all the suspension logic must still exist as is, the goal is to 
> simply minimize how often they are needed.
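The prefetching and walker ideas in the description can be combined in a small Python model. This is a sketch under simplifying assumptions, not Daffodil code: `PrefetchingInputter` (builds the infoset from at most `limit` events per fetch), `Walker` (a cursor whose only state is the current event kind and node, rather than a queue of InfosetAccessor events), and the toy `unparse` function are all hypothetical, and arrays are not modeled. The toy unparser wants each element's child count when it starts that element; if the element is not complete yet, a real unparser would clone its UState and suspend, so here that case is just counted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # ("start" | "end", element name)


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    index: int = 0  # position within the parent's children
    children: List["Node"] = field(default_factory=list)
    complete: bool = False  # True once the matching end event has been read


class PrefetchingInputter:
    """Builds the infoset from events, reading at most `limit` events per fetch."""

    def __init__(self, events: List[Event], limit: int):
        self.events, self.pos, self.limit = events, 0, limit
        self.root: Optional[Node] = None
        self.open: Optional[Node] = None  # innermost element still being built

    def fetch(self) -> bool:
        """Add up to `limit` more events to the infoset; False if none are left."""
        if self.pos >= len(self.events):
            return False
        for kind, name in self.events[self.pos:self.pos + self.limit]:
            if kind == "start":
                idx = len(self.open.children) if self.open is not None else 0
                node = Node(name, parent=self.open, index=idx)
                if self.open is None:
                    self.root = node
                else:
                    self.open.children.append(node)
                self.open = node
            else:
                self.open.complete = True
                self.open = self.open.parent
        self.pos += self.limit
        return True


class Walker:
    """Cursor over the infoset itself: the state is (event kind, node), no queue."""

    def __init__(self, inputter: PrefetchingInputter):
        self.inp = inputter
        self.kind: Optional[str] = None
        self.node: Optional[Node] = None

    def _step(self) -> Optional[bool]:
        """True: moved to the next event. False: walk finished.
        None: the next event is not in the infoset yet, so fetch more."""
        if self.node is None:
            if self.inp.root is None:
                return None
            self.kind, self.node = "start", self.inp.root
            return True
        n = self.node
        if self.kind == "start":
            if n.children:
                self.node = n.children[0]
                return True
            if n.complete:
                self.kind = "end"
                return True
            return None
        parent = n.parent  # current event is the end of n
        if parent is None:
            return False
        if n.index + 1 < len(parent.children):
            self.kind, self.node = "start", parent.children[n.index + 1]
            return True
        if parent.complete:
            self.kind, self.node = "end", parent
            return True
        return None

    def advance(self) -> bool:
        """Advance one event, fetching more of the infoset only when needed."""
        while True:
            moved = self._step()
            if moved is not None:
                return moved
            if not self.inp.fetch():
                raise ValueError("event stream ended early")


def unparse(events: List[Event], limit: int) -> Tuple[List[str], int]:
    walker = Walker(PrefetchingInputter(events, limit))
    out, suspensions = [], 0
    while walker.advance():
        if walker.kind == "start":
            if not walker.node.complete:
                suspensions += 1  # child count unknown yet: would suspend
            out.append(walker.node.name)
    return out, suspensions
```

For events describing `root` with three empty children `a`, `b`, and `c`, a limit of 1 roughly mimics today's one-event-at-a-time behavior and every start element needs a suspension (4 in total), while a limit of 100 builds the whole infoset on the first fetch and needs none. Suspensions that depend on unparsed output, such as content length, are not modeled.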



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
