stevedlawrence commented on code in PR #1676:
URL: https://github.com/apache/daffodil/pull/1676#discussion_r3325189099
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/SequenceParserBases.scala:
##########
@@ -212,11 +212,6 @@ abstract class SequenceParserBase(
// should not increment the group index.
pstate.mpstate.moveOverOneGroupIndexOnly()
}
-
- // we might have added a new instance to the array. Attempt to project it to an
- // infoset if there are no PoU's or anything blocking it
- pstate.walker.walk()
-
Review Comment:
I think we still want these `walker.walk()` calls; they should just be gated
on the new tunable being set to streaming. That way, if streaming mode is
enabled, we still try to walk the infoset in the middle of parsing to output
whatever parts of the infoset are marked as final.
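A minimal Scala sketch of the gating suggested above, assuming the tunable is exposed as a simple two-valued mode. `InfosetWalkerMode`, `CountingWalker`, `GatedWalkSketch`, and `parseArray` are hypothetical stand-ins, not Daffodil's actual classes; the point is only that the mid-parse `walk()` calls happen in streaming mode, while the final walk happens in both modes.

```scala
// Hypothetical stand-in for the new tunable's two values.
sealed trait InfosetWalkerMode
object InfosetWalkerMode {
  case object Streaming extends InfosetWalkerMode
  case object NonStreaming extends InfosetWalkerMode
}

// Hypothetical walker that only counts how often it is asked to walk.
final class CountingWalker {
  var walkCalls: Int = 0
  def walk(): Unit = walkCalls += 1
}

object GatedWalkSketch {
  // Simulates parsing `n` array instances (the parsing itself is elided).
  def parseArray(n: Int, mode: InfosetWalkerMode, walker: CountingWalker): Unit = {
    for (_ <- 0 until n) {
      // ... parse one array instance here ...
      if (mode == InfosetWalkerMode.Streaming) {
        // We might have added a new instance to the array. Attempt to project
        // it to the infoset outputter if nothing (e.g. a PoU) is blocking it.
        walker.walk()
      }
    }
    // The final walk always happens so the outputter sees the whole infoset.
    walker.walk()
  }

  def main(args: Array[String]): Unit = {
    val streaming = new CountingWalker
    parseArray(5, InfosetWalkerMode.Streaming, streaming)
    val nonStreaming = new CountingWalker
    parseArray(5, InfosetWalkerMode.NonStreaming, nonStreaming)
    println(s"streaming=${streaming.walkCalls} nonStreaming=${nonStreaming.walkCalls}")
  }
}
```

With 5 array instances, the streaming run calls `walk()` 6 times (once per instance plus the final walk), while the non-streaming run calls it once.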
##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########
Review Comment:
We might prefix this with `If infosetWalkerMode is "streaming", Daffodil
periodically...` to make it clear this tunable only applies to streaming mode.
Same with infosetWalkerSkipMax.
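To illustrate what the suggested prefix conveys, here is a hypothetical sketch in which the skip tunables are consulted only when the mode is streaming. It assumes, without confirmation from this thread, that the walker skips a number of walk requests that starts at a minimum (assumed here to correspond to a `infosetWalkerSkipMin` tunable), doubles after each walk that makes no progress up to `infosetWalkerSkipMax`, and resets to the minimum after progress. `WalkerTunables` and `SkipPolicy` are invented names.

```scala
// Hypothetical tunable bundle; field names are illustrative only.
final case class WalkerTunables(
    streaming: Boolean, // true if infosetWalkerMode is "streaming"
    skipMin: Int,       // assumed stand-in for infosetWalkerSkipMin
    skipMax: Int        // stand-in for infosetWalkerSkipMax
)

// Hypothetical skip policy: decides which walk requests actually walk.
// The caller reports the outcome of each real walk via recordResult.
final class SkipPolicy(t: WalkerTunables) {
  private var skipSize = t.skipMin
  private var remaining = t.skipMin

  /** True if this walk request should actually walk the infoset. */
  def shouldWalk(): Boolean =
    if (!t.streaming) false // nonStreaming: no mid-parse walks, tunables ignored
    else if (remaining > 0) { remaining -= 1; false }
    else true

  /** Reset to the minimum after progress; otherwise double, capped at skipMax. */
  def recordResult(madeProgress: Boolean): Unit = {
    skipSize =
      if (madeProgress) t.skipMin
      else math.min(math.max(skipSize * 2, 1), t.skipMax)
    remaining = skipSize
  }
}
```

In nonStreaming mode `shouldWalk()` never returns true, so the skip values have no effect; only the single end-of-parse walk remains.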
##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########
@@ -239,6 +239,17 @@
</xs:restriction>
</xs:simpleType>
</xs:element>
+ <xs:element name="infosetWalkerMode" type="daf:TunableInfosetWalkerMode" default="nonStreaming" minOccurs="0">
+ <xs:annotation>
+ <xs:documentation>
+ Daffodil can periodically walk the internal infoset to send events to the configured
+ InfosetOutputter (streaming) or it can walk the internal infoset once at the end of
+ parsing (nonStreaming). The idea being that simple schemas would benefit from the
+ nonStreaming infoset walker, while more complex schemas with lots of points of
+ uncertaintly would benefit from the streaming infoset walker.
Review Comment:
I think schemas with PoUs are actually also likely to benefit from
non-streaming mode. This is because PoUs tend to prevent the infoset walker from
doing any work: the walker can't walk into something if there's a PoU where we
*might* backtrack and create a different infoset. So streaming with lots of PoUs
is likely to just lead to walk attempts that don't do anything.
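A toy model of why PoUs starve the streaming walker, under simplifying assumptions: the infoset is flattened into a list of nodes, a node counts as final once no PoU can still backtrack over it, and the walker may only emit a prefix of final nodes. `InfosetNode`, `PrefixWalker`, and `PoUSketch` are hypothetical; Daffodil's actual walker and PoU tracking are not modeled here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, flattened view of the internal infoset. A node is final once
// no point of uncertainty (PoU) can still backtrack over it.
final case class InfosetNode(name: String, isFinal: Boolean)

// Hypothetical walker: it may only emit nodes that can no longer change, so it
// stops at the first non-final node.
final class PrefixWalker(out: ArrayBuffer[String]) {

  /** Emits the longest prefix of final nodes to `out`, drops them from
    * `pending` (so they could be garbage collected), and returns how many
    * were emitted. A result of 0 means the walk did no useful work. */
  def walk(pending: ArrayBuffer[InfosetNode]): Int = {
    val firstBlocked = pending.indexWhere(node => !node.isFinal)
    val n = if (firstBlocked == -1) pending.length else firstBlocked
    pending.take(n).foreach(node => out += node.name)
    pending.remove(0, n)
    n
  }
}

object PoUSketch {
  def main(args: Array[String]): Unit = {
    val out = ArrayBuffer.empty[String]
    val walker = new PrefixWalker(out)
    // "x" is still inside an unresolved PoU, so nothing after it can be emitted.
    val pending = ArrayBuffer(
      InfosetNode("x", isFinal = false),
      InfosetNode("y", isFinal = true),
      InfosetNode("z", isFinal = true))
    // Every attempt while the PoU is unresolved emits nothing.
    val attempts = (1 to 3).map(_ => walker.walk(pending)).toList
    // Once the PoU resolves, the whole buffer can be emitted in one walk.
    pending(0) = pending(0).copy(isFinal = true)
    println(s"blocked attempts: $attempts, after resolve: ${walker.walk(pending)}")
  }
}
```

While the PoU at the front of `pending` is unresolved, every call returns 0, which is the "walk attempts that don't do anything" case.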
I think the main situation where someone would want to use streaming mode
is when the infoset is likely to be very large or when memory is constrained.
The reason is that the main benefit of streaming mode is that it allows parts
of the internal infoset that we know won't change to be sent to the outputter
and garbage collected in the middle of a parse, which frees up memory while
parsing. But if there is no real memory pressure, I imagine non-streaming will
be as fast or faster in most cases. I don't think we need to mention these
details in the documentation; I'm just giving some background.
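A back-of-the-envelope simulation of the memory versus CPU tradeoff described above, assuming a flat array of records that become final as soon as they are parsed (no PoUs) and that each walk releases every final record. `ParseCost` and `StreamingTradeoffSketch` are hypothetical; the numbers count retained records and walk calls, not real bytes or time.

```scala
// Hypothetical cost summary: peak records held in memory, and walk() calls made.
final case class ParseCost(peakRetained: Int, walkCalls: Int)

object StreamingTradeoffSketch {
  def simulate(n: Int, streaming: Boolean): ParseCost = {
    var retained = 0
    var peak = 0
    var walks = 0
    for (_ <- 1 to n) {
      retained += 1 // a newly parsed record joins the internal infoset
      peak = math.max(peak, retained)
      if (streaming) {
        walks += 1    // mid-parse walk sends the final record to the outputter
        retained = 0  // ...after which it is eligible for garbage collection
      }
    }
    walks += 1 // the end-of-parse walk happens in both modes
    ParseCost(peak, walks)
  }

  def main(args: Array[String]): Unit = {
    println(simulate(1000, streaming = true))
    println(simulate(1000, streaming = false))
  }
}
```

For 1000 records, streaming keeps at most 1 record retained but makes 1001 walk calls, while non-streaming retains all 1000 and walks once; which is preferable depends on whether memory is actually constrained.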
--
This is an automated message from the Apache Git Service.