Re: [PR] Add `infosetWalkerMode` tunable for streaming and non-streaming modes [daffodil]

via GitHub Fri, 29 May 2026 08:41:53 -0700


stevedlawrence commented on code in PR #1676:
URL: https://github.com/apache/daffodil/pull/1676#discussion_r3325189099



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/SequenceParserBases.scala:
##########
@@ -212,11 +212,6 @@ abstract class SequenceParserBase(
                 // should not increment the group index.
                 pstate.mpstate.moveOverOneGroupIndexOnly()
               }
-
-              // we might have added a new instance to the array. Attempt to project it to an
-              // infoset if there are no PoU's or anything blocking it
-              pstate.walker.walk()
-

Review Comment:
   I think we still want these `walker.walk()` calls; they should just be gated
on the value of the new tunable being streaming. That way, if streaming mode is
enabled, we still try to walk the infoset in the middle of parsing to output
whatever parts of the infoset are marked as final.
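
   A minimal sketch of that gating, assuming the tunable surfaces as an
enum-like value. `InfosetWalkerMode`, `Tunables`, `Walker`, and
`WalkGate.maybeWalk` are hypothetical stand-ins for illustration, not the
actual Daffodil types or accessors:

```scala
// Hypothetical stand-ins; the real Daffodil types and tunable accessor may differ.
sealed trait InfosetWalkerMode
object InfosetWalkerMode {
  case object Streaming extends InfosetWalkerMode
  case object NonStreaming extends InfosetWalkerMode
}

// Stand-in for the infoset walker; it only counts walk attempts.
final class Walker {
  var walkCount = 0
  def walk(): Unit = walkCount += 1
}

// Stand-in for the tunables object carrying the new setting.
final case class Tunables(infosetWalkerMode: InfosetWalkerMode)

object WalkGate {
  // Walk mid-parse only in streaming mode. In non-streaming mode this is a
  // no-op, leaving the single walk at the end of parsing to do the work.
  def maybeWalk(tunables: Tunables, walker: Walker): Unit =
    if (tunables.infosetWalkerMode == InfosetWalkerMode.Streaming) walker.walk()
}
```

   With something like this, each removed `pstate.walker.walk()` call site
could become a call to the gate rather than being deleted.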



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########


Review Comment:
   We might prefix this with "If infosetWalkerMode is "streaming", Daffodil
periodically.." to make it clear this tunable only applies to streaming mode.
Same with infosetWalkerSkipMax.



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########
@@ -239,6 +239,17 @@
             </xs:restriction>
           </xs:simpleType>
         </xs:element>
+        <xs:element name="infosetWalkerMode" type="daf:TunableInfosetWalkerMode" default="nonStreaming" minOccurs="0">
+          <xs:annotation>
+            <xs:documentation>
+              Daffodil can periodically walk the internal infoset to send events to the configured
+              InfosetOutputter (streaming) or it can walk the internal infoset once at the end of
+              parsing (nonStreaming). The idea being that simple schemas would benefit from the
+              nonStreaming infoset walker, while more complex schemas with lots of points of
+              uncertaintly would benefit from the streaming infoset walker.

Review Comment:
   I think schemas with PoU's are actually also likely to benefit from
non-streaming mode. PoU's tend to leave the infoset walker unable to do any
work, because the walker can't walk into something if there's a PoU where we
*might* backtrack and create a different infoset. So streaming with lots of
PoU's is likely to just lead to attempts to walk that don't do anything.

   I think the main situation where someone would want to use streaming mode
is when the infoset is likely to be very large or when memory is constrained.

   Note that this is because the main benefit of streaming mode is that it
allows parts of the internal infoset that we know won't be changed to be sent
to the outputter and garbage collected in the middle of a parse, which frees up
memory while parsing. But if there is no real memory pressure, I imagine in
most cases non-streaming will be as fast or faster. I don't think we need to
mention these details, just giving some background.
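
   As a rough illustration of why an unresolved PoU blocks the walker, here
is a toy model, not Daffodil's actual `InfosetWalker`. `Node`, `isFinal`, and
`ToyWalker` are made-up names, and the model assumes the walker can only
release a leading run of nodes that are already final:

```scala
import scala.collection.mutable

// Toy model only: not Daffodil's InfosetWalker. `isFinal` is false while an
// unresolved PoU could still cause the node to be backtracked.
final case class Node(name: String, isFinal: Boolean)

final class ToyWalker(pending: mutable.Queue[Node]) {
  // Names of nodes already handed to the (imaginary) outputter.
  val emitted = mutable.ArrayBuffer.empty[String]

  // Emit nodes from the front until reaching one that is not final.
  // Returns the number of nodes released by this walk attempt.
  def walk(): Int = {
    var released = 0
    while (pending.nonEmpty && pending.head.isFinal) {
      emitted += pending.dequeue().name
      released += 1
    }
    released
  }
}
```

   In this model a non-final node at the front makes `walk()` release nothing,
which mirrors the wasted walk attempts described above. Once the leading nodes
are final they leave the queue and could be garbage collected.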



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
