[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #437: Free infoset element during unparse

GitBox Thu, 15 Oct 2020 12:33:19 -0700


stevedlawrence commented on a change in pull request #437:
URL: https://github.com/apache/incubator-daffodil/pull/437#discussion_r505789562




##########
File path: 
daffodil-runtime1-unparser/src/main/scala/org/apache/daffodil/processors/unparsers/ElementUnparser.scala
##########
@@ -417,39 +418,56 @@ sealed trait RegularElementUnparserStartEndStrategy
             UnparseError(Nope, One(state.currentLocation), "Expected element 
start event for %s, but received %s.",
               erd.namedQName.toExtendedSyntax, event)
           }
-          val res = event.info.element
-          val mCurNode = state.currentInfosetNodeMaybe
-          if (mCurNode.isDefined) {
-            val c = mCurNode.get.asComplex
-            Assert.invariant(!c.isFinal)
-            if (c.maybeIsNilled == MaybeBoolean.True) {
-              // cannot add content to a nilled complex element
-              UnparseError(One(erd.schemaFileLocation), Nope, "Nilled complex 
element %s has content from %s",
-                c.erd.namedQName.toExtendedSyntax,
-                res.erd.namedQName.toExtendedSyntax)
-            }
-            c.addChild(res, state.tunable)
-          } else {
-            val doc = state.documentElement
-            doc.addChild(res, state.tunable) // DIDocument, which is never a 
current node, must have the child added
-            doc.isFinal = true // that's the only child.
-          }
-          res
+          event.info.element
         } else {
           Assert.invariant(state.withinHiddenNest)
           // Since we never get events for elements in hidden contexts, their 
infoset elements
           // will have never been created. This means we need to manually 
create them
-          val e = if (erd.isComplexType) new DIComplex(erd) else new 
DISimple(erd)
-          e.setHidden()
-          state.currentInfosetNode.asComplex.addChild(e, state.tunable)
-          e
+          val hiddenElem = if (erd.isComplexType) new DIComplex(erd) else new 
DISimple(erd)
+          hiddenElem.setHidden()
+          hiddenElem
+        }
+
+      // now add this new elem to the infoset
+      val parentNodeMaybe = state.currentInfosetNodeMaybe
+      if (parentNodeMaybe.isDefined) {
+        val parentComplex = parentNodeMaybe.get.asComplex
+        Assert.invariant(!parentComplex.isFinal)
+        if (parentComplex.maybeIsNilled == MaybeBoolean.True) {
+          // cannot add content to a nilled complex element
+          UnparseError(One(erd.schemaFileLocation), Nope, "Nilled complex 
element %s has content from %s",
+            parentComplex.erd.namedQName.toExtendedSyntax,
+            newElem.erd.namedQName.toExtendedSyntax)
         }
 
+        // We are about to add a child to this complex element. Before we do
+        // that, if the last child added to this complex is a DIArray, and this
+        // new child isn't part of that array, that implies that the DIArray
+        // will have no more children added and should be marked as final, and

Review comment:
       I'd have to do some testing, but I think this would actualy stream 
somewhat okay. The PCAP schema essentially looks like this:
   ```xml
   <PCAP>
     <GlobalHeader>...<GlobalHeader>
     <Packet>...</Packet>
     ...
     <Packet>...</Packet>
   </PCAP>
   ```
   Named down steps are the only thing that cause elements to be marked as 
"referenced by an expression" and thus not freeable. And the PCAP schema, 
nothing actually references "Packet" in a down step. Lots of children of the 
Packet element do get referenced, but not the Packet itself. So when a Packet 
element is marked as isFinal, we will immediately free it. So I suspect we'll 
only ever have one Packet (and all of it's children) in memory at a time. Which 
should be very reasonble.
   
   This does exlucding things like suspensions which might hold on to packets, 
but hopefully the earlier suspension evaluation will mitigate that.
   
   This also doesn't handle the fact the there there is going to be a DIArray 
with size N where N is the number of packets. Most of the elements in that 
array will be null because they'll be freed, but it will still take up a good 
chunk of memory for large arrays. Note that a schema could minimize this a bit 
by "chunking" the array. E.g.
   ```xml
   <xs:element name="PacketChunk" maxOccurs="unbounded">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="Packet" maxOccurs="10000" ... />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   ```
   It's a bit ugly, but is a potential work around for very large arrays 
without having to compact them.
   
   Note that we could add a feature to compact them after they grow to some 
size and have enough elements freed, but there are things like fn:count() and 
index offsets that need to be considered, so if we do want to do that, it's 
probably best left for another change.
   
   I'm also not argueing that the extension to hint to clean up arrays isn't 
necessary. But I do think that this change does resolve some streaming issues 
without requiring additional extension properties be added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #437: Free infoset element during unparse

Reply via email to