stevedlawrence commented on code in PR #1572:
URL: https://github.com/apache/daffodil/pull/1572#discussion_r2429654696
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/infoset/SAXInfosetInputter.scala:
##########
@@ -126,9 +126,14 @@ class SAXInfosetInputter(
}
override def hasNext(): Boolean = {
- // If we haven't reached an EndDocument event yet, there must be more
- // events on their way, even if we don't know for sure yet.
- currentEvent.eventType.get ne EndDocument
+ // If we haven't reached an EndDocument event yet, there must be more events on their way,
Review Comment:
The `EndDocument` SAX event isn't so much about the last *element* as about
reaching the end of the XML source and telling the ContentHandler that there
won't be any more events. So even if there are comments, PIs, etc. after the
last element, we won't get the `EndDocument` event until after those have all
been processed. And the Daffodil unparser won't finish unparsing until it gets
the EndDocument event from the InfosetInputter either. So if the XMLReader does
parse the last element but then sees an invalid comment or something similar,
it will throw an error, we'll never see EndDocument, and we have the same
issue.
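To make the ordering concrete, here is a small sketch using the JDK's built-in SAX parser (`javax.xml.parsers.SAXParserFactory`) rather than Daffodil itself; `RecordingHandler` and `EndDocumentDemo` are hypothetical names for illustration only. It records the callbacks a `ContentHandler` receives. With well-formed trailing content, `endDocument` should only arrive after the trailing PI. With a malformed trailing comment (`--` is not allowed inside an XML comment), the parse should fail after `endElement(root)` without `endDocument` being called. Note that the SAX spec technically permits a parser to call `endDocument` after abandoning a parse, so the failure case reflects how the JDK's parser is expected to behave, not a guarantee of the API.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, SAXException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Hypothetical handler that records the order of ContentHandler callbacks.
// DefaultHandler does not implement LexicalHandler, so comments are never
// reported to it; only elements, PIs and document boundaries are recorded.
class RecordingHandler extends DefaultHandler {
  val events = ListBuffer.empty[String]
  override def startDocument(): Unit = events += "startDocument"
  override def endDocument(): Unit = events += "endDocument"
  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
    events += s"startElement($qName)"
  override def endElement(uri: String, localName: String, qName: String): Unit =
    events += s"endElement($qName)"
  override def processingInstruction(target: String, data: String): Unit =
    events += s"pi($target)"
}

object EndDocumentDemo {
  // Parses `xml`, returning the recorded events and any parse error.
  // DefaultHandler.fatalError rethrows, so malformed input aborts the parse.
  def parse(xml: String): (List[String], Option[SAXException]) = {
    val handler = new RecordingHandler
    val parser = SAXParserFactory.newInstance().newSAXParser()
    val error =
      try { parser.parse(new InputSource(new StringReader(xml)), handler); None }
      catch { case e: SAXException => Some(e) }
    (handler.events.toList, error)
  }
}
```

For example, `EndDocumentDemo.parse("<root/><?app data?><!-- trailing -->")` should report `endDocument` only after `pi(app)`, while `EndDocumentDemo.parse("<root/><!-- a -- b -->")` should report `endElement(root)` and then an error, with no `endDocument`.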
The DaffodilUnparseContentHandler does disregard a number of things, like PIs
and ignorable whitespace:
https://github.com/apache/daffodil/blob/1a51c31236fb22798b43ee84781a54416864bb47/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/DaffodilUnparseContentHandlerImpl.scala#L459-L473
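As a rough illustration of what "disregarding" an event means in `ContentHandler` terms, the hypothetical `StructureOnlyHandler` below (not Daffodil's implementation) overrides `processingInstruction` and `ignorableWhitespace` as no-ops while keeping elements and character data. The parser still has to read past those events; the handler just doesn't act on them. Without a DTD, a non-validating parser typically reports whitespace between elements through `characters`, not `ignorableWhitespace`.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical handler: keeps element structure and character data, and
// deliberately drops processing instructions and ignorable whitespace.
class StructureOnlyHandler extends DefaultHandler {
  private val out = new java.lang.StringBuilder
  def result: String = out.toString

  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
    out.append(s"<$qName>")
  override def endElement(uri: String, localName: String, qName: String): Unit =
    out.append(s"</$qName>")
  // Character data may arrive in several chunks; appending keeps it intact.
  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    out.append(ch, start, length)

  // Disregarded events: intentionally no-ops.
  override def processingInstruction(target: String, data: String): Unit = ()
  override def ignorableWhitespace(ch: Array[Char], start: Int, length: Int): Unit = ()
}

object StructureOnlyDemo {
  // Parses `xml` with the JDK SAX parser and returns the rebuilt structure.
  def run(xml: String): String = {
    val handler = new StructureOnlyHandler
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xml)), handler)
    handler.result
  }
}
```

For instance, `StructureOnlyDemo.run("<root>x<?app data?>y</root>")` should drop the PI and keep the surrounding text.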
Also, I just learned that the SAX `ContentHandler` API doesn't even receive
comments; you need a separate `LexicalHandler` if you care about those. That's
why comments don't appear anywhere in our `ContentHandler`.
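A minimal sketch of receiving comments, assuming the JDK's built-in SAX parser: `org.xml.sax.ext.DefaultHandler2` implements both `ContentHandler` and `LexicalHandler`, but comments are only delivered once the handler is also registered through the standard `http://xml.org/sax/properties/lexical-handler` property. `CommentCollector` and `CommentDemo` are hypothetical names for illustration.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource
import org.xml.sax.ext.DefaultHandler2
import scala.collection.mutable.ListBuffer

// Hypothetical collector: DefaultHandler2 is both a ContentHandler and a
// LexicalHandler; only the LexicalHandler side is told about comments.
class CommentCollector extends DefaultHandler2 {
  val comments = ListBuffer.empty[String]
  override def comment(ch: Array[Char], start: Int, length: Int): Unit =
    comments += new String(ch, start, length)
}

object CommentDemo {
  // Returns the comments seen while parsing `xml`. When `registerLexical`
  // is false the handler is only installed as a ContentHandler, so no
  // comments are reported.
  def collect(xml: String, registerLexical: Boolean): List[String] = {
    val handler = new CommentCollector
    val reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader()
    reader.setContentHandler(handler)
    reader.setErrorHandler(handler)
    if (registerLexical)
      reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler)
    reader.parse(new InputSource(new StringReader(xml)))
    handler.comments.toList
  }
}
```

Registering the same object for both roles keeps the sketch short; a real handler could just as well use a separate `LexicalHandler` instance.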
--
This is an automated message from the Apache Git Service.