stevedlawrence commented on code in PR #812:
URL: https://github.com/apache/daffodil/pull/812#discussion_r923515708


##########
daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala:
##########
@@ -873,8 +883,6 @@ object Main {
 
               val parseResult = eitherOutputterOrHandler match {
                 case Right(saxContentHandler) =>
-                  // reset in case we are streaming
-                  saxContentHandler.reset()

Review Comment:
   Why is this reset removed? Seems like this is important?



##########
daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala:
##########
@@ -1400,8 +1408,15 @@ object Main {
 
   private def unparseWithSAX(
     is: InputStream,
-    contentHandler: DFDL.DaffodilUnparseContentHandler): UnparseResult = {
-    val xmlReader = DaffodilSAXParserFactory().newSAXParser.getXMLReader
+    contentHandler: DFDL.DaffodilUnparseContentHandler,
+    infosetType: InfosetType.Type): UnparseResult = {
+    val xmlReader = infosetType match {
+      case InfosetType.EXI => {
+        val exiSource = new EXISource()
+        exiSource.getXMLReader
+      }
+      case _ => DaffodilSAXParserFactory().newSAXParser.getXMLReader
+    }

Review Comment:
   I imagined that we would have a new `InfosetInputter` and `InfosetOutputter` 
specific to EXI (i.e. `EXIInfosetInputter`/`EXIInfosetOutputter`). That way 
anyone would be able to use this by just specifying the new infosetter without 
having to do all this SAX logic themselves. Is that something we're trying to 
avoid? That would move all of this logic out of the CLI and would make it 
easily usable in the API. It could stil use the SAX InfosetOutputter/Inputter 
under the hood, just makes it easier to use.
   
   Another thought, this is just using our SAXInfosetOutputter and plugging 
that into this API--I wonder if we would get better performance if we directly 
use the EXI API? Converting our internal infoset representation to SAX events 
has some amount of overhead 
([DAFFODIL-2400](https://issues.apache.org/jira/browse/DAFFODIL-2400)) that 
might be avoided if we call the appropriate API calls. Just skimming the code, 
it looks like EXIBodyEncoder is maybe one of the more important API classes and 
is pretty straightforward. It should map pretty closely to our InfosetOuputter 
API:
   
   
https://github.com/EXIficient/exificient-core/blob/master/src/main/java/com/siemens/ct/exi/core/EXIBodyEncoder.java
   
   This also seems especially important in the case of unparsing. Daffodil SAX 
unparsing is extra complicated because Daffodil InfosetInputters are pull-based 
based, but SAX is push-based, so we have this mess of coroutines to link to two 
and it creates a lot of overhead. If we could directly use the EXI API, or at 
least a StAX based API for an EXIInfosetOutputter, I imagine performance would 
be significantly better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to