By this argument, should we also remove the JDOM, W3CDOM, and Scala XML infoset outputters from the CLI? These are effectively APIs as well. The CLI just takes the results and converts them to a string for output.

Maybe the CLI only cares about outputting to XML using the XML Text Infoset Outputter, and there shouldn't even be an option for other outputters?

One counterargument: the -I option is useful from a testing perspective to make sure we are building the API objects correctly, with the assumption that the .toString of those objects is correct. But our TDML runner already does that, so maybe we can just rely on those tests and this isn't necessary.

If we do remove the -I option, what are thoughts on keeping the option for the performance subcommand? This is a quick way to test the overhead of different InfosetOutputters and is something we have done multiple times in the past. Note that for the performance subcommand, the non-text InfosetOutputters don't actually convert to a string, so it only measures the time to create the API objects.

The exception is SAX, which does convert the SAX events to text during the performance subcommand (using the DaffodilParseOutputStreamContentHandler). That is probably wrong, since it measures the time to create SAX events AND whatever the ContentHandler does, which in most cases will not be the same as what this content handler does. We really just want a measure of how long it takes Daffodil to parse data and convert it to SAX events.
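To make that concrete, here is a minimal sketch of a content handler that discards every event, written against the plain JDK SAX API. The class names are hypothetical and nothing here is existing Daffodil code. The counter is only there as a sanity check that events actually arrived, and the calls in main are a stand-in for whatever would really drive the handler.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class NullHandlerSketch {

    // Hypothetical handler: does no work per event beyond bumping a counter.
    // DefaultHandler already ignores every callback that is not overridden.
    static final class NullContentHandler extends DefaultHandler {
        long events = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events++;
        }
    }

    public static void main(String[] args) {
        NullContentHandler handler = new NullContentHandler();
        char[] text = "42".toCharArray();
        // Stand-in for the events a parser would emit for <a>42</a>.
        handler.startElement("", "a", "a", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "a", "a");
        System.out.println("events discarded: " + handler.events);
    }
}
```

With a handler like this, the measured time would be dominated by event creation rather than by text serialization.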

This is further complicated for the performance --unparse option, where the SAX performance includes the time to parse the infoset AND convert to SAX events for Daffodil to unparse. Ideally parsing the infoset would not be included since that is overhead that most SAX implementations won't have, or would at least be outside of Daffodil. One way to resolve this is to preprocess the infoset into a list of "events" (similar to TestInfosetEvent in TestJavaAPI.scala), and use a custom XMLReader that uses that list for creating SAX events to unparse. Adds a bit of complication though.
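A rough sketch of that idea, using only the JDK SAX classes. The Event type and ReplayXMLReader are hypothetical names, not existing Daffodil classes, and the sketch leaves out namespaces, prefix mappings, and attributes, which a real unparse content handler would presumably need. It extends XMLFilterImpl only to avoid writing out every XMLReader method by hand.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class ReplaySketch {

    enum Kind { START_ELEMENT, END_ELEMENT, TEXT }

    // Hypothetical preprocessed event: an element name or a run of text.
    static final class Event {
        final Kind kind;
        final String value;

        Event(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Replays a prebuilt event list to the registered ContentHandler,
    // so no XML text is parsed while the clock is running.
    static final class ReplayXMLReader extends XMLFilterImpl {
        private final List<Event> events;

        ReplayXMLReader(List<Event> events) {
            this.events = events;
        }

        @Override
        public void parse(InputSource ignored) throws SAXException {
            ContentHandler h = getContentHandler();
            if (h == null) {
                throw new SAXException("no ContentHandler set");
            }
            AttributesImpl noAtts = new AttributesImpl();
            h.startDocument();
            for (Event e : events) {
                switch (e.kind) {
                    case START_ELEMENT:
                        h.startElement("", e.value, e.value, noAtts);
                        break;
                    case END_ELEMENT:
                        h.endElement("", e.value, e.value);
                        break;
                    case TEXT: {
                        char[] cs = e.value.toCharArray();
                        h.characters(cs, 0, cs.length);
                        break;
                    }
                }
            }
            h.endDocument();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Event> events = new ArrayList<>();
        events.add(new Event(Kind.START_ELEMENT, "a"));
        events.add(new Event(Kind.TEXT, "42"));
        events.add(new Event(Kind.END_ELEMENT, "a"));

        // Stand-in for the unparse content handler: records what it receives.
        StringBuilder trace = new StringBuilder();
        ReplayXMLReader reader = new ReplayXMLReader(events);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                trace.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                trace.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                trace.append(ch, start, length);
            }
        });
        reader.parse((InputSource) null);
        System.out.println(trace);
    }
}
```

The event list would be built once, outside the timed loop, so each timed iteration only pays for firing the SAX callbacks and for the unparse itself.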

I'd suggest the changes should be:

1. Remove the -I option from the parse/unparse subcommands only, and always use the XMLTextInfosetInputter/Outputter

2. Keep the -I option the same for the performance subcommand, with these two changes:

2a. Change SAX performance --parse so that it outputs to a "null" content handler

2b. Change SAX performance --unparse so that we preprocess the infoset and convert to a list of "events". A custom XMLReader is used to feed these events to Daffodil for unparsing.


On 7/11/22 3:19 PM, Mike Beckerle wrote:
In the CLI, there is the -I option to specify infoset-type.

One of the choices is 'sax'.

This is a mistake I think. This is really "XML text by way of calling the
SAX API". It's effectively a testing mode for us.

SAX usage is inherently an API.

I believe we should remove this feature from the CLI, because it creates a
lot of confusion. It requires test-mode things to be in the main-libraries
where the CLI can find them.

If we require SAX to be used as it is intended, from applications calling
Daffodil via APIs, then all this "xml text to/from SAX-event" code all ends
up in src/test where it belongs.

Thoughts?

