By this argument, should we also remove the JDOM, W3CDOM, and Scala XML
infoset outputters from the CLI? These are effectively APIs as well.
The CLI just takes the results and converts them to a string for
output.
Maybe the CLI only cares about outputting to XML using the XML Text
Infoset Outputter, and there shouldn't even be an option for other
outputters?
One counterargument: the -I option is useful from a testing
perspective to make sure we are building the API objects correctly, with
the assumption that the .toString of those objects is correct. But our
TDML runner already does that, so maybe we can just rely on those tests
and this isn't necessary.
If we do remove the -I option, what are thoughts on keeping the option
for the performance subcommand? This is a quick way to test the overhead
of different InfosetOutputters and is something we have done in the
past multiple times. Note that for the performance subcommand, the
non-text InfosetOutputters don't actually convert to a string, so it
only measures the speed to create the API objects.
The exception is SAX, which does convert the SAX events to text
during the performance subcommand (using the
DaffodilParseOutputStreamContentHandler). That is probably wrong, since
it measures the speed of creating SAX events AND whatever the
ContentHandler does, which in most cases will not be the same as what
this content handler does. We really just want a measure of how long it
takes Daffodil to parse data and convert it to SAX events.
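As a rough sketch of what a handler that does no work of its own could
look like, using only the JDK SAX classes (NullContentHandler and its
event counter are hypothetical names for illustration, not existing
Daffodil code):

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical "null" handler: discards every SAX event it receives.
 * The counter is only there so a caller can confirm events arrived;
 * it is an assumption of this sketch, not a requirement.
 */
class NullContentHandler extends DefaultHandler {
    private long events = 0;

    @Override public void startDocument() { events++; }
    @Override public void endDocument() { events++; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events++; // no serialization, no string building
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events++; // the character data is intentionally ignored
    }

    long eventCount() { return events; }
}
```

With something like this set as the content handler, the timing would
cover Daffodil's parse plus SAX event creation and very little else.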
This is further complicated for the performance --unparse option, where
the SAX performance includes the time to parse the infoset AND convert
to SAX events for Daffodil to unparse. Ideally parsing the infoset would
not be included since that is overhead that most SAX implementations
won't have, or would at least be outside of Daffodil. One way to resolve
this is to preprocess the infoset into a list of "events" (similar to
TestInfosetEvent in TestJavaAPI.scala), and use a custom XMLReader that
uses that list for creating SAX events to unparse. That adds a bit of
complication, though.
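A minimal sketch of the replay idea, assuming only JDK SAX classes.
RecordedEvent and ReplayXMLReader are hypothetical names, and the sketch
skips namespaces, prefix mappings, and attributes, which a real version
would likely need:

```java
import java.io.IOException;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pre-parsed infoset event; not an existing Daffodil class. */
final class RecordedEvent {
    enum Kind { START_DOCUMENT, END_DOCUMENT, START_ELEMENT, END_ELEMENT, CHARACTERS }

    final Kind kind;
    final String name; // element name for START_ELEMENT/END_ELEMENT, else null
    final char[] text; // character data for CHARACTERS, else null

    RecordedEvent(Kind kind, String name, String text) {
        this.kind = kind;
        this.name = name;
        this.text = (text == null) ? null : text.toCharArray();
    }
}

/**
 * Hypothetical XMLReader that replays a prebuilt event list rather than
 * parsing XML text. Extending XMLFilterImpl supplies the rest of the
 * XMLReader interface (handler getters/setters, features, properties).
 */
class ReplayXMLReader extends XMLFilterImpl {
    private static final AttributesImpl NO_ATTRS = new AttributesImpl();
    private final List<RecordedEvent> events;

    ReplayXMLReader(List<RecordedEvent> events) { this.events = events; }

    // The InputSource is ignored: the infoset was read before timing started.
    @Override
    public void parse(InputSource ignored) throws SAXException, IOException {
        ContentHandler h = getContentHandler();
        if (h == null) return;
        for (RecordedEvent e : events) {
            switch (e.kind) {
                case START_DOCUMENT: h.startDocument(); break;
                case END_DOCUMENT:   h.endDocument(); break;
                case START_ELEMENT:  h.startElement("", e.name, e.name, NO_ATTRS); break;
                case END_ELEMENT:    h.endElement("", e.name, e.name); break;
                case CHARACTERS:     h.characters(e.text, 0, e.text.length); break;
            }
        }
    }
}
```

The event list would be built once, outside the timed loop. Each timed
iteration would then call parse() with Daffodil's unparse content
handler set through setContentHandler, so reading the infoset text
would not count toward the measurement.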
I'd suggest the changes should be:
1. Remove the -I option from the parse/unparse subcommands only, and
always use the XMLTextInfosetInputter/Outputter
2. Keep the -I option the same for the performance subcommand, with
these two changes:
2a. Change SAX performance --parse so that it outputs to a "null"
content handler
2b. Change SAX performance --unparse so that we preprocess the infoset
and convert to a list of "events". A custom XMLReader is used to feed
these events to Daffodil for unparsing.
On 7/11/22 3:19 PM, Mike Beckerle wrote:
In the CLI, there is the -I option to specify infoset-type.
One of the choices is 'sax'.
This is a mistake I think. This is really "XML text by way of calling the
SAX API". It's effectively a testing mode for us.
SAX usage is inherently an API.
I believe we should remove this feature from the CLI, because it creates a
lot of confusion. It requires test-mode things to be in the main-libraries
where the CLI can find them.
If we require SAX to be used as it is intended, from applications calling
Daffodil via APIs, then all this "xml text to/from SAX-event" code all ends
up in src/test where it belongs.
Thoughts?