Mike Beckerle created DAFFODIL-2684:
---------------------------------------
Summary: daffodil-cli splitParse mode
Key: DAFFODIL-2684
URL: https://issues.apache.org/jira/browse/DAFFODIL-2684
Project: Daffodil
Issue Type: Bug
Components: CLI
Affects Versions: 3.3.0
Reporter: Mike Beckerle
A common way Daffodil is used involves first splitting data off of a TCP stream
or other input stream, and then handing each split (a byte array) to Daffodil
to parse a single message.
This differs from the current CLI "streaming" mode in how errors are handled.
The existing streaming mode can't tolerate errors: any error halts parsing of
the entire stream. The only way to parse an entire stream that includes a
mixture of correct and malformed data is to use a DFDL schema that accepts
even malformed data and creates elements from it (e.g.,
<invalid>8929AFB3892</invalid>).
But this is unnatural and adds complexity to the DFDL schema that wouldn't
otherwise be needed.
The split-and-parse method can continue to parse the next message even after a
failure to parse. The only thing that is fatal to the whole processing run is
if it is not possible to meaningfully split the message from the data stream.
So we want a split-and-parse capability in the CLI. Such a mode uses two DFDL
schemas: a splitter schema (very simple) and a regular parse schema. The
splitter schema does just the minimum needed to split a message from the
stream; the byte array obtained from the split is then parsed with the regular
parse schema.
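The control flow described above can be sketched roughly as follows. This is an illustration only, not the Daffodil API and not the prototype code: the 2-byte big-endian length prefix standing in for the splitter schema, the digits-only check standing in for the regular parse schema, and all names (SplitError, split_messages, parse_message, split_and_parse) are assumptions made for the example.

```python
import io
import struct


class SplitError(Exception):
    """Raised when a message cannot be split from the stream; fatal to the run."""


def split_messages(stream):
    """Yield each message as bytes. Hypothetical framing: 2-byte big-endian length prefix."""
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        body = stream.read(length)
        if len(body) < length:
            raise SplitError("truncated message body")
        yield body


def parse_message(data):
    """Hypothetical stand-in for the regular parse: accepts ASCII decimal digits only."""
    text = data.decode("ascii")  # UnicodeDecodeError is a subclass of ValueError
    if not text.isdigit():
        raise ValueError(f"malformed message: {text!r}")
    return int(text)


def split_and_parse(stream):
    """Parse every message; a parse failure is recorded and the run continues."""
    results = []
    for message in split_messages(stream):  # a SplitError propagates and ends the run
        try:
            results.append(("ok", parse_message(message)))
        except ValueError as err:
            results.append(("error", str(err)))
    return results
```

In this sketch, a malformed message between two well-formed ones produces one error entry, and the messages after it are still parsed. Only a framing failure stops the run.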
There is no real symmetric unparser equivalent of this split-and-parse
behavior. Regular streaming unparsing works.
A prototype of this idea is in the splitAndParse subdir/project of the
openDFDL examples repo on GitHub. The code was authored entirely by mbeckerle
(Daffodil PMC) with the intent of contributing it to Daffodil, so there is no
issue pulling it, or parts of it, into Daffodil.
Suggested command line:
{code:java}
daffodil parse --stream --splitterSchema filename ... other options as per parse.
{code}
When --stream is specified, the --splitterSchema option is available. If used,
it provides the file name of a splitter DFDL schema.
If the splitter DFDL schema is precompiled, then the options would be:
{code:java}
daffodil parse --stream --splitterParser binaryfilename ... other options as per parse.
{code}