Mike Beckerle created DAFFODIL-2684:
---------------------------------------

             Summary: daffodil-cli splitParse mode
                 Key: DAFFODIL-2684
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2684
             Project: Daffodil
          Issue Type: Bug
          Components: CLI
    Affects Versions: 3.3.0
            Reporter: Mike Beckerle


A common way Daffodil is used involves first splitting data off of a TCP stream 
or other input stream, and then handing each split (a byte array) to Daffodil 
to parse a single message. 

This differs from the current CLI "streaming" mode in how errors are handled. 
The existing streaming mode cannot tolerate errors: any error halts parsing of 
the entire stream. The only way to parse a stream that mixes correct and 
malformed data is to use a DFDL schema that accepts even malformed data and 
creates elements from it (e.g., <invalid>8929AFB3892</invalid>). 

But this is unnatural and adds complexity to the DFDL schema that wouldn't 
otherwise be needed. 

The split-and-parse method can continue to parse the next message even after a 
failure to parse. The only thing that is fatal to the whole processing run is 
if it is not possible to meaningfully split the message from the data stream. 

So we want a split-and-parse capability in the CLI. Such a mode uses two DFDL 
schemas: a splitter schema (very simple) and a regular parse schema. The 
splitter schema does just the minimum needed to split a message from the 
stream; the regular parse schema is then used to parse the byte array 
produced by the split. 
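
As a rough illustration of the intended control flow (not the prototype's code 
and not the Daffodil API), here is a minimal Python sketch. Everything in it is 
an assumption made for illustration: the 2-byte length-prefix framing stands in 
for whatever the splitter schema would describe, parse_message is a 
hypothetical stand-in for a parse with the regular schema, and the names 
SplitError, ParseError, split_messages, and split_and_parse are invented here. 

```python
import io
import struct


class SplitError(Exception):
    """The stream cannot be split into messages any more (fatal to the run)."""


class ParseError(Exception):
    """A single message failed to parse (not fatal to the run)."""


def split_messages(stream):
    """Yield one payload (bytes) per message from a length-prefixed stream.

    Assumed framing, purely hypothetical: a 2-byte big-endian length
    followed by that many payload bytes. A real TCP socket may return
    short reads; io.BytesIO, used below, does not.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise SplitError("truncated payload")
        yield payload


def parse_message(payload):
    """Stand-in for the regular parse: accept only ASCII decimal digits."""
    text = payload.decode("ascii", errors="replace")
    if not text.isdigit():
        raise ParseError(f"not a number: {text!r}")
    return int(text)


def split_and_parse(stream):
    """Parse every split; record a failure and continue with the next one."""
    results = []
    for payload in split_messages(stream):
        try:
            results.append(("ok", parse_message(payload)))
        except ParseError as err:
            results.append(("error", str(err)))
    return results


def frame(payload):
    """Build one length-prefixed message in the assumed framing."""
    return struct.pack(">H", len(payload)) + payload


# One malformed message between two good ones: the run still completes.
data = frame(b"123") + frame(b"8929AFB3") + frame(b"42")
results = split_and_parse(io.BytesIO(data))
```

In this sketch a ParseError is recorded and the loop moves on to the next 
split, while a SplitError propagates and ends the run, mirroring the 
distinction drawn above between a bad message and an unsplittable stream. 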

There is no true unparse equivalent symmetric to this split-and-parse 
behavior. Regular streaming unparsing works. 

A prototype of this idea is in the OpenDFDL examples repository on GitHub, in 
the splitAndParse subdirectory/project. That code was authored entirely by 
mbeckerle (Daffodil PMC) with the intent of contributing it to Daffodil, so 
there is no issue with pulling it, or parts of it, into Daffodil. 

Suggested command line:
{code:java}
daffodil parse --stream --splitterSchema filename ... other options as per parse.
{code}
When --stream is specified, the --splitterSchema option is available. If used, 
it provides the file name of a splitter DFDL schema. 

If the splitter DFDL schema is precompiled, then the options would be:
{code:java}
daffodil parse --stream --splitterParser binaryfilename ... other options as per parse.
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
