[ 
https://issues.apache.org/jira/browse/DAFFODIL-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Lawrence updated DAFFODIL-2684:
-------------------------------------
    Fix Version/s:     (was: 3.4.0)

> daffodil-cli splitParse mode
> ----------------------------
>
>                 Key: DAFFODIL-2684
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2684
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: CLI
>    Affects Versions: 3.3.0
>            Reporter: Mike Beckerle
>            Priority: Major
>              Labels: beginner
>
> A common way Daffodil is used involves first splitting data off of a TCP 
> stream or other input stream, and then handing each split (a byte array) to 
> Daffodil to parse a single message. 
> This differs from the current CLI "streaming" mode in how errors are handled. 
> The existing streaming mode can't tolerate errors: any error halts parsing of 
> the entire stream. The only way to parse an entire stream that includes a mixture 
> of correct and malformed data is to use a DFDL schema which actually accepts 
> even malformed data, creating elements from it. (E.g., 
> <invalid>8929AFB3892</invalid> ) 
> But this is unnatural and adds complexity to the DFDL schema that wouldn't 
> otherwise be needed. 
> The split-and-parse method can continue to parse the next message even after 
> a failure to parse. The only thing that is fatal to the whole processing run 
> is if it is not possible to meaningfully split the message from the data 
> stream. 
> So we want a split-and-parse capability in the CLI. Such a mode uses two DFDL 
> schemas: a splitter schema (very simple) and a regular parse schema. The 
> splitter schema does only the minimum needed to split a message from the 
> stream. The byte array produced by the split is then parsed with the regular 
> parse schema. 
> There is no real symmetric unparser equivalent of this split-and-parse 
> behavior. Regular streaming unparsing works. 
> A prototype of this idea is in the splitAndParse subdirectory/project of the 
> OpenDFDL examples repo on GitHub. It is 100% code authored by mbeckerle 
> (Daffodil PMC) and intended as a contribution to Daffodil, so there is no issue 
> with pulling it, or parts of it, into Daffodil. 
> Suggest command line like this:
> {code:java}
> daffodil parse --stream --splitterSchema filename ... other options as per parse.
> {code}
> When --stream is specified, the --splitterSchema option is available. If 
> used, it provides the file name of a splitter DFDL schema. 
> If the splitter DFDL schema is precompiled, then the options would be
> {code:java}
> daffodil parse --stream --splitterParser binaryfilename ... other options as per parse.
> {code}
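
The split-and-parse behavior described in the issue can be illustrated with a minimal sketch. This is not the proposed CLI implementation and uses no Daffodil API. It assumes a hypothetical 2-byte big-endian length prefix as the message framing (standing in for the splitter schema) and a hypothetical parse_message that accepts only ASCII decimal digits (standing in for the regular parse schema). All names here (SplitError, ParseError, split_messages, parse_message, split_and_parse) are illustrative only.

```python
import io
import struct


class SplitError(Exception):
    """Framing is broken, so the rest of the stream cannot be split (fatal)."""


class ParseError(Exception):
    """A single message is malformed; later messages are unaffected."""


def split_messages(stream):
    """Yield each message payload as bytes.

    Stand-in for the splitter schema. Assumed framing: a 2-byte
    big-endian length followed by that many payload bytes.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise SplitError("truncated payload")
        yield payload


def parse_message(payload):
    """Stand-in for the regular parse schema: ASCII decimal digits only."""
    if not payload.isdigit():
        raise ParseError(f"not a decimal number: {payload!r}")
    return int(payload.decode("ascii"))


def split_and_parse(stream, parse=parse_message):
    """Parse every message in the stream, continuing past parse failures.

    A ParseError is recorded and the loop moves on to the next message.
    A SplitError is not caught here, so it ends the whole run.
    """
    results = []
    for payload in split_messages(stream):
        try:
            results.append(("ok", parse(payload)))
        except ParseError as err:
            results.append(("error", str(err)))
    return results
```

Calling split_and_parse on an io.BytesIO holding three framed messages, the second of which is malformed, yields an "ok" entry, an "error" entry, and another "ok" entry. A stream that ends in the middle of a frame raises SplitError instead, which matches the issue's point that only a failure to split is fatal to the whole run.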



--
This message was sent by Atlassian Jira
(v8.20.10#820010)