[
https://issues.apache.org/jira/browse/DAFFODIL-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Lawrence updated DAFFODIL-2684:
-------------------------------------
Fix Version/s: (was: 3.4.0)
> daffodil-cli splitParse mode
> ----------------------------
>
> Key: DAFFODIL-2684
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2684
> Project: Daffodil
> Issue Type: New Feature
> Components: CLI
> Affects Versions: 3.3.0
> Reporter: Mike Beckerle
> Priority: Major
> Labels: beginner
>
> A common way Daffodil is used involves first splitting data off of a TCP
> stream or other input stream, and then handing each split (a byte array) to
> Daffodil to parse a single message.
> This differs from the current CLI "streaming" mode in the way errors work.
> The existing streaming mode can't tolerate errors: any error halts parsing of
> the entire stream. The only way to parse an entire stream that includes a
> mixture of correct and malformed data is to use a DFDL schema that accepts
> even malformed data, creating elements from it (e.g.,
> <invalid>8929AFB3892</invalid>).
> But this is unnatural and adds complexity to the DFDL schema that wouldn't
> otherwise be needed.
> The split-and-parse method can continue to parse the next message even after
> a failure to parse. The only failure that is fatal to the whole processing
> run is being unable to meaningfully split a message from the data stream.
> So we want a split-and-parse capability in the CLI. Such a mode uses two DFDL
> schemas: a splitter schema (very simple) and a regular parse schema. The
> splitter schema does just the minimum needed to split a message from the
> stream. The CLI then takes the byte array it gets from the split and parses
> it with the regular parse schema.
> There is no real symmetric unparser equivalent of this split-and-parse
> behavior. Regular streaming unparsing works.
> A prototype of this idea is in the splitAndParse subdir/project of the
> openDFDL examples repo on GitHub. The code was authored entirely by mbeckerle
> (Daffodil PMC) and is intended as a contribution to Daffodil, so there is no
> issue pulling it, or parts of it, into Daffodil.
> Suggested command line:
> {code:java}
> daffodil parse --stream --splitterSchema filename ... other options as per parse.
> {code}
> When --stream is specified, the --splitterSchema option is available. If
> used, it provides the file name of a splitter DFDL schema.
> If the splitter DFDL schema is precompiled, then the options would be:
> {code:java}
> daffodil parse --stream --splitterParser binaryfilename ... other options as per parse.
> {code}
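
For illustration only, the split-and-parse control flow described in the issue can be sketched in Python. This is not Daffodil code and not the openDFDL prototype. The 2-byte length-prefix framing, the "digits only" message format, and the names split_messages, parse_message, split_and_parse and SplitError are all hypothetical stand-ins for the splitter schema and the regular parse schema. The point is only the error handling: a bad message is recorded and skipped, while a failure to split ends the run.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterator, List, Tuple


class SplitError(Exception):
    """The stream could not be split into messages; fatal for the whole run."""


def split_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Stand-in for the splitter schema.

    Assumed framing (hypothetical): a 2-byte big-endian length prefix
    followed by that many payload bytes.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise SplitError("truncated payload")
        yield payload


def parse_message(data: bytes) -> str:
    """Stand-in for the regular parse schema.

    Assumed format (hypothetical): the payload must be ASCII decimal digits.
    Raises ValueError on malformed data.
    """
    text = data.decode("ascii")  # UnicodeDecodeError is a ValueError subclass
    if not text.isdigit():
        raise ValueError(f"not a decimal number: {text!r}")
    return f"<value>{text}</value>"


def split_and_parse(
    stream: BinaryIO,
    parse: Callable[[bytes], str] = parse_message,
) -> List[Tuple[str, str]]:
    """Parse every message in the stream, continuing past per-message errors.

    A SplitError from the splitter is deliberately not caught here, because
    losing the message boundaries makes the rest of the stream unusable.
    """
    results: List[Tuple[str, str]] = []
    for message in split_messages(stream):
        try:
            results.append(("ok", parse(message)))
        except ValueError as err:
            results.append(("error", str(err)))
    return results


def frame(payload: bytes) -> bytes:
    """Helper that builds one length-prefixed message for the demo below."""
    return struct.pack(">H", len(payload)) + payload


if __name__ == "__main__":
    data = frame(b"123") + frame(b"zz") + frame(b"42")
    for status, detail in split_and_parse(io.BytesIO(data)):
        print(status, detail)
```

In this sketch the malformed middle message produces an "error" entry and the third message is still parsed, which is the behavior the existing streaming mode cannot provide without a schema that models invalid data.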