[ 
https://issues.apache.org/jira/browse/DAFFODIL-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046691#comment-18046691
 ] 

Mike Beckerle commented on DAFFODIL-2831:
-----------------------------------------

That's a good observation. We could gather all the initiators, of any length, 
into one big list and create a scanner that scans for any of them. We would 
then assign a unique integer based on which one was found, and use a 
choice-dispatch-like mechanism on that integer to select a branch. This is 
independent of whether the initiators are fixed or variable length, whether 
some branches have multiple alternative initiators, etc. It even allows 
initiators to contain wildcards such as '%WSP*;'. 
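
A minimal sketch of that idea, in Python for brevity rather than Daffodil's 
own code. The branch list and the function names are hypothetical, only 
literal initiators are handled (no '%WSP*;' style wildcards), and it assumes 
no initiator of one branch is a prefix of an initiator of another branch, so 
that taking the longest match is safe:

```python
# Hypothetical schema: each choice branch lists its literal initiators.
BRANCHES = [
    ["A:", "ALPHA:"],    # branch 0
    ["B:"],              # branch 1
    ["C:", "CHARLIE:"],  # branch 2
]


def build_dispatch_table(branches):
    """Flatten all initiators into one {initiator: branch index} table."""
    table = {}
    for index, initiators in enumerate(branches):
        for init in initiators:
            # If two branches share an initiator, the earlier branch keeps it.
            table.setdefault(init, index)
    return table


def scan(data, table):
    """Return (branch index, initiator) for the longest initiator found
    at the start of data, or None if no initiator matches."""
    best = None
    for init, index in table.items():
        if data.startswith(init) and (best is None or len(init) > len(best[1])):
            best = (index, init)
    return best
```

The integer returned by scan() plays the role of the dispatch key: the parser 
would jump straight to that branch instead of trying each branch in turn.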

One issue is making sure this still has sequential-order semantics. I.e., a 
longer match to a delimiter expressed in a later branch must not be chosen over 
a shorter match to a delimiter in an earlier branch. The spec says the behavior 
is equivalent to the branches being tried one by one in sequence. 

So if an earlier branch has dfdl:initiator="A"

a later branch has dfdl:initiator="AA"

The data is "AA123"

Then DFDL semantics require that the first branch wins even though the later 
branch has a longer match, and this is true even when using 
dfdl:initiatedContent="yes". 

The longest match behavior is only within the various initiators for *one* 
dfdl:initiator property, not across branches.  I.e., if dfdl:initiator="A AA" 
then it should find the "AA" and not just stop with "A". 
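
The two rules can be written down as a small reference model. This is only a 
sketch in Python with hypothetical function names and literal initiators, not 
Daffodil code:

```python
def longest_match(data, initiators):
    """Longest initiator from ONE dfdl:initiator list that matches at the
    start of data, or None if none of them match."""
    matches = [init for init in initiators if data.startswith(init)]
    return max(matches, key=len) if matches else None


def select_branch(data, branches):
    """Try branches in schema order; the first branch with any match wins,
    even if a later branch would have matched more of the data."""
    for index, initiators in enumerate(branches):
        found = longest_match(data, initiators)
        if found is not None:
            return index, found
    return None
```

With branches [["A"], ["AA"]] and data "AA123", select_branch picks branch 0 
with "A". With the single branch [["A", "AA"]], it picks "AA". Any optimized 
scanner has to give the same answers as this model.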

This could be handled by verifying that no initiator in an earlier branch can 
be a prefix of an initiator in a later branch. That's not an easy check if the 
delimiters are full of wildcards and such, but the check does not need to give 
a definite answer in every case. Either it is known that no earlier branch has 
an initiator that is a prefix of a later one, or it is unknown, and we can only 
do the optimization when it is known.  
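
A conservative version of that check might look like the following Python 
sketch. The function names are hypothetical, and as a simplifying assumption 
any initiator containing '%' is treated as not analyzable, since it may hold a 
DFDL entity such as '%WSP*;':

```python
def is_plain_literal(initiator):
    # Conservative assumption: any '%' might start a DFDL entity, so such
    # an initiator is not analyzed at all.
    return "%" not in initiator


def known_prefix_free(branches):
    """Return True only when it is provable that no initiator of an earlier
    branch is a prefix of an initiator of a later branch. False means
    'violated or unknown', so the optimization must not be applied."""
    for later_index, later in enumerate(branches):
        for earlier in branches[:later_index]:
            for early_init in earlier:
                for late_init in later:
                    if not (is_plain_literal(early_init)
                            and is_plain_literal(late_init)):
                        return False  # unknown: cannot decide
                    if late_init.startswith(early_init):
                        return False  # earlier initiator is a prefix
    return True
```

Note that [["AA"], ["A"]] passes the check: a later initiator being a prefix 
of an earlier one does not change which branch sequential semantics select.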

I would also want a setting/property indicating that you do not want 
backtracking, so that if the optimization can't be done, it causes an SDE 
(schema definition error).
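
As a sketch of how such a setting could behave (Python, with hypothetical 
names throughout; no such property exists in the issue as written):

```python
class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


def plan_choice(optimizable, require_no_backtracking):
    """Pick a parse strategy for a choice at schema compile time.

    optimizable: result of a conservative prefix check on the initiators.
    require_no_backtracking: hypothetical setting requested by the schema
    author.
    """
    if optimizable:
        return "dispatch"
    if require_no_backtracking:
        raise SchemaDefinitionError(
            "initiators cannot be proven prefix-free; "
            "cannot avoid trying branches in sequence"
        )
    return "sequential"
```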

> InitiatedContent performance isn't equivalent to choicedispatchkey performance
> ------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2831
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2831
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Middle "End", Performance
>    Affects Versions: 3.5.0
>            Reporter: Olabusayo Kilo
>            Priority: Major
>
> One should achieve similar performance by an optimization of 
> initiatedContent="yes". Having to do a by-hand optimization/workaround of 
> choiceDispatchKey, with the corresponding ugly outputValueCalc, is definitely 
> to be avoided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
