Forwarding discussion on this.
________________________________
From: Stephen Lawrence
Sent: Tuesday, October 24, 2017 11:20 AM
To: Taylor Wise
Subject: Re: Unordered Sequences Thoughts
On 10/24/2017 10:54 AM, Taylor Wise wrote:
> <sequence sequenceKind="unordered">
> <a minOccurs="0" maxOccurs="unbounded" initiator="A:"/>
> <b minOccurs="0" maxOccurs="unbounded" initiator="B:"/>
> <c minOccurs="0" maxOccurs="unbounded" initiator="C:" outputValueCalc="{
> count( ../b) }"/>
> </sequence>
>
> A:1B:2A:3A:4B:5B:6C:check
> <a>1</a><b>2</b><a>3</a><a>4</a><b>5</b><b>6</b><c>3</c>
>
> A:1B:2C:checkB:3C:check
> <a>1</a><b>2</b><c>1</c><b>3</b><c>2</c>
Do you mean inputValueCalc instead of outputValueCalc? If so, both of these
look correct to me and should work by sorting at the end. We should
also be able to do something like
inputValueCalc="{ ../b[fn:count(../b)] }"
to get the last b (predicates are 1-based, so an index of count - 1 would
give the second-to-last b). Both approaches will require query-style
expression support (i.e. when the Seq() in the hash table contains more
than one element), though.
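
A minimal Python sketch of the sort-at-the-end idea, assuming elements are modeled as (name, value) pairs and that c's calculated value is evaluated at the moment c is parsed. SCHEMA_ORDER, parse, and sort_at_end are hypothetical names for illustration only, not Daffodil APIs.

```python
import re

# Declaration order of the children in the example unordered sequence.
SCHEMA_ORDER = {"a": 0, "b": 1, "c": 2}

def parse(data):
    """Parse 'A:1B:2C:check'-style input into (name, value) pairs, in insertion order.

    Assumption: c's value models count(../b) evaluated when c is encountered.
    """
    infoset = []
    for initiator, text in re.findall(r"([ABC]):([^ABC:]*)", data):
        name = initiator.lower()
        if name == "c":
            # Count only the b elements inserted so far.
            text = str(sum(1 for n, _ in infoset if n == "b"))
        infoset.append((name, text))
    return infoset

def sort_at_end(infoset):
    # sorted() is stable, so repeated elements keep their relative order.
    return sorted(infoset, key=lambda e: SCHEMA_ORDER[e[0]])

first = parse("A:1B:2A:3A:4B:5B:6C:check")
second = parse("A:1B:2C:checkB:3C:check")
second_sorted = sort_at_end(second)
```

With this model, the c values match the two infosets quoted above (3 in the first, then 1 and 2 in the second), and the sort only reorders elements after those values are already fixed.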
>
> I suppose that the above is legal. Right? So I guess there's no issue with
> expressions. But if, instead of an outputValueCalc, you were to do
> an Nth child lookup (however that is written), 'c' at each point would
> eventually be outdated and could lead to undesirable results. And of course,
> like we mentioned in the meeting, people would be thinking of Nth child in the
> 'sorted' representation vs the insertion representation.
Agreed with this. If we supported Nth-child, things would break down. BTW, to
do Nth child would be something like this:
/path/to/parent/*[position() = N]
DFDL does not support * in path steps or the position() function. I
suspect it will never support * since that breaks static type checking.
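
To make the Nth-child concern concrete, here is a small Python sketch, assuming children are (name, value) pairs and the input is A:1B:2A:3. The nth_child helper and SCHEMA_ORDER table are hypothetical, for illustration only.

```python
# Declaration order of the children in the example unordered sequence.
SCHEMA_ORDER = {"a": 0, "b": 1, "c": 2}

def nth_child(children, n):
    """Return the n-th child, 1-based like an XPath positional predicate."""
    return children[n - 1]

# Children in the order they were parsed from A:1B:2A:3.
insertion_order = [("a", "1"), ("b", "2"), ("a", "3")]

# The same children after a stable sort into schema order.
sorted_order = sorted(insertion_order, key=lambda e: SCHEMA_ORDER[e[0]])

second_during_parse = nth_child(insertion_order, 2)
second_after_sort = nth_child(sorted_order, 2)
```

The second child is the b while parsing but becomes the second a once sorted, so the same lookup gives different answers depending on when it runs.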
> Oooh.. question.
>
> <sequence sequenceKind="unordered">
> <a minOccurs="0" maxOccurs="unbounded" initiator="A:"/>
> <c minOccurs="0" maxOccurs="unbounded" initiator="C:" outputValueCalc="{
> count( ../b) }"/>
> <b minOccurs="0" maxOccurs="unbounded" initiator="B:"/>
> </sequence>
>
> Is that now illegal? I'm asking for the value of something that could
> potentially come after me or not at all.
Good question. I think that should be allowed, but I can't find anything
in the spec that says one way or the other. Note that outputValueCalc is
actually a bit weird, since on unparse the infoset will be sorted, so
every time count(../b) is executed, it will return the same value.
inputValueCalc will be different though since that is executed before
things are parsed.
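
A rough Python sketch of that difference, assuming the infoset is a list of (name, value) pairs for the input A:1B:2C:checkB:3C:check. The count_b helper is a hypothetical stand-in for count(../b), not a Daffodil function.

```python
def count_b(infoset):
    """Model of count(../b) over a list of (name, value) pairs."""
    return sum(1 for name, _ in infoset if name == "b")

# Unparse: the complete infoset is already available in schema order.
complete = [("a", "1"), ("b", "2"), ("b", "3"), ("c", None), ("c", None)]
unparse_values = [count_b(complete) for name, _ in complete if name == "c"]

# Parse: each c only sees the elements inserted before it.
parsed = [("a", "1"), ("b", "2"), ("c", None), ("b", "3"), ("c", None)]
parse_values = [
    count_b(parsed[:i]) for i, (name, _) in enumerate(parsed) if name == "c"
]
```

Under these assumptions every c gets 2 on unparse, while on parse the two c elements get 1 and 2.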
>
> --------------------------------------------------------------------------------
> *From:* Stephen Lawrence
> *Sent:* Tuesday, October 24, 2017 10:08:35 AM
> *To:* Taylor Wise
> *Subject:* Re: Unordered Sequences Thoughts
> Not sure I understand. Nth child isn't even part of the DFDL
> specification, seems perfectly reasonable to have an expression that
> works for all cases except for that. Maybe I'm misunderstanding?
>
> On 10/24/2017 10:03 AM, Taylor Wise wrote:
>> I think for the time being, it's probably best to just sort at the end. Then
>> at least we have unordered sequences re-enabled and can make a decision at a
>> later date if we want to pursue the added benefit of finding the n'th item.
>> But, I
>> would argue that an unordered sequence isn't 'finished' until it has been
>> completed. And so you shouldn't be able to run expressions on it until then.
>>
>> --------------------------------------------------------------------------------
>> *From:* Stephen Lawrence
>> *Sent:* Tuesday, October 24, 2017 8:18:29 AM
>> *To:* Taylor Wise
>> *Subject:* Unordered Sequences Thoughts
>> Taylor,
>>
>> I was thinking about the solutions that we brainstormed about unordered
>> sequences last week. I think the solutions were essentially to create a
>> new UnorderedSequence parser combinator that keeps track of where
>> the infoset elements start and end, so that it can handle sorting
>> them, somehow.
>>
>> Two thoughts came up on how to do this:
>>
>> 1. Insert everything as normal. Upon completion of the new parser
>> combinator, it sorts all the elements that were added based on the
>> schema order. It might need to make a copy of the pre-sorted elements in
>> case they need to be restored due to backtracking (though I'm not sure
>> this is necessary, I *think* once an unordered sequence is completed,
>> there's no way to backtrack to the middle of it. This *might* be wrong
>> though). The downside with this is that it doesn't work well with n-th
>> child, though this isn't really a requirement and changes a lot of
>> assumptions (for example, static type checking when compiling schemas
>> would be useless) so I'm not sure it really matters.
>>
>> 2. Insert new elements in the sorted position. This is essentially
>> "insertion sort". One drawback of this (and maybe a dealbreaker?) is
>> that we currently look at the end of the list to determine if we are
>> adding to an existing array. So I *think* we need to maintain insertion
>> order so that we can append to the correct DIArray. There might be a way
>> around this, but I'm not sure.
>>
>> Point being, before you get too far into implementing unordered sequences,
>> make sure you think through this issue. I'm not immediately sure of a
>> workaround, and sorting at the end might be easier to implement, and
>> might actually be more efficient. Insertion sort isn't particularly
>> efficient, especially for large out-of-order lists. Sorting at the end
>> would allow for some other algorithm that might be better.
>>
>> Just some food for thought.
>>
>> Also, semi-related. I took a look at the final slot changes. Looks
>> really good! Only thing to work on in the future is a commit message.
>> You briefly explain what changes were made, but the more important thing
>> to have in a commit message is "why" the changes were made. When someone
>> looks back on this code in a year, it's easy to figure out what was
>> changed, but the reasoning behind it may not be as clear, and that's
>> usually the question someone is asking when they are looking at really
>> old code.
>>
>> - Steve
>>
>