Re: [DISCUSS] split separator work into two phases - to isolate non-backward-compatible behavior change

Mike Beckerle Fri, 10 Aug 2018 01:38:10 -0700

I think the limit should default to Long.MaxValue, but can be set lower by 
users.



SDE vs. PE is interesting.


I think this limit is designed to prevent the parser from being DoS-ed, or 
wasting your time when debugging your schema.


Suppose we make it an SDE.


If people want to insure the arrays are less than N elements and get a PE, they 
can put a


<dfdl:assert>{ dfdl:occursIndex() <= $limit }</dfdl:assert>


and bind $limit in a variable that can be externally set, or just hard-code the 
constant.


An unparse error is, however, fatal, so there's not difference at unparse time, 
practically speaking, between SDE and PE.


Theoretically, I suppose a schema might want to detect exceeding the limit and 
backtrack and try another choice branch, but that seems to be rather heroic for 
a very coarse thing like this that is tunable with only 1 value over an entire 
schema.


I guess I'm leaning towards error is an SDE, default value is Long.MaxValue 
(unlimited) - so the error will never occur, and applications can tune it lower 
to get the SDE if they want to.


________________________________
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 9, 2018 1:42:25 PM
To: Mike Beckerle; [email protected]
Subject: Re: [DISCUSS] split separator work into two phases - to isolate 
non-backward-compatible behavior change

Agreed, 1024 is way to small for plenty of formats.

In fact, what are your thoughts on removing the restriction completely?
I guess the downside is that a too lax schema might cause things to go
off the rails and consume way too much data. But I'd imagine more often
than not this limit is hit because of a huge file. And I've seen
examples where Daffodil backtracks because of it hits the limit and then
goes down a different branch and hide the error about hitting the
occursBoundsMax. If we do keep it, maybe it should be an SDE instead?

On 08/09/2018 12:11 PM, Mike Beckerle wrote:
> Yes a tunable makes sense when there is this sort of compatibility concern.
>
>
> There's this other compatibility concern also which is a bit different -
>
>
> Another change on my dev branch has actually increased checking of the maximum
> occurs count - which by default right now is 1024.
>
>
> It seems in the past there were code paths that didn't check this, though 
> others
> did check it. On my branch this is always checked.
>
>
> Several tests  were failing because the tests have more than 1024 sized arrays
> in them that didn't get checked in the past, so I had to add tunable increases
> of this threshold to their TDML files. Some of these were in DFDLSchemas or 
> fouo
> schemas on DI2E.net.
>
>
> This is easy to do, and once done is portable, but was not expected.
>
>
> If people use Daffodil from CLI, or as an embedded library, with a default of
> 1024 we're basically requiring everyone to tune this max array size tunable.
>
>
> I think 1024 is, generally, way too small as a default, and we should increase
> it to 128K, and let people tune it DOWN to tighten up when their format 
> doesn't
> use large arrays. But so many formats are a header, and a big array of body
> records, and perhaps a trailer, and those body arrays are BIG, that this 
> default
> of 1024 just seems silly to me.
>
>
> Comments?
>
>
>
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <[email protected]>
> *Sent:* Wednesday, August 8, 2018 11:49:46 AM
> *To:* [email protected]; Mike Beckerle
> *Subject:* Re: [DISCUSS] split separator work into two phases - to isolate
> non-backward-compatible behavior change
> I'm in favor of this.
>
> Would it be possible to add a tunable to flip the behavior between
> current/broken and new/fixed? That would make for a very clean path
> towards deprecation. We can just warn users for a couple releases that
> the tunable will be flipped at some point and give time for users to
> test schemas with the new tunable. And then when we do decide to
> deprecate the old behavior it's just a matter of flipping the default
> value of the tunable. And people that really don't want the new behavior
> can set it to the old value.
>
> - Steve
>
>
> On 08/08/2018 11:24 AM, Mike Beckerle wrote:
>> Due to this issue: https://issues.apache.org/jira/browse/DAFFODIL-1975
>>
>>
>> This is a non-backward-compatible change. The impact of it could be 
>> significant on existing schemas. Depending on scale of the impact we may 
>> have to put in flags to turn on/off the changed behavior.  Love to avoid 
>> that, but the fixes to schemas that are broken  by the DAFFODIL-1975 change 
>> are subtle - one must break up separated sequences
> into a nest of a few sequences, some separated, some not, to preserve current
> behavior of the schema.
>>
>>
>> I am inclined to NOT fix this as part of the separator and separator 
>> suppression work currently in review as PR  
>> https://github.com/apache/incubator-daffodil/pull/88
>>
>>
>> <https://github.com/apache/incubator-daffodil/pull/88>
>>
>> Rather, my plan is to first commit functionality that is compatible with 
>> current Daffodil behavior, then have separate PRs for individual JIRA 
>> tickets that change behavior in non-backward-compatible ways like 
>> DAFFODIL-1975 does.  This also helps isolate the  tests that verify the 
>> change of behavior, which also serve as a start at
> tutorial material for explaining the required changes in existing DFDL schemas
> that use separated sequences.
>>
>>
>> There are a number of clarifications in discussion on the DFDL Workgroup 
>> mailing list about behavior of sequences both separated and not, for parsing 
>> and unparsing. In some cases these, like DAFFODIL-1975, may be backwards 
>> incompatible. Several of those could  be combined into one commit/PR, but in 
>> general they should be isolated from
> other changes if non-backward compatible.
>>
>>
>> Any discussion?
>>
>

Re: [DISCUSS] split separator work into two phases - to isolate non-backward-compatible behavior change

Reply via email to