Re: [DISCUSS] split separator work into two phases - to isolate non-backward-compatible behavior change

Steve Lawrence Thu, 09 Aug 2018 10:42:55 -0700

Agreed, 1024 is way to small for plenty of formats.

In fact, what are your thoughts on removing the restriction completely?
I guess the downside is that a too lax schema might cause things to go
off the rails and consume way too much data. But I'd imagine more often
than not this limit is hit because of a huge file. And I've seen
examples where Daffodil backtracks because of it hits the limit and then
goes down a different branch and hide the error about hitting the
occursBoundsMax. If we do keep it, maybe it should be an SDE instead?


On 08/09/2018 12:11 PM, Mike Beckerle wrote:
> Yes a tunable makes sense when there is this sort of compatibility concern.
> 
> 
> There's this other compatibility concern also which is a bit different -
> 
> 
> Another change on my dev branch has actually increased checking of the 
> maximum 
> occurs count - which by default right now is 1024.
> 
> 
> It seems in the past there were code paths that didn't check this, though 
> others 
> did check it. On my branch this is always checked.
> 
> 
> Several tests  were failing because the tests have more than 1024 sized 
> arrays 
> in them that didn't get checked in the past, so I had to add tunable 
> increases 
> of this threshold to their TDML files. Some of these were in DFDLSchemas or 
> fouo 
> schemas on DI2E.net.
> 
> 
> This is easy to do, and once done is portable, but was not expected.
> 
> 
> If people use Daffodil from CLI, or as an embedded library, with a default of 
> 1024 we're basically requiring everyone to tune this max array size tunable.
> 
> 
> I think 1024 is, generally, way too small as a default, and we should 
> increase 
> it to 128K, and let people tune it DOWN to tighten up when their format 
> doesn't 
> use large arrays. But so many formats are a header, and a big array of body 
> records, and perhaps a trailer, and those body arrays are BIG, that this 
> default 
> of 1024 just seems silly to me.
> 
> 
> Comments?
> 
> 
> 
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <[email protected]>
> *Sent:* Wednesday, August 8, 2018 11:49:46 AM
> *To:* [email protected]; Mike Beckerle
> *Subject:* Re: [DISCUSS] split separator work into two phases - to isolate 
> non-backward-compatible behavior change
> I'm in favor of this.
> 
> Would it be possible to add a tunable to flip the behavior between
> current/broken and new/fixed? That would make for a very clean path
> towards deprecation. We can just warn users for a couple releases that
> the tunable will be flipped at some point and give time for users to
> test schemas with the new tunable. And then when we do decide to
> deprecate the old behavior it's just a matter of flipping the default
> value of the tunable. And people that really don't want the new behavior
> can set it to the old value.
> 
> - Steve
> 
> 
> On 08/08/2018 11:24 AM, Mike Beckerle wrote:
>> Due to this issue: https://issues.apache.org/jira/browse/DAFFODIL-1975
>> 
>> 
>> This is a non-backward-compatible change. The impact of it could be 
>> significant on existing schemas. Depending on scale of the impact we may 
>> have to put in flags to turn on/off the changed behavior.  Love to avoid 
>> that, but the fixes to schemas that are broken  by the DAFFODIL-1975 change 
>> are subtle - one must break up separated sequences 
> into a nest of a few sequences, some separated, some not, to preserve current 
> behavior of the schema.
>> 
>> 
>> I am inclined to NOT fix this as part of the separator and separator 
>> suppression work currently in review as PR  
>> https://github.com/apache/incubator-daffodil/pull/88
>> 
>> 
>> <https://github.com/apache/incubator-daffodil/pull/88>
>> 
>> Rather, my plan is to first commit functionality that is compatible with 
>> current Daffodil behavior, then have separate PRs for individual JIRA 
>> tickets that change behavior in non-backward-compatible ways like 
>> DAFFODIL-1975 does.  This also helps isolate the  tests that verify the 
>> change of behavior, which also serve as a start at 
> tutorial material for explaining the required changes in existing DFDL 
> schemas 
> that use separated sequences.
>> 
>> 
>> There are a number of clarifications in discussion on the DFDL Workgroup 
>> mailing list about behavior of sequences both separated and not, for parsing 
>> and unparsing. In some cases these, like DAFFODIL-1975, may be backwards 
>> incompatible. Several of those could  be combined into one commit/PR, but in 
>> general they should be isolated from 
> other changes if non-backward compatible.
>> 
>> 
>> Any discussion?
>> 
>

Re: [DISCUSS] split separator work into two phases - to isolate non-backward-compatible behavior change

Reply via email to