Agreed, 1024 is way to small for plenty of formats. In fact, what are your thoughts on removing the restriction completely? I guess the downside is that a too lax schema might cause things to go off the rails and consume way too much data. But I'd imagine more often than not this limit is hit because of a huge file. And I've seen examples where Daffodil backtracks because of it hits the limit and then goes down a different branch and hide the error about hitting the occursBoundsMax. If we do keep it, maybe it should be an SDE instead?
On 08/09/2018 12:11 PM, Mike Beckerle wrote: > Yes a tunable makes sense when there is this sort of compatibility concern. > > > There's this other compatibility concern also which is a bit different - > > > Another change on my dev branch has actually increased checking of the > maximum > occurs count - which by default right now is 1024. > > > It seems in the past there were code paths that didn't check this, though > others > did check it. On my branch this is always checked. > > > Several tests were failing because the tests have more than 1024 sized > arrays > in them that didn't get checked in the past, so I had to add tunable > increases > of this threshold to their TDML files. Some of these were in DFDLSchemas or > fouo > schemas on DI2E.net. > > > This is easy to do, and once done is portable, but was not expected. > > > If people use Daffodil from CLI, or as an embedded library, with a default of > 1024 we're basically requiring everyone to tune this max array size tunable. > > > I think 1024 is, generally, way too small as a default, and we should > increase > it to 128K, and let people tune it DOWN to tighten up when their format > doesn't > use large arrays. But so many formats are a header, and a big array of body > records, and perhaps a trailer, and those body arrays are BIG, that this > default > of 1024 just seems silly to me. > > > Comments? > > > > -------------------------------------------------------------------------------- > *From:* Steve Lawrence <[email protected]> > *Sent:* Wednesday, August 8, 2018 11:49:46 AM > *To:* [email protected]; Mike Beckerle > *Subject:* Re: [DISCUSS] split separator work into two phases - to isolate > non-backward-compatible behavior change > I'm in favor of this. > > Would it be possible to add a tunable to flip the behavior between > current/broken and new/fixed? That would make for a very clean path > towards deprecation. We can just warn users for a couple releases that > the tunable will be flipped at some point and give time for users to > test schemas with the new tunable. And then when we do decide to > deprecate the old behavior it's just a matter of flipping the default > value of the tunable. And people that really don't want the new behavior > can set it to the old value. > > - Steve > > > On 08/08/2018 11:24 AM, Mike Beckerle wrote: >> Due to this issue: https://issues.apache.org/jira/browse/DAFFODIL-1975 >> >> >> This is a non-backward-compatible change. The impact of it could be >> significant on existing schemas. Depending on scale of the impact we may >> have to put in flags to turn on/off the changed behavior. Love to avoid >> that, but the fixes to schemas that are broken by the DAFFODIL-1975 change >> are subtle - one must break up separated sequences > into a nest of a few sequences, some separated, some not, to preserve current > behavior of the schema. >> >> >> I am inclined to NOT fix this as part of the separator and separator >> suppression work currently in review as PR >> https://github.com/apache/incubator-daffodil/pull/88 >> >> >> <https://github.com/apache/incubator-daffodil/pull/88> >> >> Rather, my plan is to first commit functionality that is compatible with >> current Daffodil behavior, then have separate PRs for individual JIRA >> tickets that change behavior in non-backward-compatible ways like >> DAFFODIL-1975 does. This also helps isolate the tests that verify the >> change of behavior, which also serve as a start at > tutorial material for explaining the required changes in existing DFDL > schemas > that use separated sequences. >> >> >> There are a number of clarifications in discussion on the DFDL Workgroup >> mailing list about behavior of sequences both separated and not, for parsing >> and unparsing. In some cases these, like DAFFODIL-1975, may be backwards >> incompatible. Several of those could be combined into one commit/PR, but in >> general they should be isolated from > other changes if non-backward compatible. >> >> >> Any discussion? >> >
