Re: how hard is encodingErrorPolicy='error' to implement?

Steve Lawrence Mon, 08 Oct 2018 04:27:02 -0700

BitsCharsetDecoder.scala has a section for handling decode errors, but
has logic commented out and replaced with a NotYetImplemented assertion.
I think it's just a matter of having this section throw an encoding
exception and the caller code handling it appropriately.

There are five callers in InputSourceDataInputStream of the decode()
method, so I suspect those will all need to be updated to handle the
exception and do the right thing, which might just be to let it
bubble up to the parsers.
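
A minimal sketch of that shape, using java.nio's CharsetDecoder as a
stand-in for Daffodil's own decoders. The names StrictDecodeSketch,
ParseError, decodeStrict and readString are hypothetical and are not
Daffodil code; the real decode() callers may need something different.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeSketch {
    // Hypothetical stand-in for the error the parsers would eventually see.
    static class ParseError extends RuntimeException {
        ParseError(String msg, Throwable cause) { super(msg, cause); }
    }

    // Decode strictly: malformed or unmappable input is reported as an
    // exception instead of being replaced with U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // A caller that converts the encoding exception into a parse error
    // and lets it bubble up.
    static String readString(byte[] bytes) {
        try {
            return decodeStrict(bytes);
        } catch (CharacterCodingException e) {
            throw new ParseError("decode error: " + e.getMessage(), e);
        }
    }
}
```

Calling readString on valid UTF-8 returns the decoded text, while a
stray 0xFF byte surfaces as a ParseError rather than a replacement
character.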

However, I think there are some subtleties that make decoder scanning
more difficult. For example, delimiter scanning performs lookahead,
which I don't think should immediately cause a parse error. I think it
should only cause a parse error when an invalid character is actually
read. So the InputSourceDataInputStreamCharIterator logic probably
becomes a bit more complex to handle lookahead decode errors correctly.
I haven't put too much thought into this though.
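
One possible way to picture the lookahead behavior, as a rough sketch
only: peeking records that the next position is undecodable without
raising anything, and the error is raised only when that character is
actually consumed. This uses a java.nio US-ASCII decoder and a single
character of lookahead purely for illustration. LookaheadCharSource,
DecodeError, EOF and BAD are hypothetical names, and none of this
reflects how InputSourceDataInputStreamCharIterator is written.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;

class LookaheadCharSource {
    // Hypothetical error raised only when a bad character is consumed.
    static class DecodeError extends RuntimeException {
        DecodeError(String msg) { super(msg); }
    }

    static final int EOF = -1;  // no more input
    static final int BAD = -2;  // next position holds undecodable bytes

    private final ByteBuffer in;
    private final CharBuffer one = CharBuffer.allocate(1);
    private final CharsetDecoder dec = StandardCharsets.US_ASCII.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    private boolean filled = false;
    private int slot = EOF;

    LookaheadCharSource(byte[] bytes) { this.in = ByteBuffer.wrap(bytes); }

    // Decode at most one character into the lookahead slot.
    private void fill() {
        if (filled) return;
        one.clear();
        CoderResult r = dec.decode(in, one, false);
        if (one.position() > 0) {
            slot = one.get(0);   // a good char was produced
        } else if (r.isError()) {
            slot = BAD;          // remember the error, do not throw yet
        } else {
            slot = EOF;
        }
        filled = true;
    }

    // Lookahead: never throws on a decode error, just reports BAD.
    int peek() { fill(); return slot; }

    // Consuming read: this is where the decode error becomes visible.
    char next() {
        fill();
        if (slot == BAD) {
            throw new DecodeError("invalid byte at offset " + in.position());
        }
        if (slot == EOF) throw new NoSuchElementException();
        filled = false;
        return (char) slot;
    }
}
```

With input bytes 'a' followed by 0xFF, a scanner can peek past 'a' and
see BAD without failing. Only a next() call on the bad position throws.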

And then it's a matter of ensuring the parsers that end up decoding
characters also handle that parse error and start to backtrack, since I
think many of them currently just assume a call to an IO function that
decodes characters will always succeed.
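
As a rough illustration of what "handle that parse error and start to
backtrack" could look like, here is a toy two-branch choice: the text
branch decodes strictly, and on a decode error the position is reset
and the next branch is tried. BacktrackSketch, ParseError, parseText
and parseChoice are hypothetical names, and the branches are invented
for the example; Daffodil's parsers have their own backtracking
machinery.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class BacktrackSketch {
    // Hypothetical parse error type.
    static class ParseError extends RuntimeException {
        ParseError(String msg, Throwable cause) { super(msg, cause); }
    }

    // Branch 1: read n bytes as strict US-ASCII text.
    // A decode error becomes a ParseError instead of being ignored.
    static String parseText(ByteBuffer in, int n) {
        byte[] raw = new byte[n];
        in.get(raw);
        try {
            return StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
        } catch (CharacterCodingException e) {
            throw new ParseError("decode error in text branch", e);
        }
    }

    // Choice: try the text branch first; on a parse error, restore the
    // saved position and fall back to treating the bytes as binary.
    static String parseChoice(ByteBuffer in, int n) {
        int mark = in.position();
        try {
            return "text:" + parseText(in, n);
        } catch (ParseError e) {
            in.position(mark);  // backtrack to where the branch started
            byte[] raw = new byte[n];
            in.get(raw);
            return "binary:" + raw.length + " bytes";
        }
    }
}
```

The point of the sketch is only that the decoding branch must be able
to fail, and that the caller must restore the position before trying
an alternative.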

So I don't think it's going to be particularly difficult, but there are
probably some subtleties in some cases, and we really need to inspect
parsers to make sure they are handling it correctly.

I agree the unparsing should not be too difficult for the reasons you've
provided.
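
For reference, the unparse-side approach described in the quoted
message below could be sketched roughly like this with java.nio
encoders. StrictEncodeSketch, UnparseError, encodeStrict and unparse
are hypothetical names, and US-ASCII is chosen only so that an
unmappable character is easy to trigger.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncodeSketch {
    // Hypothetical stand-in for a fatal unparse error.
    static class UnparseError extends RuntimeException {
        UnparseError(String msg, Throwable cause) { super(msg, cause); }
    }

    // Encoder whose malformed and unmappable handlers report (throw)
    // rather than substitute a replacement byte sequence.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Catch the throw and convert it to an unparse error.
    static byte[] unparse(String s) {
        try {
            return encodeStrict(s);
        } catch (CharacterCodingException e) {
            throw new UnparseError("encode error: " + e.getMessage(), e);
        }
    }
}
```

Here unparse("abc") yields three bytes, while a string containing a
non-ASCII character such as U+00E9 raises UnparseError.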

- Steve

On 10/3/18 5:15 PM, Mike Beckerle wrote:
> Turns out IBM DFDL implements only encodingErrorPolicy='error', and Daffodil 
> only encodingErrorPolicy='replace'.
> 
> 
> That means for any data where there are encoding errors the two 
> implementations will not behave the same.
> 
> For compatibility testing, this will be problematic.
> 
> 
> The I/O layer was recently revised for parsing to use our own decoders.
> 
> 
> Not sure anything changed about encoders.
> 
> 
> How hard is implementing parse-time encodingErrorPolicy='error', in Daffodil, 
> which just raises a parse error if a decode error occurs?
> 
> 
> I know for unparsing, if we're using java encoders, the implementation of 
> encodingErrorPolicy='error' just requires initializing all encoders to have 
> malformed and unmapped error handlers that throw. Then catching this throw 
> and converting to an unparse error is all that is required. This has little 
> or no performance implications as unparse errors are fatal.
> 
> 
>
