On Mon, May 14, 2018 at 7:32 PM,  <[email protected]> wrote:
> On Monday, April 30, 2018 at 5:04:40 PM UTC-7, Tatu Saloranta wrote:
>>
>> Factory methods are called by ObjectMapper and ObjectReader; those are
>> probably the best examples. It is possible to override only some of the
>> internal methods that these factory methods delegate to (2 or 3, instead
>> of a dozen).
>>
>> But it sounds like you would then also need to override many other
>> accessors for textual data, and/or `nextToken()` and other methods.
>>
>> -+ Tatu +-
>
>
> Dear Tatu,
>
> Your advice about this bizarre challenge was absolutely invaluable and in
> the end it worked very nicely. I've managed to create a blasphemous JSON
> parser, which is able to properly decode illegal inputs like this:
>
> { "key1_in_utf8": "value1_in_weird_encoding_1", "key2_in_utf8":
> "value2_in_weird_encoding_2", ... }
>
>
> I have attached this special parser here, in case you wanted to see how it
> works, and especially if you might have any feedback on it. This code could
> be used as a basis for a future Jackson class or extra / bonus module, which
> could be used to handle JSON with different text encoding bugs:
>
> 1) the more common case of JSON in a wrong encoding (by running the
> detection at a low level like ByteSourceJsonBootstrapper does, except
> supporting a much larger number of encodings, at the cost of being
> uglier and slower)
>
> 2) the rare, and never before seen (by me at least), case of JSON with
> mixed wrong encodings (which is what this does now, at the cost of being
> exponentially uglier and slower, of course)
>
> The parser depends on a popular open-source encoding detection library for
> Java: https://github.com/albfernandez/juniversalchardet , which is licensed
> under MPL 1.1, GPL 2 or later, or LGPL 2.1 or later. That should be fairly
> compatible with Jackson's licensing for downstream users who need this.
>
> Hopefully this effort will assist other users who run into similar issues
> with insane JSONs.
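
For readers without the attachment, the per-value idea in case 2) above can be
sketched roughly as follows. This is not the attached parser: the class name
LenientValueDecoder and its fixed fallback charset are hypothetical, and the
sketch uses only the JDK. It assumes the raw bytes of one string value have
already been isolated; a real implementation would presumably ask a detector
such as juniversalchardet for the charset instead of using a fixed fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Jackson or of the attached parser. */
final class LenientValueDecoder {
    // Assumption: one fixed fallback charset stands in for real detection.
    private final Charset fallback;

    LenientValueDecoder(Charset fallback) {
        this.fallback = fallback;
    }

    /** Decodes the raw bytes of a single JSON string value. */
    String decode(byte[] raw) {
        try {
            // Strict decoding: REPORT makes malformed input throw instead of
            // being silently replaced with U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: reinterpret the same bytes in the fallback
            // charset. A charset detector would be consulted here instead.
            return new String(raw, fallback);
        }
    }
}
```

Keys would still be decoded as UTF-8 by the normal parser; only the value
bytes go through a helper like this one. Note that a value which happens to be
valid UTF-8 is never re-examined, so this sketch can misread short values in
other encodings.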

Very cool. Thank you for sharing it -- who knows? There are many lenient
libs for HTML (TagSoup et al), so perhaps there is a need. And the first
step is often to have source material to peek into and maybe get ideas
of how a new approach might work.

-+ Tatu +-


>
> Matthew.
>
> --
> You received this message because you are subscribed to the Google Groups
> "jackson-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
