On Wednesday, April 25, 2018 at 7:43:44 PM UTC-7, Tatu Saloranta wrote:

> At the point where deserializers handle things, decoding has already been
> done, and information potentially lost and/or corrupted. But if we go down
> to a lower level, the decoder (`JsonParser`) is responsible for
> tokenization, and is in a better position.
>
> I would probably approach this from the perspective of using another
> library to detect the encoding and construct an `InputStreamReader` for
> that encoding (the library may offer that integration out of the box too),
> and then use the resulting reader for creating a parser:
>
>     JsonParser p = jsonFactory.createStreamReader(reader);
>
> which may then be given as the input source to `ObjectMapper` (or
> `ObjectReader`).
>
> Jackson does not really have to know about the potential complexity of
> detecting the encoding and attempting to fix possible Unicode errors.
>
> -+ Tatu +-
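For concreteness, the pipeline Tatu describes might look roughly like the sketch below, assuming ICU4J's `CharsetDetector` as the detection library (any detector would do). Note that in Jackson 2.x the factory method is spelled `JsonFactory.createParser(Reader)` rather than `createStreamReader`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

public class DetectThenParse {

    public static JsonNode parse(byte[] json) throws IOException {
        // Guess the document's encoding up front, then hand Jackson a Reader
        // so it never has to look at the raw bytes itself.
        CharsetDetector detector = new CharsetDetector();
        detector.setText(json);
        CharsetMatch match = detector.detect();

        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(json), match.getName());

        ObjectMapper mapper = new ObjectMapper();
        // Jackson 2.x: createParser(Reader), not createStreamReader.
        JsonParser p = mapper.getFactory().createParser(reader);
        return mapper.readTree(p);
    }
}
```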
Yes, this would certainly be the preferable solution if I always knew which encoding to use for the entire JSON document, but sadly the encoding can vary per String-valued field. Within a single document, every String could potentially use a different encoding.

So instead of trying to guess the encoding of the entire raw JSON, I need to hook in and try to guess the encoding of each String-valued field at the point where its String value is constructed. What I am trying to understand is: what is the right place to intercept the creation of the String for every String-valued field? There I could call the encoding guesser and construct the String or CharSequence for the field myself, doing some tricks to un-mangle the bytes.

Matthew.
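P.S. To make the question concrete, here is the kind of hook I have in mind, as a rough sketch: a custom `String` deserializer registered through a `SimpleModule`, so that every String-valued field passes through it. It assumes the parser is created over an ISO-8859-1 `Reader`, so that each char returned by `getText()` maps back to exactly one original byte; `guessCharset` is a made-up placeholder for the actual encoding guesser.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class UnmanglingStringDeserializer extends StdDeserializer<String> {

    public UnmanglingStringDeserializer() {
        super(String.class);
    }

    @Override
    public String deserialize(JsonParser p, DeserializationContext ctxt)
            throws IOException {
        // Recover the original bytes. This round trip is only lossless if the
        // parser was built over an ISO-8859-1 Reader, which maps every input
        // byte to exactly one char.
        byte[] raw = p.getText().getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, guessCharset(raw));
    }

    // Placeholder for the per-field encoding guesser; plug in ICU4J's
    // CharsetDetector or similar here.
    private static Charset guessCharset(byte[] bytes) {
        return StandardCharsets.UTF_8;
    }

    // Registering the deserializer for String.class routes every
    // String-valued field through the hook above.
    public static ObjectMapper buildMapper() {
        SimpleModule module = new SimpleModule();
        module.addDeserializer(String.class, new UnmanglingStringDeserializer());
        return new ObjectMapper().registerModule(module);
    }
}
```

With `String.class` mapped this way, the hook fires for every String-valued field in the document, whatever POJO or tree the value ends up in.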
