On Wednesday, April 25, 2018 at 7:43:44 PM UTC-7, Tatu Saloranta wrote:

> At the point where deserializers handle things, decoding has already been
> done, and information potentially lost and/or corrupted. But if we go down
> to a lower level, the decoder (`JsonParser`) is responsible for
> tokenization, and is in a better position.
>
> I would probably approach this from the perspective of using another
> library to detect the encoding and construct an `InputStreamReader` for
> that encoding (the library may offer that integration out of the box too),
> and then use the resulting reader for creating a parser:
>
>     JsonParser p = jsonFactory.createStreamReader(reader);
>
> which may then be given as the input source to `ObjectMapper` (or
> `ObjectReader`).
>
> Jackson does not really have to know about the potential complexity of
> detecting the encoding and attempting to fix possible Unicode errors.
>
> -+ Tatu +-
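For concreteness, the pipeline Tatu describes might look roughly like the sketch below, assuming ICU4J's `CharsetDetector` as the detection library (any detector would do). Note that in Jackson 2.x the factory method is spelled `JsonFactory.createParser(Reader)` rather than `createStreamReader`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

public class DetectThenParse {

    public static JsonNode parse(byte[] json) throws IOException {
        // Guess the document's encoding up front, then hand Jackson a Reader
        // so it never has to look at the raw bytes itself.
        CharsetDetector detector = new CharsetDetector();
        detector.setText(json);
        CharsetMatch match = detector.detect();

        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(json), match.getName());

        ObjectMapper mapper = new ObjectMapper();
        // Jackson 2.x: createParser(Reader), not createStreamReader.
        JsonParser p = mapper.getFactory().createParser(reader);
        return mapper.readTree(p);
    }
}
```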
Yes, this would certainly be the preferable solution if I always knew which encoding to use for the entire JSON document, but sadly the encoding can vary per String-valued field. Within a single document, every String could potentially use a different encoding.

So instead of trying to guess the encoding of the entire raw JSON, I need to hook in and try to guess the encoding of each String-valued field at the point where its String value is constructed. What I am trying to understand is: what is the right place to intercept the creation of the String for every String-valued field? There I could call the encoding guesser and construct the String or CharSequence for the field myself, doing some tricks to un-mangle the bytes.

Matthew.
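P.S. To make the question concrete, here is the kind of hook I have in mind, as a rough sketch: a custom `String` deserializer registered through a `SimpleModule`, so that every String-valued field passes through it. It assumes the parser is created over an ISO-8859-1 `Reader`, so that each char returned by `getText()` maps back to exactly one original byte; `guessCharset` is a made-up placeholder for the actual encoding guesser.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class UnmanglingStringDeserializer extends StdDeserializer<String> {

    public UnmanglingStringDeserializer() {
        super(String.class);
    }

    @Override
    public String deserialize(JsonParser p, DeserializationContext ctxt)
            throws IOException {
        // Recover the original bytes. This round trip is only lossless if the
        // parser was built over an ISO-8859-1 Reader, which maps every input
        // byte to exactly one char.
        byte[] raw = p.getText().getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, guessCharset(raw));
    }

    // Placeholder for the per-field encoding guesser; plug in ICU4J's
    // CharsetDetector or similar here.
    private static Charset guessCharset(byte[] bytes) {
        return StandardCharsets.UTF_8;
    }

    // Registering the deserializer for String.class routes every
    // String-valued field through the hook above.
    public static ObjectMapper buildMapper() {
        SimpleModule module = new SimpleModule();
        module.addDeserializer(String.class, new UnmanglingStringDeserializer());
        return new ObjectMapper().registerModule(module);
    }
}
```

With `String.class` mapped this way, the hook fires for every String-valued field in the document, whatever POJO or tree the value ends up in.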
