Hi,

I looked into it and this seems to be a rather interesting situation :)

First of all, this does not relate to Jackson and I don't think there is a
regression in NiFi versions. I think you may have received specific data
for the first time that is causing this issue. Are you absolutely sure that
the exact same data is working in NiFi 1.23?

It is very easy to simulate the problem in a unit test on the NiFi side and
the problem appears as soon as you're using a JSON Reader and JSON Writer
with the data that you shared.

Specifically, the Unicode code unit \uD83D is problematic: it is a high
surrogate that is followed by '*' instead of a low surrogate, and according
to this RFC [1]:
"The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form
(as surrogate pairs) and do not directly represent characters."
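
To make the RFC point concrete, here is a minimal sketch in Python (chosen
only for brevity; this is not NiFi or Jackson code, and the helper name
is_utf8_encodable is made up for this example). It parses the JSON text from
the shared sample and checks whether the resulting string can be encoded as
strict UTF-8:

```python
import json

# JSON text from the shared sample: \uD83D is a high surrogate followed by '*'.
raw = r'{"name":"\uD83D*"}'

# Python's JSON parser accepts the escape and keeps the lone surrogate as-is.
name = json.loads(raw)["name"]


def is_utf8_encodable(text: str) -> bool:
    """Return True if text can be encoded as strict UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for code points in U+D800..U+DFFF, which UTF-8 prohibits.
        return False


print(is_utf8_encodable(name))          # lone high surrogate from the sample
print(is_utf8_encodable("\U0001F600"))  # a complete character outside the BMP
```

The first check fails because the string still contains the unpaired
surrogate, while a complete supplementary character encodes without issue.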

So it looks like you need UTF-16 when writing data - currently we use UTF-8
as the default. Do you have more details on how this data is generated?
Is {\"name\":\"\uD83D*\"} the exact data you're receiving, or do you have
additional characters in that field? If so, can you give the full content
of the name field?
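
If it helps to answer that, the following rough Python sketch lists the
position of every unpaired surrogate in a parsed field. The function name
find_lone_surrogates is only illustrative. It relies on the fact that
Python's json module merges valid \uXXXX surrogate pairs into one code point,
so any surrogate left after parsing is unpaired:

```python
import json


def find_lone_surrogates(text: str) -> list:
    """List (index, code point) for each surrogate left in a parsed string.

    Assumes text came from json.loads, which already merges valid pairs,
    so anything in U+D800..U+DFFF here has no partner.
    """
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if 0xD800 <= ord(ch) <= 0xDFFF
    ]


record = json.loads(r'{"name":"\uD83D*"}')
print(find_lone_surrogates(record["name"]))  # [(0, 'U+D83D')]

# A complete pair is merged by the parser, so nothing is reported.
print(find_lone_surrogates(json.loads(r'"\uD83D\uDE00"')))  # []
```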

[1] https://datatracker.ietf.org/doc/html/rfc3629#page-4

Thanks,
Pierre

On Mon, Nov 4, 2024 at 14:50, 박정화 <vesuv...@toss.im> wrote:

> Hello,
>
> I waited because I thought I had not received a reply, but then I saw that
> you had responded at the following link.
>
> https://lists.apache.org/thread/qgc2ffgwv00htpykvmvxcq4fx7df2qkx
>
> First of all, I would like to thank you. I downgraded the library version
> as you suggested, and the error did not occur.
>
> I am attaching the error data in case it may be of help.
>
> I know the data is in the wrong format, but I was worried because it
> affected the overall data loading, and the advice you sent was very
> helpful.
>
>
> Thank you again. Have a good day.
>
>
> Data Sample
>
>
> {"timestamp":"2024-10-20T16:10:00.322+09:00","data":"[{\"name\":\"\uD83D*\"}]"}
>
>
> PartitionRecord property
> yyyymmddhh : format(toDate( /timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
> "Asia/Seoul"),"yyyyMMddHH")
>
> On Tue, Oct 22, 2024 at 10:14 PM, 박정화 (Bank) <vesuv...@tossbank.com> wrote:
>
> > Hello,
> >
> > I have been using NiFi without problems.
> >
> > I recently upgraded from version 1.23.2 to 1.27.0 and encountered the
> > following error in the PartitionRecord processor.
> >
> >
> >
> > com.fasterxml.jackson.core.JsonGenerationException: Incomplete surrogate
> > pair: first char 0xD83D, second 0x002A
> >
> >
> >
> > It seems to be a surrogate pair issue, and it seems to have been caused
> > by masking some characters on our side.
> >
> > Nevertheless, I wish it were possible to process JSON as in previous
> > versions, or as Python does. Is there any way?
> >
> > Could the fact that the error now occurs, unlike in the previous version,
> > be related to the update of the Jackson library?
> >
> > NiFi 1.23.2 uses Jackson 2.15.2 and NiFi 1.27.0 uses Jackson 2.17.1, so I
> > thought that might be the cause.
> >
> > If so, we can change the Jackson version and recompile, but we would have
> > to keep doing that for every future NiFi version. Could you please
> > give me some advice on this?
> >
> > Thank you. Have a good day.
> >
>
>
> --
> *박정화*
> Data Engineer, Data Platform Team (Bank)
> 010-8000-6713 | vesuv...@toss.im
> 13F, Korea Intellectual Property Center, 131 Teheran-ro, Gangnam-gu, Seoul (06133)
>
