xianwill opened a new issue #653:
URL: https://github.com/apache/arrow-rs/issues/653


   **Describe the bug**
   Large `i64` and `u64` values are corrupted by the round trip through `f64` in the [json decoder build_primitive_array method](https://github.com/apache/arrow-rs/blob/b38a4b6c29ba8b9be02460183c61de86bd9ba7df/arrow/src/json/reader.rs#L930-L931): `f64` has only a 53-bit mantissa, so integers larger than 2^53 cannot be represented exactly and are silently rounded when cast back to `i64`/`u64`.
   
   **To Reproduce**
   Pass a large `i64` value through the decoder, as demonstrated in [this commit](https://github.com/apache/arrow-rs/pull/652/commits/405683aa2b30e112c9851b7588b03d0a9d3421a8). The converted value comes out slightly smaller: in the breaking test, I passed `1627668684594000000` and the resulting value was `1627668684593999872` - a difference of `128`.
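   
   The same corruption is reproducible outside the decoder with a plain cast round trip (a minimal sketch using the value from the test above):
   
   ```rust
   fn main() {
       let original: i64 = 1627668684594000000;
   
       // Round trip through f64, as the decoder does internally.
       // f64 has a 53-bit mantissa, so at this magnitude (~2^60) adjacent
       // representable values are 256 apart and the cast rounds to the
       // nearest one.
       let round_tripped = original as f64 as i64;
   
       assert_eq!(round_tripped, 1627668684593999872);
       assert_eq!(original - round_tripped, 128);
   }
   ```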
   
   **Expected behavior**
   The converted value should match the value passed to the decoder. In this 
case, the value in the created record batch should be `1627668684594000000`.
   
   **Additional context**
   I found this bug while implementing timestamp support in [kafka-delta-ingest](https://github.com/delta-io/kafka-delta-ingest/pull/44) and [delta-rs](https://github.com/delta-io/delta-rs/pull/340), where valid nanosecond timestamps are on the critical path. I already have [an arrow-rs PR](https://github.com/apache/arrow-rs/pull/652/commits/405683aa2b30e112c9851b7588b03d0a9d3421a8) in place to fix this.
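   
   For reference, one way to avoid the round trip is to read integer JSON values directly as `i64` rather than through `f64`. The sketch below illustrates the difference with `serde_json` (for illustration only; not necessarily the exact approach taken in the PR):
   
   ```rust
   use serde_json::Value;
   
   fn main() {
       let v: Value = serde_json::from_str("1627668684594000000").unwrap();
   
       // Reading the number as f64 and casting back (the current decoder
       // behavior) silently changes the value.
       let via_f64 = v.as_f64().unwrap() as i64;
       assert_eq!(via_f64, 1627668684593999872);
   
       // Reading it directly as i64 preserves the exact value.
       let direct = v.as_i64().unwrap();
       assert_eq!(direct, 1627668684594000000);
   }
   ```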
   

