[ https://issues.apache.org/jira/browse/ARROW-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
TP Boudreau reassigned ARROW-5618:
----------------------------------
Assignee: TP Boudreau
> Using deprecated Int96 storage for timestamps triggers integer overflow in
> some cases
> -------------------------------------------------------------------------------------
>
> Key: ARROW-5618
> URL: https://issues.apache.org/jira/browse/ARROW-5618
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: TP Boudreau
> Assignee: TP Boudreau
> Priority: Minor
>
> When storing Arrow timestamps in Parquet files using the Int96 storage
> format, certain combinations of array lengths and validity bitmasks cause an
> integer overflow error on read. It's not immediately clear whether the
> Arrow/Parquet writer is storing zeroes when it should be storing positive
> values or the reader is attempting to calculate a nanoseconds value
> inappropriately from zeroed inputs (perhaps missing the null bit flag). It is
> also not immediately clear why only columns of certain lengths seem to be
> affected.
> Probably the quickest way to reproduce this undefined behavior is to alter
> the existing unit test UseDeprecatedInt96 (in file
> .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling
> its column lengths (repeating the same values), followed by 'make unittest'
> using clang-7 with sanitizers enabled. (Here's a patch applicable to current
> master that changes the test as described: [1]; I used the following cmake
> command to build my environment: [2].) You should get a log something like
> [3]. If requested, I'll see if I can put together a stand-alone minimal test
> case that induces the behavior.
> The quick hack at [4] prevents the integer overflows, but it is included only
> to confirm the proximate cause of the bug: the Julian days field of the Int96
> appears to be zero, when a strictly positive number is expected.
> I've assigned the issue to myself and I'll start looking into the root cause
> of this.
> [1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e
> [2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9
> [3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d
> [4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f
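
To illustrate why a zeroed Julian days field would overflow, here is a minimal sketch of the Int96-to-nanoseconds arithmetic with explicit range checks. It assumes the conventional Int96 timestamp layout (nanoseconds within the day in the low 64 bits, Julian day number in the high 32 bits). The names Int96Sketch and Int96ToUnixNanosChecked are hypothetical and are not Arrow's or Parquet's actual API, and this is not the patch at [4].

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a decoded Int96 value: value[0] and value[1] hold
// the low and high words of nanoseconds-within-day, value[2] the Julian day.
struct Int96Sketch {
  uint32_t value[3];
};

// Julian day number of the Unix epoch (1970-01-01).
constexpr int64_t kJulianToUnixEpochDays = 2440588;
constexpr int64_t kNanosecondsPerDay = 86400LL * 1000 * 1000 * 1000;

// Writes nanoseconds since the Unix epoch to *out and returns true, or returns
// false (leaving *out untouched) if the result would not fit in an int64_t.
bool Int96ToUnixNanosChecked(const Int96Sketch& v, int64_t* out) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  const uint64_t raw_nanos =
      (static_cast<uint64_t>(v.value[1]) << 32) | v.value[0];
  // A valid nanoseconds-within-day value is far below kMax; reject anything
  // that could not be represented as a non-negative int64_t.
  if (raw_nanos > static_cast<uint64_t>(kMax)) return false;
  const int64_t nanos_of_day = static_cast<int64_t>(raw_nanos);

  const int64_t days =
      static_cast<int64_t>(v.value[2]) - kJulianToUnixEpochDays;

  // With a Julian day of zero, days is -2440588, and multiplying by
  // kNanosecondsPerDay would exceed the int64_t range (signed overflow).
  if (days > kMax / kNanosecondsPerDay || days < kMin / kNanosecondsPerDay) {
    return false;
  }
  const int64_t day_nanos = days * kNanosecondsPerDay;

  // nanos_of_day is non-negative here, so only the upper bound can be hit.
  if (day_nanos > kMax - nanos_of_day) return false;

  *out = day_nanos + nanos_of_day;
  return true;
}
```

Under these assumptions, a value with Julian day 2440588 and zero nanoseconds maps to 0, while an all-zero Int96 is rejected rather than overflowing. A check like this only masks the symptom; it does not explain why the days field is zero in the first place.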