ByteBaker opened a new issue, #3430:
URL: https://github.com/apache/arrow-rs/issues/3430

   **Describe the bug**
   While using `parquet::file::reader::SerializedFileReader`, calling 
`to_json_value` on a parquet **Row**, containing a date before **1970/01/01** 
panics.
   
   A similar issue #2897 was reported earlier and closed after #2899 solved it. 
But it only handled `Field::TimestampMillis` and `Field::TimestampMicros` 
types, `Field::Date` still breaks.
   
   **To Reproduce**
   <!--
   Steps to reproduce the behavior:
   -->
   Inside test `parquet::record::api::tests::test_convert_date_to_string`, add 
an extra call to function `check_date_conversion` with any date before 
**1970/01/01**. Another way is to use a parquet file with at least one `Row` 
containing such value.
   
   This panics with the message
   ```
   thread 'record::api::tests::test_convert_date_to_string' panicked at 'No 
such local time'
   ```
   
   **Expected behavior**
   <!--
   A clear and concise description of what you expected to happen.
   -->
   Be able to call `Row::to_json_value` without panic.
   
   ---
   **Additional context**
   <!--
   Add any other context about the problem here.
   -->
   Why #2899 works:
   `enum Field` defines `Date, TimestampMillis, TimestampMicros` fields as:
   ```
   pub enum Field {
      ...
      Date(u32),
      TimestampMillis(u64),
      TimestampMicros(u64),
      ...
   }
   ```
   When `i32` or `i64` are interpreted as their unsigned counterparts, the 
value underflows due to the now lack of a sign bit, e.g., `-7766_i32` becomes 
`4294959530_u32`.
   
   When the new value is interpreted as its signed variant again, we get the 
original value (since the bytes never changed). Therefore 
`assert_eq!(4294959530_u32 as i32, -7766_i32)` succeeds.
   
   #2899 utilizes this fact to solve the problem. But the same code doesn't 
resolve it for `Field::Date` in the below snippet, because it holds a `u32`, 
and promoting it an `i64` changes the value itself (since it treats the MSB of 
the `u32` for value, and not sign, which was crucial). Below code doesn't work.
   
   ```
   fn convert_date_to_string(value: u32) -> String {
       static NUM_SECONDS_IN_DAY: i64 = 60 * 60 * 24;
       let dt = Utc
           .timestamp_opt(value as i64 * NUM_SECONDS_IN_DAY, 0)
           .unwrap();
       format!("{}", dt.format("%Y-%m-%d %:z"))
   }
   ```
   The correct way would be to read `value` to `i32` before promoting it to 
i64. Therefore below code works.
   ```
   fn convert_date_to_string(value: u32) -> String {
       static NUM_SECONDS_IN_DAY: i64 = 60 * 60 * 24;
       let dt = Utc
           .timestamp_opt(value as i32 as i64 * NUM_SECONDS_IN_DAY, 0)
           .unwrap();
       format!("{}", dt.format("%Y-%m-%d %:z"))
   }
   ```
   
   Though my personal opinion is to change the type definition of `Field` and 
use the signed integer types for `Date`, `TimestampMillis` and 
`TimestampMicros`, because we can always have dates/times before Unix epoch. 
Thus, negative timestamps are required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to