IkeNefcy opened a new issue, #38000:
URL: https://github.com/apache/arrow/issues/38000

   ### Describe the usage question you have. Please include as many useful 
details as  possible.
   
   
   Steps to reproduce: 
   
   1.  Run a python script that will write timestamps to a parquet via s3. 
   2. Due to possible spectrum scan errors, each column should have the type 
defined. This means using df.['col1'] =  pd.datetime(df.['col1']) 
   3. Once all types are assigned, Upload to s3, this can be via boto3 or by 
hand, it doesn't matter which. 
   4. From s3, perform a redshift copy (spectrum operation) (Table would be 
pre-created with a timestamp type column etc). 
   example: 
   ```
   copy
       plex_cada.test_lagrange_periods
   from
       's3://bucket/test/test'
   iam_role 'Spectrum_Role_neded'
   FORMAT AS PARQUET;
   ```
   5. Query the data, the timestamps appear corrupted. 
   
   From a deep dive it appears that an integer longer than the usual Unix 
timestamp is being written and this is causing the date to hundreds of 
thousands of years in the future. 
   For example a usual timestamp might look like 2023-09-11 10:39:00.000000
   But the queried timestamp is 161065-10-07 20:24:18.134352 for the same 
uploaded timestamp. 
   
   This doesn't happen in Athena if you crawl the data and query via glue. 
Although, the crawler will be tricked as well and assume the type is "bigint".
   In this case, for example
   A normal timestamp if I change the schema to show that the bigint is really 
a time stamp, I can get 2023-10-02 00:00:00.000
   The same number without this change, is 1696204800000000000, this looks 
similar to a unix timestamp, but with nano seconds. 
   
   It could be that nano seconds are part of the latest update and that aws 
redshift_spectrum is just not tracking on this change. 
   It would be nice to know if this is an intended change or not though, just 
opening this to confirm if this is expected behavior. 
   It's not super clear at a glance but I am seeing websites updated to 
recognize nano seconds. I'm wondering also if this is related to the 
epochalypse.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to