jorisvandenbossche commented on issue #38000:
URL: https://github.com/apache/arrow/issues/38000#issuecomment-1746244230

   @IkeNefcy the file you uploaded 
(https://github.com/apache/arrow/issues/38000#issuecomment-1745759195) was 
created with pyarrow 12.0 and is the one that works fine, is that correct? 
   For that file, it is expected that the timestamps are stored as 
microseconds, and this should work almost anywhere. 
   
   As @mapleFU mentioned, with https://github.com/apache/arrow/pull/36137 we 
changed the default: starting with pyarrow 13.0, timestamps are written as 
nanoseconds if your original data is in nanoseconds (which is the case when 
starting from pandas). 
   
   I assume that the parquet reader you are using with Spectrum is incorrectly 
reading those files. That's something best reported to them.
   
   You can check the metadata of a Parquet file that was written using 
pyarrow as follows (shown here for the file you uploaded):
   
   ```
   In [17]: import pyarrow.parquet as pq
   
   In [18]: meta = pq.read_metadata("Downloads/test")
   
   In [19]: meta
   Out[19]: 
   <pyarrow._parquet.FileMetaData object at 0x7f4546eeae30>
     created_by: parquet-cpp-arrow version 12.0.0
     num_columns: 4
     num_rows: 1
     num_row_groups: 1
     format_version: 2.6
     serialized_size: 2754
   
   In [20]: meta.schema
   Out[20]: 
   <pyarrow._parquet.ParquetSchema object at 0x7f4546e94500>
   required group field_id=-1 schema {
     optional int64 field_id=-1 start_time_local (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
     optional int64 field_id=-1 end_time_local (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
     optional int64 field_id=-1 start_time_utc (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
     optional int64 field_id=-1 end_time_utc (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
   }
   ``` 
   
   (and so you can see here that this file was created using pyarrow 12.0, and 
created timestamps with microseconds)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
