daviewales opened a new issue, #50142:
URL: https://github.com/apache/arrow/issues/50142

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Suppose we have a pyarrow table with a nanosecond resolution time column:
   
   ``` python
   import pyarrow as pa
   import pyarrow.parquet  # makes pa.parquet available for the calls below
   t = pa.table({'A': [40984E9]}, schema=pa.schema({'A': pa.time64('ns')}))
   ```
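
For orientation, the value `40984E9` is a count of nanoseconds since midnight. The sketch below uses only the standard library (the helper name `ns_to_time` is hypothetical, not part of pyarrow) to show which time of day that is, and that the value is a whole number of microseconds, so a cast to microseconds would not lose any precision:

```python
import datetime


def ns_to_time(ns: int) -> datetime.time:
    """Convert nanoseconds since midnight to a datetime.time.

    datetime.time only has microsecond precision, so any
    sub-microsecond remainder is dropped here.
    """
    us, _remainder_ns = divmod(ns, 1_000)
    seconds, microsecond = divmod(us, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, second, microsecond)


ns = int(40984E9)       # 40_984_000_000_000 ns since midnight
print(ns_to_time(ns))   # 11:23:04
print(ns % 1_000 == 0)  # True: no sub-microsecond component
```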
   
   I want to save this to an older Parquet version, to support older readers 
such as Azure Synapse, which [doesn't support nanosecond time 
resolution](https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-openrowset#type-mapping-for-parquet).
   
   According to the [pyarrow.parquet.write_table 
docs](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html),
 if `coerce_timestamps` is not specified:
   
   > defaults are chosen depending on version. For version='1.0' and 
version='2.4', nanoseconds are cast to microseconds (‘us’)
   
   So, I should be able to save this to either version '1.0' or version '2.4', 
and it should automatically cast nanoseconds to microseconds:
   
   ## Parquet 2.4
   
   ``` python
   pa.parquet.write_table(t, 'test-pyarrow-2.4.parquet', version='2.4')
   ```
   
   However, inspecting the metadata, we can immediately see that the `version` 
argument has been ignored:
   
   ``` python
   pa.parquet.ParquetFile('test-pyarrow-2.4.parquet').metadata
   # <pyarrow._parquet.FileMetaData object at 0x7fef9110b9c0>
   #   created_by: parquet-cpp-arrow version 24.0.0
   #   num_columns: 1
   #   num_rows: 1
   #   num_row_groups: 1
   #   format_version: 2.6
   #   serialized_size: 383
   ```
   
   Additionally, we can see that the time field has _not_ been coerced to 
microseconds:
   
   ``` python
   pa.parquet.ParquetFile('test-pyarrow-2.4.parquet').schema
   # <pyarrow._parquet.ParquetSchema object at 0x7fefa95ee4c0>
   # required group field_id=-1 schema {
   #   optional int64 field_id=-1 A (Time(isAdjustedToUTC=false, 
timeUnit=nanoseconds));
   # }
   ```
   
   If we add the (optional) argument `coerce_timestamps='us'`, the effect is 
the same:
   
   ``` python
   pa.parquet.write_table(t, 'test-pyarrow-2.4-coerce.parquet', version='2.4', 
coerce_timestamps='us')
   pa.parquet.ParquetFile('test-pyarrow-2.4-coerce.parquet').metadata
   # <pyarrow._parquet.FileMetaData object at 0x7fef07178310>
   #   created_by: parquet-cpp-arrow version 24.0.0
   #   num_columns: 1
   #   num_rows: 1
   #   num_row_groups: 1
   #   format_version: 2.6
   #   serialized_size: 383
   
   pa.parquet.ParquetFile('test-pyarrow-2.4-coerce.parquet').schema
   # <pyarrow._parquet.ParquetSchema object at 0x7fef882d27c0>
   # required group field_id=-1 schema {
   #   optional int64 field_id=-1 A (Time(isAdjustedToUTC=false, 
timeUnit=nanoseconds));
   # }
   ```
   
   ## Parquet 1.0
   
   If we try setting the Parquet version to '1.0', we are similarly unable to 
coerce the timestamps to microseconds:
   
   ``` python
   pa.parquet.write_table(t, 'test-pyarrow-1.0.parquet', version='1.0')
   pa.parquet.ParquetFile('test-pyarrow-1.0.parquet').metadata
   # <pyarrow._parquet.FileMetaData object at 0x7fef0712bce0>
   #   created_by: parquet-cpp-arrow version 24.0.0
   #   num_columns: 1
   #   num_rows: 1
   #   num_row_groups: 1
   #   format_version: 1.0
   #   serialized_size: 382
   
   pa.parquet.ParquetFile('test-pyarrow-1.0.parquet').schema
   # <pyarrow._parquet.ParquetSchema object at 0x7fef0cc69e80>
   # required group field_id=-1 schema {
   #   optional int64 field_id=-1 A (Time(isAdjustedToUTC=false, 
timeUnit=nanoseconds));
   # }
   ```
   
   This time, we at least get the correct Parquet version. However, the time 
field still has nanosecond resolution.
   Passing `coerce_timestamps='us'` explicitly gives the same result.
   
   ### Component(s)
   
   Python, Parquet

