Re: [I] [Python] Accessing parquet files with parquet.read_table in google cloud storage fails, but works with dataset, works in 16.1.0 fails in 17.0.0 [arrow]

via GitHub Thu, 05 Sep 2024 10:21:43 -0700


brokenjacobs commented on issue #43574:
URL: https://github.com/apache/arrow/issues/43574#issuecomment-2332261083


   > Can you share the schema of the file here? `pa.parquet.read_schema('gs://****/v1/li191r/ms=2023-01/source_id=9319/li191r_9319_2023-01-02.parquet')` should be enough.
   ```
   source_id: string
   site_id: string
   readout_time: timestamp[ms, tz=UTC]
   voltage: float
   kafka_key: string
   kakfa_ts_type: uint8
   kafka_ts: timestamp[ms]
   kafka_partition: uint8
   kafka_offset: uint64
   kafka_topic: string
   ds: string
   -- schema metadata --
   pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 1502
   ```
   
   I've also confirmed this bug on the local filesystem as well as on cloud storage. A good workaround is to pass `partitioning=None` to the `read_table` call.

