rileyhun opened a new issue #12396: URL: https://github.com/apache/arrow/issues/12396
We have some data stored in Parquet format, written by a `pyspark` pipeline, and we are trying to read it with `pyarrow`. Unfortunately, `pyarrow` fails on one of the stored data types when converting the table to pandas. We would prefer to read the data without relying on `pyspark`. I am using `pyarrow=7.0`.

Example:

```python
import s3fs
import pyarrow.parquet as pq

fs = s3fs.S3FileSystem()
bucket_uri = 's3://data/batch=1000doc/part=0'

# Reading the dataset succeeds; the error is raised by to_pandas()
dataset = pq.ParquetDataset(bucket_uri, filesystem=fs)
table = dataset.read()
table.to_pandas()
```

Error:

```
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>
```
