yan-hic commented on issue #11967: URL: https://github.com/apache/arrow/issues/11967#issuecomment-1596221798
> still needs to choose some physical type for the column in the Parquet file. And by default, Arrow uses INT32 for the physical type.

@jorisvandenbossche can that default be changed in `pyarrow`? I would like it to use STRING instead.

The issue we are facing: when saving a dataset (i.e. multiple Parquet files), one file may happen to have a column that is entirely null, while the other files have strings in that column. The result is a different schema across the files of the dataset (INT32 vs. STRING), which raises errors when the dataset is read with BigQuery, for instance.
