yan-hic commented on issue #11967: URL: https://github.com/apache/arrow/issues/11967#issuecomment-1596221798
> still needs to choose some physical type for the column in the Parquet file. And by default, Arrow uses INT32 for the physical type.

@jorisvandenbossche can that default be changed in `pyarrow`? I would like it to use STRING instead.

The issue we are facing: when saving a dataset (i.e. multiple Parquet files), one file may happen to have a column that is entirely null, while the other files have strings in that column. The result is a different schema across the files of the dataset (INT32 vs. STRING), which raises errors when the dataset is read with BigQuery, for instance.
