Re: [I] [C++][Python] Add default data type for completely null arrays when field is not populated in pyarrow.csv.ConvertOptions [arrow]

via GitHub Sat, 13 Sep 2025 02:48:32 -0700


MurrayData commented on issue #47314:
URL: https://github.com/apache/arrow/issues/47314#issuecomment-3287959166


   > If I understand correctly you are suggesting empty columns that would be
   > created as Null Type to be stored as empty strings?
   
   Not necessarily; I'm suggesting a user-defined default type. We regularly
receive government statistical datasets with variable schemas (which we can
handle) in which some fields are inconsistently populated, meaning the field
wasn't captured on some occasions. In those cases I'd like to be able to set
the type of an all-null column to int where the values are counts or, in
another case, to string, so that the schema stays consistent. Our current
workaround is to read the files as a dataset, get the schema, and then fix the
null-typed fields ourselves, but an option for this would avoid that extra
step.


