Hi:

I would like to know if there is any way in PyArrow to write empty string 
values to a Parquet file.
When I use pyarrow.parquet.write_table, any empty string values in a column 
end up as None in the Parquet file.
My process depends on these values being written as empty strings in the 
Parquet files.

To provide some context, my current workflow is the following:

- Read content from JSON files (using pandas.read_json)
- Convert the corresponding DataFrame to a PyArrow table (using 
pyarrow.Table.from_pandas)
- Finally, write the table to a Parquet file (using pyarrow.parquet.write_table)

I have done some checks during the process, and the empty string values are 
preserved up to the step that writes the Parquet file.

The options of the write_table function don't seem to include anything 
specific for this. Is this behavior (writing '' as None) an unavoidable default?
Is there any other way to write the Parquet files that gives me more options 
to deal with this?

Any hint or feedback will be greatly appreciated.

Thanks a lot in advance, all the best.

Sergio Carrascoso
