Hello Sergio,

this is definitely unwanted behaviour. Can you open an issue on https://issues.apache.org/jira/projects/PARQUET and provide a minimal reproducing example? There is a real difference between empty strings and null strings. Parquet supports this distinction as well, so we should support roundtripping both.
Uwe

On Thu, May 3, 2018, at 8:47 AM, scarrasc...@ravenpack.com wrote:
> Hi:
>
> I would like to know if there is any way in PyArrow to write empty
> string values to a parquet file.
> When I use Parquet.write_table, if any column contains empty string
> values, they end up as None in the parquet file.
> My process depends on these values to be properly written as empty
> strings in the parquet files.
>
> To provide some context, my current workflow is the following:
>
> - Read content from json files (using Pandas.read_json)
> - Convert the corresponding dataframe to a PyArrow table (using
>   PyArrow.Table.from_pandas)
> - Finally, write the table to a parquet file (using Parquet.write_table)
>
> I have done some checks during the process, and the empty string values
> are being honored until the writing step to a parquet file.
>
> The options for the write_table method don't provide any specific for
> this, is this behavior (write '' as None) an unavoidable default?
> Is there any other way to write the parquet files where I have more
> options to deal with this?
>
> Any hint or feedback will be greatly appreciated.
>
> Thanks a lot in advance, all the best.
>
> Sergio Carrascoso