rohanjain101 opened a new issue, #39938:
URL: https://github.com/apache/arrow/issues/39938

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
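   >>> import pyarrow as pa
   >>> import pyarrow.dataset
   >>> import pyarrow.parquet
   >>> null_table = pa.Table.from_pydict({"A": ["", None], "B": [None, ""], "C": ["E", "F"]})
   >>> partitioning = pa.dataset.FilenamePartitioning(pa.schema({"A": pa.string()}))
   >>> # A raw string cannot end in a backslash, so the trailing separator is dropped.
   >>> pa.parquet.write_to_dataset(null_table, r"dir\filename", partitioning=partitioning)
   >>> pa.parquet.read_table(r"dir\filename", partitioning=partitioning)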
   pyarrow.Table
   A: string
   ----
   A: []
   >>>
   ```
   
   A null partition value does not round-trip when filename or directory partitioning is used.
   
   Mixing null and the empty string in a partition column does not round-trip with directory partitioning either:
   
   ```
   >>> partitioning = pa.dataset.DirectoryPartitioning(pa.schema({"A": pa.string()}))
   >>> pa.parquet.write_to_dataset(null_table, r"dir\directory", partitioning=partitioning)
   >>> pa.parquet.read_table(r"dir\directory", partitioning=partitioning)
   pyarrow.Table
   B: string
   C: string
   A: string
   ----
   B: [[null],[""]]
   C: [["E"],["F"]]
   A: [[null],[null]]
   >>>
   ```
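
For illustration only, the collapse of `""` and null seen above can be pictured with a toy model. This is not Arrow's implementation; `encode_segment` and `decode_segment` are hypothetical helpers, and the assumption is that a null value is written as an empty path segment, which cannot then be told apart from an empty string on read:

```python
from typing import Optional


def encode_segment(value: Optional[str]) -> str:
    """Toy encoder (hypothetical): a null partition value becomes an empty segment."""
    return "" if value is None else value


def decode_segment(segment: str) -> Optional[str]:
    """Toy decoder (hypothetical): an empty segment is read back as null."""
    return None if segment == "" else segment


original = ["", None]
round_tripped = [decode_segment(encode_segment(v)) for v in original]
print(round_tripped)  # [None, None]
```

Under that assumption both values come back as null, which matches the `A: [[null],[null]]` output shown above.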
   
   I would expect A to be:
   
   `A: [[""],[null]]`
   
   That is what the original table contained. Also, with two partition columns, a null in the first column raises an error, whereas with a single partition column no error is raised, which seems inconsistent:
   
   ```
   >>> partitioning = pa.dataset.DirectoryPartitioning(pa.schema({"A": pa.string(), "B": pa.string()}))
   >>> pa.parquet.write_to_dataset(null_table, r"dir", partitioning=partitioning)
   pyarrow.lib.ArrowInvalid: No partition key for A but a key was provided subsequently for B.
   >>>
   ```
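
The difference between one and two partition columns might be pictured with another toy model (again hypothetical, not Arrow's code; `format_path` is an illustrative name). It assumes path segments are positional, so a null can only be dropped from the end of the path; dropping one from the middle would shift later values into the wrong column:

```python
from typing import List, Optional


def format_path(keys: List[str], values: List[Optional[str]]) -> str:
    """Toy positional formatter (hypothetical): join non-null values with '/'.

    A null is allowed only if every later value is null as well.
    """
    segments = []
    missing = None  # name of the first key whose value was null
    for key, value in zip(keys, values):
        if value is None:
            if missing is None:
                missing = key
            continue
        if missing is not None:
            raise ValueError(
                f"No partition key for {missing} but a key was provided "
                f"subsequently for {key}."
            )
        segments.append(value)
    return "/".join(segments)


# One partition column: the null is simply dropped, so no error is raised.
print(repr(format_path(["A"], [None])))  # ''

# Two partition columns: a null in A followed by a value in B is rejected.
try:
    format_path(["A", "B"], [None, ""])
except ValueError as exc:
    print(exc)
```

In this sketch the single-column case silently produces an empty path, while the two-column case fails, which mirrors the inconsistency described above.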
   
   What is the expected behavior when a partition column contains null? Should it be supported, or should an error always be raised?
   
   
   
   ### Component(s)
   
   C++, Parquet, Python

