rohanjain101 opened a new issue, #39938:
URL: https://github.com/apache/arrow/issues/39938
### Describe the bug, including details regarding any error messages, version, and platform.
```
>>> import pyarrow as pa
>>> import pyarrow.dataset
>>> import pyarrow.parquet
>>> null_table = pa.Table.from_pydict({"A": ["", None], "B": [None, ""], "C": ["E", "F"]})
>>> partitioning = pa.dataset.FilenamePartitioning(pa.schema({"A": pa.string()}))
>>> pa.parquet.write_to_dataset(null_table, r"dir\filename", partitioning=partitioning)
>>> pa.parquet.read_table(r"dir\filename", partitioning=partitioning)
pyarrow.Table
A: string
----
A: []
>>>
```
Null does not round-trip when it is one of the partition values and filename
or directory partitioning is used.
When null and empty string are mixed under directory partitioning, the values
also don't round-trip:
```
>>> partitioning = pa.dataset.DirectoryPartitioning(pa.schema({"A": pa.string()}))
>>> pa.parquet.write_to_dataset(null_table, r"dir\directory", partitioning=partitioning)
>>> pa.parquet.read_table(r"dir\directory", partitioning=partitioning)
pyarrow.Table
B: string
C: string
A: string
----
B: [[null],[""]]
C: [["E"],["F"]]
A: [[null],[null]]
>>>
```
I would expect A to be:
`A: [[""],[null]]`
which is what the original table had.

Also, with two partition columns, a null value in the first one raises an
error, but with only one partition column no error is raised, which seems
inconsistent:
```
>>> partitioning = pa.dataset.DirectoryPartitioning(pa.schema({"A": pa.string(), "B": pa.string()}))
>>> pa.parquet.write_to_dataset(null_table, r"dir", partitioning=partitioning)
pyarrow.lib.ArrowInvalid: No partition key for A but a key was provided subsequently for B.
>>>
```
What is the expected behavior when null is in a partition column? Is it
expected to work, or should an error always be raised?
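
Until the expected behavior is settled, a pre-write check along these lines could flag the values that fail to round-trip in the examples above (null and empty string in a partition column). This is only a sketch: `find_ambiguous_partition_values` is a hypothetical helper, not part of pyarrow, and it works on the plain Python dict before it is passed to `pa.Table.from_pydict`.

```python
def find_ambiguous_partition_values(columns, partition_keys):
    """Return {key: [row indices]} for partition values that are None or "".

    Hypothetical helper (not a pyarrow API). `columns` is a dict of
    column name -> list of values, as passed to pa.Table.from_pydict.
    """
    problems = {}
    for key in partition_keys:
        # Collect the rows whose partition value is null or an empty string.
        bad_rows = [
            i for i, value in enumerate(columns[key])
            if value is None or value == ""
        ]
        if bad_rows:
            problems[key] = bad_rows
    return problems


data = {"A": ["", None], "B": [None, ""], "C": ["E", "F"]}
problems = find_ambiguous_partition_values(data, ["A", "B"])
print(problems)
```

A caller could raise or substitute a sentinel value when the returned dict is non-empty, rather than relying on the write to fail.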
### Component(s)
C++, Parquet, Python