Hi, I've run into some odd behaviour when round-tripping data through Parquet with pyarrow, when the data contains dictionary arrays whose dictionaries have duplicate values.
```python
import pyarrow as pa
import pyarrow.parquet as pq

my_table = pa.Table.from_batches(
    [
        pa.RecordBatch.from_arrays(
            [
                pa.array([0, 1, 2, 3, 4]),
                # Dictionary array whose dictionary contains a duplicate
                # value: 'd' appears at positions 1 and 3.
                pa.DictionaryArray.from_arrays(
                    pa.array([0, 1, 2, 3, 4]),
                    pa.array(['a', 'd', 'c', 'd', 'e'])
                )
            ],
            names=['foo', 'bar']
        )
    ]
)
my_table.validate(full=True)

# Round-trip through Parquet.
pq.write_table(my_table, "foo.parquet")
read_table = pq.ParquetFile("foo.parquet").read()
read_table.validate(full=True)

print(my_table.column(1).to_pylist())
print(read_table.column(1).to_pylist())
assert my_table.column(1).to_pylist() == read_table.column(1).to_pylist()
```

Both tables pass full validation, yet the last three lines print:

```
['a', 'd', 'c', 'd', 'e']
['a', 'd', 'c', 'e', 'a']
Traceback (most recent call last):
  File "/home/ataylor/projects/dsg-python-dtcc-equity-kinetics/dsg/example.py", line 29, in <module>
    assert my_table.column(1).to_pylist() == read_table.column(1).to_pylist()
AssertionError
```

That clearly doesn't look right! My question is whether I'm fundamentally breaking an assumption that dictionary values must be unique, or whether there's a bug in the Parquet-Arrow conversion.

Thanks,
Al
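P.S. In case it helps anyone hitting the same thing: the sketch below is the workaround I'm using for now, assuming the duplicate dictionary values are indeed the trigger. It decodes each chunk of the dictionary column back to plain values and re-encodes it, which rebuilds the dictionary with unique values. `dedupe_dictionary_column` is just a helper name I made up, not a pyarrow API.

```python
import pyarrow as pa
import pyarrow.parquet as pq

def dedupe_dictionary_column(col: pa.ChunkedArray) -> pa.ChunkedArray:
    """Re-encode a dictionary column so its dictionary values are unique."""
    rebuilt = []
    for chunk in col.chunks:
        # Casting a dictionary array to its value type decodes the
        # dictionary layer back to a plain array of values.
        plain = chunk.cast(chunk.type.value_type)
        # dictionary_encode() builds a dictionary of unique values and
        # rewrites the indices to match, so duplicates can't survive.
        rebuilt.append(plain.dictionary_encode())
    return pa.chunked_array(rebuilt)

# Using my_table from the repro above: replace column 'bar' with the
# re-encoded version before writing.
safe_table = my_table.set_column(
    1, 'bar', dedupe_dictionary_column(my_table.column('bar'))
)
pq.write_table(safe_table, "foo.parquet")
```

With this in place the round trip and the assert both seem to pass for the example above, though of course the index values change when the original dictionary had duplicates.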