> I have concerns about the Rust implementation which seems to hard-code > writing out the new sort order. It would be good to understand if this > would break old readers (i.e. how do they handle unknown sort order at the > thrift parsing stage). I have less of a concern if like Java this is an > opt-in feature for now.
Well, it is just a proof of concept ;-) More seriously, the only generated code that I know of that would break is the Rust version (which is one of the reasons we chose to roll our own thrift parser in parquet-rs). I've looked at the Java, C++, and Python generated thrift code and they all seem to tolerate unknown union fields (as they should, a union is just a struct after all). I'm not sure what polars uses, so that might be an issue. My original PoC for the earlier PR (#221 [1]) was opt-in, but by the time I took over the implementation for the current PR (#514 [2]) a deliberate decision was made to only use the new ColumnOrder for floats. The reasoning being making it configurable added complexity to an already complex implementation, and continuing to write files that cannot support pushdown makes little sense (so long as existing implementations can simply ignore the new ordering). If it does turn out that the change isn't as forward compatible as we think, we can revisit before releasing it (which realistically won't be until August at the earliest). We do have a file in parquet-testing [3] (thanks Gang!) that can be used to check compatibility. Cheers, Ed [1] https://github.com/apache/parquet-format/pull/221 [2] https://github.com/apache/parquet-format/pull/514 [3] https://github.com/apache/parquet-testing/pull/104
