Hi everyone! Here are the notes from today's sync-up. Thanks to everyone that attended!
Topics:

* Sort order for min/max stats
  Parquet MR and Parquet Format sorting: PARQUET-686 PRs ready to go
  Negative bytes are causing problems in the binary sort.
  Create a new field for unsigned comparison.
  Should this be considered a bug?

* Dependency switch arrow->parquet to parquet->arrow
  Sort of like the Java side, add an arrow object model to C++
  Need to decide if Arrow is optional or not
  Python will depend on Parquet, maybe Arrow as well
  Circular dependencies are bad, need to figure out a clean solution

* Compatibility checks between Java and C++
  Brought up by signed byte comparison, C++ uses unsigned comparisons
  We should have test suites to make sure they are readable
  Look at the old parquet-compatibility lib
  Piyush will open a JIRA and start looking at it
  Java is reading files in the repository to make sure they're still correct
  Would create a C++ version of this

* ByteBuffer: setup performance benchmarks and run them regularly
  Ran into issues trying to release 1.9.0 at Twitter
  Release ran 5-15% slower due to encoding/decoding
  We don't have continuous performance tests for each PR, so we rely on devs
  Use parquet-benchmark to test each commit?
  How can we do this as part of the build?
  Need to ignore false-positives
  We should at least test performance at RCs
  Treat performance regressions like bugs - add benchmarks as we would test cases

* Parquet 1.9.0 release
  Check whether the min/max fix should be included and merge
  Review PARQUET-623
  Get a RC out today or tomorrow

* Quarterly releases
  Piyush volunteered to be release manager for next quarter

* New Interval types
  Drill wanted the old interval type, but would switch to the new ones
  The old type is basically all of the information for the two new types
  Old type would be deprecated

* New encodings

--
Ryan Blue
Software Engineer
Netflix
