Hi everyone! Here are the notes from today's sync-up. Thanks to everyone
who attended!

Topics:

* Sort order for min/max stats

  Parquet MR and Parquet Format sorting: PARQUET-686
  PRs ready to go
  Negative bytes are causing problems in the binary sort.
  Create a new field for unsigned comparison. Should this be considered a
  bug?
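
  To illustrate why negative bytes matter for the binary sort, here is a
  minimal sketch (not the Parquet MR code) comparing the same binary values
  byte by byte, once treating bytes as signed and once as unsigned. The
  function names and sample values are made up for illustration.

```python
from functools import cmp_to_key


def to_signed(b: int) -> int:
    """Reinterpret an unsigned byte value (0..255) as a signed byte (-128..127)."""
    return b - 256 if b >= 128 else b


def compare_signed(a: bytes, b: bytes) -> int:
    """Lexicographic comparison that treats each byte as signed."""
    for x, y in zip(a, b):
        sx, sy = to_signed(x), to_signed(y)
        if sx != sy:
            return -1 if sx < sy else 1
    # All shared bytes are equal, so the shorter value sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_unsigned(a: bytes, b: bytes) -> int:
    """Lexicographic comparison that treats each byte as unsigned.

    Python's built-in bytes ordering already works this way.
    """
    return (a > b) - (a < b)


# Hypothetical column values; the UTF-8 encoding of "é" starts with byte 0xC3,
# which is negative when read as a signed byte.
values = [b"a", "é".encode("utf-8"), b"z"]

signed_min = min(values, key=cmp_to_key(compare_signed))
signed_max = max(values, key=cmp_to_key(compare_signed))
unsigned_min = min(values, key=cmp_to_key(compare_unsigned))
unsigned_max = max(values, key=cmp_to_key(compare_unsigned))

print("signed   min/max:", signed_min, signed_max)
print("unsigned min/max:", unsigned_min, unsigned_max)
```

  The two orderings disagree on both min and max for this data, so a reader
  that assumes one ordering can wrongly skip data when the stats were written
  with the other.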

* Dependency switch arrow->parquet to parquet->arrow

  Sort of like the Java side, add an arrow object model to C++
  Need to decide if Arrow is optional or not
  Python will depend on Parquet, maybe Arrow as well
  Circular dependencies are bad, need to figure out a clean solution

* Compatibility checks between Java and C++

  Brought up by signed byte comparison, C++ uses unsigned comparisons
  We should have test suites to make sure files written by one
  implementation are readable by the other
  Look at the old parquet-compatibility lib
  Piyush will open a JIRA and start looking at it
  Java is reading files in the repository to make sure they're still correct
  Would create a C++ version of this

* ByteBuffer: set up performance benchmarks and run them regularly

  Ran into issues trying to release 1.9.0 at Twitter
  Release ran 5-15% slower due to encoding/decoding
  We don't have continuous performance tests for each PR, so we rely on devs
  to catch regressions
  Use parquet-benchmark to test each commit?
  How can we do this as part of the build? Need to ignore false positives
  We should at least test performance at RCs
  Treat performance regressions like bugs - add benchmarks as we would test
  cases
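
  One possible way to ignore false positives, sketched under assumptions: run
  each benchmark several times, compare medians, and only flag a slowdown
  beyond a noise tolerance. The function name, the 5% tolerance, and the
  timings below are hypothetical; this is not how parquet-benchmark works
  today.

```python
from statistics import median


def is_regression(baseline_secs, candidate_secs, tolerance=0.05):
    """Return True if the candidate's median time exceeds the baseline's
    median by more than `tolerance` (a fraction, e.g. 0.05 for 5%).

    Using the median of repeated runs damps one-off outliers, and the
    tolerance keeps ordinary run-to-run noise from failing the build.
    """
    base = median(baseline_secs)
    cand = median(candidate_secs)
    return (cand - base) / base > tolerance


# Made-up timings in seconds for the same benchmark on two commits.
baseline = [1.00, 1.02, 0.98]
noisy_but_fine = [1.01, 1.03, 0.99]
slower = [1.10, 1.12, 1.11]

print(is_regression(baseline, noisy_but_fine))
print(is_regression(baseline, slower))
```

  The tolerance would need tuning per benchmark; too tight and the build
  flaps, too loose and a slowdown in the 5-15% range slips through.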

* Parquet 1.9.0 release

  Check whether the min/max fix should be included and merge
  Review PARQUET-623
  Get an RC out today or tomorrow

* Quarterly releases

  Piyush volunteered to be release manager for next quarter

* New Interval types

  Drill wanted the old interval type, but would switch to the new ones
  The old type basically holds all of the information in the two new types
  Old type would be deprecated

* New encodings

-- 
Ryan Blue
Software Engineer
Netflix
