Hi,

PARQUET-1337 describes a problem where, under certain circumstances, the resulting row group layout is drastically different from (and worse than) the intended one.
A few weeks ago I started tweaking the logic that controls this in a test-driven fashion. I found that fixing one problem repeatedly led to the discovery of another. After playing whack-a-mole for a while, I ended up with a much more fundamental change than I originally intended, and there is still room (and need) for improvement.

Due to the potential impact of these changes, I have put together a design doc that describes all the problems I could identify and some possible fixes for them:

https://docs.google.com/document/d/1FJAVwzszZGkxZa8FtKtSbgBKm7qkS4cXuNW8hl4YKwU/edit#

If you are interested, please review and comment on the document.

Thanks,
Zoltan
