High-level requirements:

1. Write larger files while keeping ingestion and query latencies low.
2. Better data layout. For example, when rewriting smaller files into larger ones, piggyback on that I/O to move records around and group them by some pattern, for better query performance, compression, etc.
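
To make requirement 2 a bit more concrete, here is a rough sketch of the idea, not a proposed implementation. It assumes small files can be described as (name, size, records) tuples and that "some pattern" is simply a sort on one field. The names `plan_rewrites`, `rewrite_group`, and `sort_key` are hypothetical and are not part of any existing Hudi API.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
# Hypothetical description of a small file: (name, size in bytes, records).
SmallFile = Tuple[str, int, List[Record]]


def plan_rewrites(files: List[SmallFile], target_bytes: int) -> List[List[SmallFile]]:
    """Group small files so each group's total size stays within target_bytes.

    Uses first-fit decreasing: largest files are placed first, each into the
    first group that still has room. A file larger than target_bytes ends up
    alone in its own group.
    """
    groups: List[List[SmallFile]] = []
    for f in sorted(files, key=lambda f: f[1], reverse=True):
        for g in groups:
            if sum(x[1] for x in g) + f[1] <= target_bytes:
                g.append(f)
                break
        else:
            groups.append([f])
    return groups


def rewrite_group(group: List[SmallFile], sort_key: str) -> List[Record]:
    """Merge the records of one group and order them by sort_key.

    Since the records are being read and rewritten anyway, sorting here
    co-locates similar records in the larger output file at no extra I/O.
    """
    records = [r for _, _, recs in group for r in recs]
    return sorted(records, key=lambda r: r[sort_key])
```

Each group returned by `plan_rewrites` would become one larger file, and `rewrite_group` decides the record order inside it. A real design would also need to bound how much rewriting happens per ingest so that requirement 1 (low latency) still holds.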

I created an issue for this: https://issues.apache.org/jira/browse/HUDI-112

Let's discuss there, and then we can follow up with a HIP.

Thanks,
Nishith