Hi All,

Over the last six months I've been slowly trying to get the "result set loader" work committed to Drill. As a recap, this was meant to provide a uniform way to optimally pack a record batch up to a prescribed memory limit. This technique is particularly useful in readers which do not have much information about incoming data sizes.

In the meantime, the team has done a great job using the "sizer" approach to get a good-enough solution for all internal operators. The sizer simply uses statistics about incoming batches to predict outgoing batch size. At the same time, work has been done to create a one-off solution for Parquet. Since Parquet is, by far, Drill's most important data source, this means the reader problem is solved for the most critical use case.

As time goes on, I get less and less time to maintain the result set loader code. My knowledge of team priorities and of the Drill code drifts out of date.

So, the question for the group is: is the result set loader work still needed? If not, we can wait to do the remaining commits until a compelling need presents itself. If it is needed, it would be good to know how the team plans to use it so that we stay in sync.

Thanks,

- Paul
