Paul, I cannot thank you enough for your help and guidance! You are right that columnar readers will have a harder time balancing resource requirements and performance. Nevertheless, DRILL-6147 is a starting point; it should allow us to gain knowledge and refine our strategy accordingly as we go.
FYI - On a completely different topic: I was working on an EBF regarding the Parquet complex reader (though the bug was midstream). I was surprised by the level of overhead associated with nested data processing; literally, the code was jumping from one column/level to another just to process a single value. There was a comment suggesting that such processing be performed in a bulk manner (which I agree with). The moral of the story is that Drill is dealing with complex use cases that haven't been dealt with before (at least not with great success); as can be seen, we started with simpler solutions only to realize they are inefficient. What is needed is to spend time understanding such use cases and incrementally address those shortcomings.

Regards,
Salim

> On Feb 11, 2018, at 3:44 PM, Paul Rogers <par0328@yahoo.com.INVALID> wrote:
>
> One more thought:
>
>>> 3) Assuming that you go with the average batch size calculation approach,
>
> The average batch size approach is a quick and dirty approach for non-leaf operators that can observe an incoming batch to estimate row width. Because Drill batches are large, the law of large numbers means that the average row width of a large input batch is likely to be a good estimator for the average row width of a large output batch.
>
> Note that this works only because non-leaf operators have an input batch to sample. Leaf operators (readers) do not have this luxury. Hence the result set loader uses the actual accumulated size for the current batch.
>
> Also note that the average row method, while handy, is not optimal. It will, in general, result in greater internal fragmentation than the result set loader. Why? The result set loader packs vectors right up to the point where the largest would overflow. The average row method works at the aggregate level and will likely result in wasted space (internal fragmentation) in the largest vector. Said another way, with the average row size method, we can usually pack in a few more rows before the batch actually fills, and so we end up with batches of lower "density" than the optimal. This is important when the consuming operator is a buffering one such as sort.
>
> The key reason Padma is using the quick & dirty average row size method is not that it is ideal (it is not), but rather that it is, in fact, quick.
>
> We do want to move to the result set loader over time so we get improved memory utilization. And, it is the only way to control row size in readers such as CSV or JSON in which we have no size information until we read the data.
>
> - Paul
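
For concreteness, the quoted "average row size" estimate can be sketched roughly as below. This is only an illustration of the idea, not Drill's actual code: the names (BatchSizer, estimateOutputRowCount, MAX_ROWS_PER_BATCH) and the exact budgeting logic are hypothetical.

    // Minimal sketch of the quick & dirty "average row size" batch sizing idea.
    // Hypothetical names; not Drill's real classes or APIs.
    public final class BatchSizer {

      // Drill caps batches at 64K rows; used here as the fallback/upper bound.
      private static final int MAX_ROWS_PER_BATCH = 64 * 1024;

      /**
       * Estimate how many output rows fit in the memory budget, using the
       * average row width observed on a sampled incoming batch. Because input
       * batches are large, the observed average is usually a reasonable
       * estimator for the output batch as well, at the cost of some internal
       * fragmentation in the largest vector.
       */
      public static int estimateOutputRowCount(long incomingBatchBytes,
                                               int incomingRowCount,
                                               long memoryBudgetBytes) {
        if (incomingRowCount == 0) {
          return MAX_ROWS_PER_BATCH; // nothing to sample; fall back to the hard limit
        }
        // Average bytes per row in the sampled batch (never less than 1).
        long avgRowWidth = Math.max(1L, incomingBatchBytes / incomingRowCount);

        // Rows that fit in the budget, capped at the per-batch row limit.
        long rows = memoryBudgetBytes / avgRowWidth;
        return (int) Math.min(MAX_ROWS_PER_BATCH, Math.max(1L, rows));
      }

      public static void main(String[] args) {
        // Example: an 8 MB sampled batch of 32K rows (~256 bytes/row) and a
        // 12 MB budget for the outgoing batch; prints 49152.
        int rows = estimateOutputRowCount(8L << 20, 32 * 1024, 12L << 20);
        System.out.println("Target output rows: " + rows);
      }
    }

By contrast, as described above, the result set loader tracks the actual accumulated size of each vector in the current batch and rolls the in-flight row over to a new batch only when the largest vector would overflow, which is why it yields denser batches than the aggregate-level estimate.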