clintropolis commented on pull request #12139:
URL: https://github.com/apache/druid/pull/12139#issuecomment-1009633460


   This seems pretty useful, but it also looks rather expensive, since the estimation is going to happen for every row. Could you measure the performance before and after this change? [This benchmark might be a good place to start](https://github.com/apache/druid/blob/master/benchmarks/src/test/java/org/apache/druid/benchmark/query/CachingClusteredClientBenchmark.java); a skeleton of the kind of measurement I mean is sketched below.
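   For what it's worth, a minimal JMH skeleton along these lines could isolate the per-row cost. The row shape and the per-column size calculation here are placeholder stand-ins for illustration, not the PR's actual estimation code:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class RowSizeEstimationBenchmark
{
  private Object[][] rows;

  @Setup
  public void setup()
  {
    // Synthetic rows for illustration; in practice the query setup from
    // CachingClusteredClientBenchmark would drive this instead.
    rows = new Object[10_000][];
    for (int i = 0; i < rows.length; i++) {
      rows[i] = new Object[]{"dim" + i, (long) i, (double) i};
    }
  }

  @Benchmark
  public long estimateEveryRow()
  {
    // Models the cost pattern in question: touching every column of every row.
    long total = 0;
    for (Object[] row : rows) {
      for (Object col : row) {
        total += String.valueOf(col).length(); // stand-in for per-column size estimation
      }
    }
    return total;
  }
}
```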
   
   Also, there appears to be no way to disable it. Maybe setting the limit to 0 should disable the computation entirely, instead of requiring the limit to be set to the maximum long value?
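   Something like this is what I have in mind, where `maxEstimatedSizeBytes` is a hypothetical name for the limit, not an existing config field:

```java
// Hedged sketch: treat 0 (or a negative value) as "estimation disabled",
// so callers don't need to pass Long.MAX_VALUE to opt out.
private static boolean isRowSizeEstimationEnabled(long maxEstimatedSizeBytes)
{
  return maxEstimatedSizeBytes > 0;
}
```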
   
   Should it prove expensive, maybe the approach should be to sample just the first 'n' rows and use the average estimated size for any remaining rows, instead of trying to estimate every row encountered (see the sketch after this paragraph). I imagine the loss of accuracy would be worth how much cheaper it would be to avoid looping over every column of every row for all rows.
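   Roughly, the sampling idea could look like the sketch below. `SamplingRowSizeEstimator` and the `exactEstimator` callback are hypothetical names for illustration, not existing Druid APIs:

```java
import java.util.function.ToLongFunction;

/**
 * Sketch of the sampling idea: estimate the size of the first `sampleSize`
 * rows exactly, then reuse the running average for all subsequent rows.
 * Assumes sampleSize > 0.
 */
class SamplingRowSizeEstimator<T>
{
  private final int sampleSize;
  private final ToLongFunction<T> exactEstimator; // loops over the row's columns
  private long sampledRows = 0;
  private long sampledBytes = 0;

  SamplingRowSizeEstimator(int sampleSize, ToLongFunction<T> exactEstimator)
  {
    this.sampleSize = sampleSize;
    this.exactEstimator = exactEstimator;
  }

  long estimate(T row)
  {
    if (sampledRows < sampleSize) {
      // Pay the full per-column cost only for the sampled prefix of rows.
      final long bytes = exactEstimator.applyAsLong(row);
      sampledRows++;
      sampledBytes += bytes;
      return bytes;
    }
    // After the sample, return the average; no per-row column iteration.
    return sampledBytes / sampledRows;
  }
}
```

   I'd guess a sample of a few hundred rows would be enough to stabilize the average for most result sets, though that would be worth verifying in the benchmark.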


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


