Thanks, Ravi, for the feedback. I completely agree with you that we need to develop the second solution ASAP. Please find my responses to your queries below.
1. What if the query is on non-cached columns? Will the driver start reading min/max from disk?
- If the query is on a non-cached column, all the blocks will be selected and min/max pruning will be done in each executor. There will not be any disk read on the driver side. The driver is a single process, so reading min/max values from disk for every query would increase the pruning time. I feel it is better to read them in a distributed way using the executors.

2. Are we planning to cache blocklet-level or block-level information on the driver side for cached columns?
- We will provide an option for the user to cache at Block or Blocklet level. It will be configurable at table level, and the default caching will be at Block level. I will cover this part in detail in the design document.

3. What is the impact if we automatically choose the cached columns from the user query instead of letting the user configure them?
- Every query can have different filter columns. If we choose automatically, then for every new filter column the min/max data will be read from disk and loaded into the cache. This can be more cumbersome, and query time can vary unexpectedly, which may not be justifiable. So I feel it is better to let the user decide which columns to cache.

Let me know if you need any more clarification.

Regards
Manish Gupta

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
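To make answers 1 and 3 concrete, here is a minimal Python sketch of min/max pruning where the driver caches min/max only for user-chosen columns. A filter on a non-cached column selects every block on the driver, and the per-block min/max check is then left to the executors. All names (BlockMeta, DriverIndex, executor_prune) are hypothetical and are not CarbonData APIs. Caching is shown at block level only; blocklet level would simply key the cache entries by (block, blocklet).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class BlockMeta:
    """Hypothetical per-block metadata, as if read from a block footer on disk."""
    block_id: str
    min_max: Dict[str, Tuple[int, int]]  # column -> (min, max)


@dataclass
class DriverIndex:
    """Hypothetical driver-side cache holding min/max only for configured columns."""
    cached_columns: Set[str]
    cache: Dict[str, Dict[str, Tuple[int, int]]] = field(default_factory=dict)

    def load(self, blocks: List[BlockMeta]) -> None:
        # Keep min/max only for the user-configured columns; the rest stays on disk.
        for b in blocks:
            self.cache[b.block_id] = {
                c: v for c, v in b.min_max.items() if c in self.cached_columns
            }

    def prune(self, column: str, value: int) -> List[str]:
        """Return ids of blocks that may contain rows where column == value."""
        if column not in self.cached_columns:
            # Non-cached column: no disk read on the driver, select every block
            # and leave min/max pruning to the executors.
            return sorted(self.cache)
        return sorted(
            block_id
            for block_id, mm in self.cache.items()
            if mm[column][0] <= value <= mm[column][1]
        )


def executor_prune(block: BlockMeta, column: str, value: int) -> bool:
    """Executor-side check; min_max here stands in for reading the footer from disk."""
    lo, hi = block.min_max[column]
    return lo <= value <= hi


if __name__ == "__main__":
    blocks = [
        BlockMeta("b0", {"id": (0, 99), "age": (18, 40)}),
        BlockMeta("b1", {"id": (100, 199), "age": (30, 65)}),
    ]
    idx = DriverIndex(cached_columns={"id"})
    idx.load(blocks)
    print(idx.prune("id", 150))   # cached column: pruned on the driver
    print(idx.prune("age", 50))   # non-cached column: all blocks selected
    print([b.block_id for b in blocks if executor_prune(b, "age", 50)])
```

With only "id" cached, a filter on id is pruned to one block on the driver, while a filter on age sends both blocks to the executors, which then discard b0 using its footer min/max.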