Thanks, Ravi, for the feedback. I completely agree that we need to develop
the second solution as soon as possible. Please find my responses to your
queries below.

1. What if the query is on non-cached columns? Will it start reading from
disk on the driver side for min/max?
- If the query is on a non-cached column, then all the blocks will be
selected and min/max pruning will be done in each executor. There will be no
read on the driver side: the driver is a single process, and reading min/max
values from disk for every query would increase the pruning time. So I feel
it is better to read in a distributed way using the executors.

2. Are we planning to cache blocklet-level information or block-level
information on the driver side for cached columns?
- We will provide an option for the user to cache at Block or Blocklet
level. It will be configurable at the table level, and the default caching
will be at Block level. I will cover this part in detail in the design
document.

3. What is the impact if we automatically choose cached columns from the
user query instead of letting the user configure them?
- Every query can have different filter columns. So if we choose
automatically, then for every different column it will read from disk and
load into the cache. This can be cumbersome, and query time can vary
unexpectedly, which may not be justifiable. So I feel it is better to let
the user decide which columns are cached.
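
The difference between the two approaches can be sketched as follows. This
is only an illustration in Python: the function names, the query sequence
and the column names are hypothetical, and the model assumes an unbounded
cache that is filled the first time a column appears in a filter.

```python
def driver_reads_auto(filter_columns):
    """Automatic caching: per query, report whether the filter column
    must first be read from disk and loaded into the cache."""
    cache = set()
    reads = []
    for col in filter_columns:
        reads.append(col not in cache)  # True = this query pays the disk read
        cache.add(col)
    return reads


def pruning_site_configured(filter_columns, configured):
    """User-configured caching: queries on configured columns are pruned in
    the driver, all others in the executors. No query fills the cache."""
    return ["driver" if col in configured else "executor"
            for col in filter_columns]


# Hypothetical sequence of filter columns, one per query.
queries = ["id", "age", "id", "city", "age"]

auto_reads = driver_reads_auto(queries)
configured_sites = pruning_site_configured(queries, {"id"})
```

Under these assumptions, the automatic approach makes the first query on
each new column slower than repeats of the same query, while the configured
approach behaves the same way every time for a given column.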

Let me know if you need any more clarifications.

Regards
Manish Gupta


