VenuReddy2103 opened a new pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110


    ### Why is this PR needed?
   At present, secondary indexes are leveraged for query pruning via Spark plan
   modification. This approach is tightly coupled with Spark because the plan
   modification is specific to the Spark engine. For Presto or Hive queries, it
   is not feasible to modify the query plans in the same way to use secondary
   indexes. Thus, an engine-agnostic approach is needed to use secondary indexes
   in query pruning.
    
    ### What changes were proposed in this PR?
   1. Added a secondary index datamap as a coarse-grain datamap.
   2. During pruning, the secondary index datamap fires a Spark SQL query on the
      identified secondary index table, restricted to the particular segment, to
      get the position references of rows matching the datamap's filters, and
      then forms the blocklets from those references.
      Note: A Spark SQL query is fired to take advantage of Spark's distributed
      computing, so the secondary index table is filtered and read in a
      distributed manner. Because the secondary index datamap fires Spark SQL,
      distributed pruning must be enabled and the Index Server must be up and
      running.
   3. Added a CarbonInputFormat-level property that controls whether the new
      secondary index datamap is used in query pruning. This property is set
      only when the query is triggered from Presto, so the secondary index
      datamap is used only for Presto queries. Queries from Spark continue to
      use the existing approach of plan modification in the optimizer/execution
      phases.
   4. When the Index Server handles a get-splits request and secondary index
      pruning is applicable, pruning is done and the extended blocklets are
      obtained directly on the Index Server driver, instead of through the
      existing DistributedPruneRDD, which prunes on the Index Server executors.
      This is because secondary index datamap pruning fires a Spark SQL query,
      which requires a Spark session/context, and that is available only on the
      driver.
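   The flow in items 2 to 4 can be modeled with a small, self-contained Java
   sketch. It is not CarbonData's implementation: the class SiPruneSketch, the
   PruneLocation enum, and the methods toBlocklets and chooseLocation are
   hypothetical names. It also assumes the SI query returns position references
   as strings of the form "blockPath#blockletId#rowId", and it reduces the
   CarbonInputFormat-level property to a boolean. It illustrates two ideas:
   row-level position references collapse into the distinct blocklets to scan,
   and SI pruning is routed to the Index Server driver only when the property is
   set and SI pruning applies.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the SI pruning flow; not CarbonData's actual API.
class SiPruneSketch {

    // Where pruning runs in this simplified model.
    enum PruneLocation { INDEX_SERVER_DRIVER, INDEX_SERVER_EXECUTORS }

    // Assumed position-reference format: "blockPath#blockletId#rowId".
    // Many matching rows can share one blocklet, so keep only the distinct
    // "blockPath#blockletId" keys, in first-seen order.
    static List<String> toBlocklets(List<String> positionRefs) {
        Set<String> blocklets = new LinkedHashSet<>();
        for (String ref : positionRefs) {
            int lastSep = ref.lastIndexOf('#');
            if (lastSep < 0) {
                throw new IllegalArgumentException("Malformed position reference: " + ref);
            }
            blocklets.add(ref.substring(0, lastSep)); // drop the row id
        }
        return new ArrayList<>(blocklets);
    }

    // siDatamapEnabled stands in for the CarbonInputFormat-level property
    // (set only for Presto queries). SI pruning fires Spark SQL, which needs
    // the Spark session available on the driver, so it is not run inside
    // DistributedPruneRDD tasks on the executors.
    static PruneLocation chooseLocation(boolean siDatamapEnabled, boolean siApplicable) {
        if (siDatamapEnabled && siApplicable) {
            return PruneLocation.INDEX_SERVER_DRIVER;
        }
        return PruneLocation.INDEX_SERVER_EXECUTORS; // existing DistributedPruneRDD path
    }
}
```

   Using a LinkedHashSet removes duplicate blocklets while keeping a stable
   order, so a blocklet is scanned once even if many of its rows matched the
   SI filter.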
   
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.