dhatchayani opened a new pull request #3126: [WIP][CARBONDATA-3293] Prune datamaps improvement
URL: https://github.com/apache/carbondata/pull/3126
 
 
   **Problem:**
   
   (1) Currently, pruning for a count(*) query is the same as for a select * query: Blocklet and ExtendedBlocklet objects are formed from each DataMapRow, which is unnecessary and time-consuming.
   
   (2) Pruning in a select * query spends time in convertToSafeRow(), which converts the DataMapRow to a safe row. The conversion is slow because, in an unsafe row, getting the position of a field's data requires traversing the row up to that position.
   
   (3) For filter queries, the DataMapRow is converted to a safe row whether the blocklet is valid or not. This conversion is time-consuming, and its cost grows as the number of blocklets increases.
   
    
   
   **Solution:**
   
   (1) The blocklet row count is already stored in the DataMapRow, so it is enough to read just the count. This can improve count(*) query performance.
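The row-count shortcut in (1) can be illustrated with a minimal Java sketch. `IndexRowSketch` and `CountStarSketch` are hypothetical classes, not CarbonData's `DataMapRow` API, and the sketch assumes a count(*) with no filter, so every blocklet's stored row count contributes to the total.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one DataMapRow; only fields relevant to count(*) are modeled. */
final class IndexRowSketch {
    final long blockletRowCount; // row count already stored for the blocklet
    final String filePath;       // other metadata a full prune would materialize

    IndexRowSketch(long blockletRowCount, String filePath) {
        this.blockletRowCount = blockletRowCount;
        this.filePath = filePath;
    }
}

final class CountStarSketch {
    /** Unfiltered count(*): sum the stored counts without building any blocklet objects. */
    static long countRows(List<IndexRowSketch> rows) {
        long total = 0;
        for (IndexRowSketch row : rows) {
            total += row.blockletRowCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<IndexRowSketch> rows = Arrays.asList(
                new IndexRowSketch(32000, "part-0"),
                new IndexRowSketch(12000, "part-1"));
        System.out.println(countRows(rows)); // prints 44000
    }
}
```

The point of the sketch is what it does not do: no Blocklet or ExtendedBlocklet object is created per row, only a single field is read.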
   
   (2) Also store the data length in the DataMapRow, so that traversing the whole row can be avoided. With the length, the data position can be reached directly.
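A minimal sketch of the idea in (2), assuming a hypothetical row layout (not CarbonData's actual unsafe format) in which each variable-length field is prefixed with its 4-byte length and a header stores each field's start offset. `readByTraversal` ignores the header and walks the length prefixes, as a reader without stored positions would have to; `readByOffset` jumps straight to the field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical layout, for illustration only:
 *   header: one int per field, the byte offset where that field starts
 *   body:   per field, a 4-byte length followed by the field bytes
 */
final class OffsetRowSketch {

    static ByteBuffer write(byte[][] fields) {
        int headerSize = 4 * fields.length;
        int bodySize = 0;
        for (byte[] f : fields) {
            bodySize += 4 + f.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(headerSize + bodySize);
        int offset = headerSize;
        for (byte[] f : fields) {       // header: start offset of each field
            buf.putInt(offset);
            offset += 4 + f.length;
        }
        for (byte[] f : fields) {       // body: length-prefixed field bytes
            buf.putInt(f.length);
            buf.put(f);
        }
        buf.flip();
        return buf;
    }

    /** Without stored positions: skip every earlier field, cost grows with the ordinal. */
    static byte[] readByTraversal(ByteBuffer buf, int fieldCount, int ordinal) {
        int pos = 4 * fieldCount;       // start of body
        for (int k = 0; k < ordinal; k++) {
            pos += 4 + buf.getInt(pos); // hop over length prefix and data
        }
        return readAt(buf, pos);
    }

    /** With stored positions: one header lookup gives the field position directly. */
    static byte[] readByOffset(ByteBuffer buf, int ordinal) {
        return readAt(buf, buf.getInt(4 * ordinal));
    }

    private static byte[] readAt(ByteBuffer buf, int pos) {
        int len = buf.getInt(pos);
        byte[] out = new byte[len];
        ByteBuffer view = buf.duplicate(); // independent position, shared content
        view.position(pos + 4);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[][] fields = {
            "part-0.carbondata".getBytes(StandardCharsets.UTF_8),
            "min".getBytes(StandardCharsets.UTF_8),
            "max".getBytes(StandardCharsets.UTF_8)
        };
        ByteBuffer row = write(fields);
        String viaOffset = new String(readByOffset(row, 2), StandardCharsets.UTF_8);
        String viaWalk = new String(readByTraversal(row, fields.length, 2), StandardCharsets.UTF_8);
        System.out.println(viaOffset + " " + viaWalk); // prints "max max"
    }
}
```

Both reads return the same bytes; the trade-off is 4 extra header bytes per field in exchange for constant-time access instead of a walk proportional to the field's position.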
   
   (3) Read only the min/max values from the DataMapRow and use them to decide whether the blocklet needs to be scanned; convert the row to a safe row only when a scan is required.
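The min/max check in (3) can be sketched as follows, assuming an equality filter on a single numeric column. `MinMaxPruneSketch`, `IndexRow`, `SafeRow` and `toSafeRow` are hypothetical stand-ins; `toSafeRow` only counts how often the costly conversion (the role `convertToSafeRow()` plays in the PR) would run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MinMaxPruneSketch {

    /** Hypothetical stand-in for an unsafe DataMapRow holding one column's min/max. */
    static final class IndexRow {
        final long min;
        final long max;
        final String blockletId;

        IndexRow(long min, long max, String blockletId) {
            this.min = min;
            this.max = max;
            this.blockletId = blockletId;
        }
    }

    /** Hypothetical stand-in for the converted (safe) row. */
    static final class SafeRow {
        final String blockletId;

        SafeRow(String blockletId) {
            this.blockletId = blockletId;
        }
    }

    int conversions = 0; // how many rows paid the conversion cost

    private SafeRow toSafeRow(IndexRow row) {
        conversions++;
        return new SafeRow(row.blockletId);
    }

    /** Filter "column = value": convert only rows whose [min, max] may contain value. */
    List<SafeRow> prune(List<IndexRow> rows, long value) {
        List<SafeRow> selected = new ArrayList<>();
        for (IndexRow row : rows) {
            boolean scanRequired = row.min <= value && value <= row.max; // reads min/max only
            if (scanRequired) {
                selected.add(toSafeRow(row));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<IndexRow> rows = Arrays.asList(
                new IndexRow(0, 99, "b0"),
                new IndexRow(100, 199, "b1"),
                new IndexRow(200, 299, "b2"));
        MinMaxPruneSketch pruner = new MinMaxPruneSketch();
        List<SafeRow> hits = pruner.prune(rows, 150);
        System.out.println(hits.size() + " hit(s), " + pruner.conversions + " conversion(s)");
        // prints "1 hit(s), 1 conversion(s)"
    }
}
```

Only the blocklet whose range covers the filter value pays the conversion cost; the other two are rejected after reading just their min/max.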
   
    - [ ] Any interfaces changed?
    
    - [ ] Any backward compatibility impacted?
    
    - [ ] Document update required?
   
    - [x] Testing done
           Existing UT
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   
   
