[
https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ravindra Pesala resolved CARBONDATA-3293.
-----------------------------------------
Resolution: Fixed
Fix Version/s: 1.5.3
> Prune datamaps improvement for count(*)
> ---------------------------------------
>
> Key: CARBONDATA-3293
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3293
> Project: CarbonData
> Issue Type: Improvement
> Reporter: dhatchayani
> Assignee: dhatchayani
> Priority: Major
> Fix For: 1.5.3
>
> Time Spent: 14h 50m
> Remaining Estimate: 0h
>
> +*Problem:*+
> (1) Currently, count(*) is pruned the same way as a select * query. Blocklet
> and ExtendedBlocklet objects are built from each DataMapRow, which is
> unnecessary and time consuming.
> (2) Pruning in a select * query spends time in convertToSafeRow(), which
> converts the DataMapRow to a safe row. In an unsafe row, getting the position
> of a piece of data requires traversing the whole row up to that position.
> (3) For filter queries, the DataMapRow is converted to a safe row whether or
> not the blocklet is valid. This conversion is time consuming, and its cost
> grows with the number of blocklets.
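The traversal cost in (2) can be illustrated with a minimal sketch. It assumes a row packed as consecutive length-prefixed fields; this layout and the names PackedRowSketch, pack, offsetByTraversal and readField are hypothetical and are not CarbonData's actual DataMapRow format. It only shows why reaching field i means reading every length prefix before it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not CarbonData's real row layout.
final class PackedRowSketch {

    // Packs each field as [4-byte length][payload bytes], back to back.
    static ByteBuffer pack(String... fields) {
        byte[][] encoded = new byte[fields.length][];
        int size = 0;
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        buf.flip();
        return buf;
    }

    // Finds the payload offset of field `index` by walking over all
    // preceding length prefixes: the work grows with the field index.
    static int offsetByTraversal(ByteBuffer row, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            pos += Integer.BYTES + row.getInt(pos);
        }
        return pos + Integer.BYTES;
    }

    // Reads one field as a string, using the traversal above.
    static String readField(ByteBuffer row, int index) {
        int start = offsetByTraversal(row, index);
        int len = row.getInt(start - Integer.BYTES);
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = row.get(start + i);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

If the offsets or lengths were kept alongside the row, offsetByTraversal could be replaced by a direct lookup.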
>
> +*Solution:*+
> (1) The blocklet row count is already stored in the DataMapRow, so reading
> that count is enough. This improves count(*) query performance.
> (2) Also store the data length in the DataMapRow, so that traversing the
> whole row can be avoided. With the length, the data position can be reached
> directly.
> (3) Read only the min/max values from the DataMapRow and decide whether that
> blocklet needs to be scanned. Convert the row to a safe row only if a scan is
> required.
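Points (1) and (3) of the solution can be sketched roughly as follows. The sketch assumes each blocklet is summarized by a row count and an integer min/max for one column. PruneSketch, BlockletSummary, countAll, mayContain and blockletsToScan are hypothetical names for illustration, not CarbonData classes or methods.

```java
import java.util.List;

// Hypothetical illustration only; not CarbonData's actual pruning code.
final class PruneSketch {

    // Assumed per-blocklet summary: row count plus min/max of one column.
    static final class BlockletSummary {
        final long rowCount;
        final int min;
        final int max;

        BlockletSummary(long rowCount, int min, int max) {
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
        }
    }

    // count(*) without a filter: sum the stored row counts and build
    // no per-blocklet objects beyond the summaries themselves.
    static long countAll(List<BlockletSummary> blocklets) {
        long total = 0;
        for (BlockletSummary b : blocklets) {
            total += b.rowCount;
        }
        return total;
    }

    // Equality filter check against min/max only.
    static boolean mayContain(BlockletSummary b, int value) {
        return value >= b.min && value <= b.max;
    }

    // Counts the blocklets that would still need the expensive
    // conversion and scan after the cheap min/max check.
    static int blockletsToScan(List<BlockletSummary> blocklets, int value) {
        int needed = 0;
        for (BlockletSummary b : blocklets) {
            if (mayContain(b, value)) {
                needed++;
            }
        }
        return needed;
    }
}
```

The design choice shown is ordering: the cheap check runs on every blocklet, and the costly conversion is deferred to the blocklets that pass it.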