[
https://issues.apache.org/jira/browse/KUDU-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974290#comment-16974290
]
wangningito edited comment on KUDU-1644 at 11/15/19 1:58 AM:
-------------------------------------------------------------
I have submitted an implementation for token-based scans, covering the case of
a single hash partition that contains only one key.
https://gerrit.cloudera.org/c/14706/
This implementation, in the client module, filters the values to be pushed down
during token building. It requires only a slight modification of the current
code and has only a slight impact on performance.
The previous pruneHashComponent method already calculated the hash buckets of
all the rows, so I implemented the idea simply by collecting those ids and
replacing the in-list predicate values with the filtered values. As a result,
there is almost no performance impact on other cases. I placed it in the client
rather than in the tablet because the performance improvement comes from two
aspects: fewer values to transport over the network, and a logarithmic
reduction in the cost of the subsequent binary search.
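As a rough illustration of the idea (not the code in the Gerrit change), the
sketch below splits an in-list by hash bucket so that each scan token carries
only the values that can match its tablet. The names bucket_of and
split_in_list_by_bucket are hypothetical, and zlib.crc32 is only a stand-in
for Kudu's real hash function, which is not reproduced here.

```python
import zlib
from collections import defaultdict


def bucket_of(value: int, num_buckets: int) -> int:
    """Map a key to a hash bucket.

    Assumption: crc32 over an 8-byte big-endian encoding is a stand-in,
    NOT the hash function Kudu actually uses.
    """
    encoded = value.to_bytes(8, "big", signed=True)
    return zlib.crc32(encoded) % num_buckets


def split_in_list_by_bucket(values, num_buckets):
    """Group in-list values by the bucket they hash to.

    Returns a dict {bucket id: sorted list of values}. Each list is what a
    token for that bucket would carry instead of the full in-list.
    """
    per_bucket = defaultdict(list)
    for v in values:
        per_bucket[bucket_of(v, num_buckets)].append(v)
    # Keep each list sorted so it can still be binary searched later.
    return {b: sorted(vs) for b, vs in per_bucket.items()}


# Usage: an in-list of 100000 keys split across 24 hash buckets.
in_list = list(range(100000))
tokens = split_in_list_by_bucket(in_list, 24)
for bucket in sorted(tokens)[:3]:
    print(bucket, len(tokens[bucket]))
```

Every value lands in exactly one bucket, so the per-token lists together
still cover the original in-list.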
Here are some performance benchmarks for this implementation.
Hardware:
Client: 4 cores, 8g memory
Server: 4 cores, 8g memory
In-List size: 100000; all queries are served from cache.
The table scanned by the in-list query contains 10M rows and 30 dense columns;
each cell is randomly either BIGINT or STRING. 24 partitions.
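For a sense of scale, the two expected gains can be estimated for this setup.
This is a back-of-the-envelope sketch that assumes the in-list values hash
uniformly across the 24 partitions and that each lookup is one binary search
over the in-list; per_tablet_estimate is a hypothetical helper, not part of
Kudu.

```python
import math


def per_tablet_estimate(in_list_size: int, num_tablets: int):
    """Estimate values sent per tablet and binary-search steps per lookup.

    Assumes a uniform spread of in-list values across tablets.
    """
    per_tablet = in_list_size / num_tablets
    steps_before = math.log2(in_list_size)  # searching the full in-list
    steps_after = math.log2(per_tablet)     # searching the filtered list
    return per_tablet, steps_before, steps_after


per_tablet, before, after = per_tablet_estimate(100_000, 24)
print(f"values per tablet: ~{per_tablet:.0f}")
print(f"binary-search steps per lookup: ~{before:.1f} before, ~{after:.1f} after")
```

Under these assumptions each tablet receives roughly 1/24 of the values, and
the search depth shrinks by log2(24), which is about 4.6 steps per lookup.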
Before tuning:
!http://doc.sensorsdata.cn/download/attachments/29573518/image2019-11-11_19-11-21.png?version=1&modificationDate=1573470681000&api=v2!
After tuning:
!http://doc.sensorsdata.cn/download/attachments/29573518/image2019-11-12_15-5-57.png?version=1&modificationDate=1573542358000&api=v2!
> Simplify IN-list predicate values based on tablet partition key or rowset PK
> bounds
> -----------------------------------------------------------------------------------
>
> Key: KUDU-1644
> URL: https://issues.apache.org/jira/browse/KUDU-1644
> Project: Kudu
> Issue Type: Sub-task
> Components: perf, tablet
> Reporter: Dan Burkert
> Priority: Major
>
> When new scans are optimized by the tablet, the tablet's partition key bounds
> aren't taken into account in order to remove predicates from the scan. One
> of the most important such optimizations is that IN-list predicates could
> remove values based on the tablet's constraints.