Xu Yao created KUDU-2437:
----------------------------
Summary: Generate ScanToken from small chunks in tablet
Key: KUDU-2437
URL: https://issues.apache.org/jira/browse/KUDU-2437
Project: Kudu
Issue Type: Improvement
Components: client, master, tablet
Reporter: Xu Yao
When reading a Kudu table with Spark, a tablet that holds a large amount of
data takes a long time to read.
The reason is that KuduRDD generates one scanToken per tablet, so a single Spark
task has to process all the data in that tablet. Proposal:
# Split the tablet into many small chunks, bounded by selected primary keys
# Report those primary keys to the Master
# The client gets the primary keys from the Master and generates a scanToken per chunk
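
The three steps above could be sketched roughly as follows. This is only an illustration of the idea, not the Kudu client API: `ChunkToken` and `tokens_for_tablet` are hypothetical names, and primary keys are assumed to be plain integers for simplicity. Given the split keys reported for a tablet, the client would emit one token per key range instead of one token for the whole tablet.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ChunkToken:
    """Hypothetical stand-in for a scan token restricted to one key range."""
    tablet_id: str
    lower: Optional[int]  # inclusive lower bound; None means start of tablet
    upper: Optional[int]  # exclusive upper bound; None means end of tablet


def tokens_for_tablet(tablet_id: str, split_keys: Sequence[int]) -> List[ChunkToken]:
    """Turn the split keys reported for a tablet into one token per chunk."""
    # Sort and de-duplicate the keys, then pad with open-ended bounds so the
    # first and last chunks cover the edges of the tablet.
    bounds = [None, *sorted(set(split_keys)), None]
    # Each adjacent pair of bounds describes one chunk.
    return [ChunkToken(tablet_id, lo, hi) for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for token in tokens_for_tablet("tablet-1", [200, 100]):
        print(token)
```

With no split keys the function returns a single token covering the whole tablet, which matches the current behaviour described above; each additional split key adds one more token, and so one more Spark task.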
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)