Xu Yao created KUDU-2437:
----------------------------

             Summary: Generate ScanToken from small chunks in tablet
                 Key: KUDU-2437
                 URL: https://issues.apache.org/jira/browse/KUDU-2437
             Project: Kudu
          Issue Type: Improvement
          Components: client, master, tablet
            Reporter: Xu Yao


When reading a Kudu table with Spark, a tablet that holds a large amount of 
data takes a long time to read.

The reason is that KuduRDD generates one scanToken per tablet, so a single 
Spark task has to process all of the data in that tablet. Proposed change:
 # Split each tablet into many small chunks, delimited by primary keys
 # Report those primary keys to the Master
 # The client gets the primary keys from the Master and generates one scanToken per chunk
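
The steps above can be sketched roughly as follows. This is only an illustration of the idea, not Kudu code: ChunkToken and tokens_for_tablet are hypothetical names, primary keys are simplified to integers, and the split keys are assumed to have already been fetched from the Master.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChunkToken:
    """Hypothetical stand-in for a scan token limited to one key range."""
    tablet_id: str
    lower: Optional[int]  # inclusive lower bound; None means start of tablet
    upper: Optional[int]  # exclusive upper bound; None means end of tablet


def tokens_for_tablet(tablet_id: str, split_keys: List[int]) -> List[ChunkToken]:
    """Build one token per chunk from the split keys reported for a tablet."""
    # Sort and de-duplicate the split keys, then pad with open-ended bounds
    # so the first and last chunks reach the tablet's edges.
    bounds = [None] + sorted(set(split_keys)) + [None]
    # Each pair of adjacent bounds describes one chunk.
    return [ChunkToken(tablet_id, lo, hi) for lo, hi in zip(bounds, bounds[1:])]
```

With N split keys, a tablet yields N + 1 tokens, so Spark could schedule N + 1 tasks for it instead of one. With no split keys the sketch falls back to a single token covering the whole tablet, which matches the current behaviour.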



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
