[ 
https://issues.apache.org/jira/browse/KUDU-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Yao updated KUDU-2437:
-------------------------
    Description:     (was: When reading data from a Kudu table with Spark, a 
tablet that holds a large amount of data takes a long time to read.

The reason is that KuduRDD generates one scanToken per tablet, so a single Spark 
task has to process all the data in that tablet. To split a tablet into smaller 
chunks:
 # The TS reports the DRS bounds info to the Master.
 # The client gets the bounds info from the Master.
 # The client generates scanTokens from the tablet's bounds info (setting 
LowerBoundPrimaryKey and UpperBoundPrimaryKey))
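
The third step above could look roughly like the following. This is only a 
sketch and not Kudu code: the function name, the (min_key, max_key, size_bytes) 
tuple shape, and the assumption that the reported DRS (DiskRowSet) bounds do not 
overlap are all hypothetical. Each returned pair stands for the lower and upper 
primary-key bound that one scanToken would carry.

```python
from typing import List, Optional, Tuple

# Hypothetical shape of the bounds info reported for one DRS:
# (min_key, max_key, size_bytes). Not an actual Kudu structure.
DrsBounds = Tuple[str, str, int]
# A chunk is (lower_bound_inclusive, upper_bound_exclusive); None means open-ended.
Chunk = Tuple[Optional[str], Optional[str]]


def split_tablet_into_chunks(drs_bounds: List[DrsBounds],
                             target_bytes: int) -> List[Chunk]:
    """Group key-ordered, non-overlapping DRS bounds into chunks of
    roughly target_bytes each and return the key range of every chunk."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    ordered = sorted(drs_bounds)          # order by min_key
    chunks: List[Chunk] = []
    lower: Optional[str] = None           # the first chunk starts at the tablet start
    accumulated = 0

    for i, (_min_key, _max_key, size_bytes) in enumerate(ordered):
        accumulated += size_bytes
        is_last = i == len(ordered) - 1
        if accumulated >= target_bytes and not is_last:
            # Cut where the next DRS begins, so the chunks stay contiguous.
            cut = ordered[i + 1][0]
            chunks.append((lower, cut))
            lower = cut
            accumulated = 0

    # The final chunk runs to the end of the tablet.
    chunks.append((lower, None))
    return chunks


if __name__ == "__main__":
    bounds = [("a", "c", 40), ("d", "f", 70), ("g", "k", 30), ("m", "p", 90)]
    for lo, hi in split_tablet_into_chunks(bounds, target_bytes=100):
        print(lo, hi)
```

With one scanToken per chunk, a Spark task would read only the rows between its 
two bounds instead of the whole tablet.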

> Split a tablet into some chunks by size
> ---------------------------------------
>
>                 Key: KUDU-2437
>                 URL: https://issues.apache.org/jira/browse/KUDU-2437
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, master, tablet
>            Reporter: Xu Yao
>            Assignee: Xu Yao
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
