[
https://issues.apache.org/jira/browse/KUDU-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750692#comment-16750692
]
Grant Henke commented on KUDU-2670:
-----------------------------------
I think the first step to implement this is to expose the work in KUDU-2437
via client APIs. That could be in its own patch as a part of this JIRA.
I then think #2 you listed is the most widely beneficial and should be
straightforward to implement if the client APIs exist.
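A minimal sketch of the arithmetic behind #2, assuming the tablet size is known
and using plain byte offsets as a stand-in for the primary key ranges that the
tserver would return. SplitPlanner and Chunk are hypothetical names for
illustration only and are not part of the Kudu client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Kudu client API. */
final class SplitPlanner {

    /** Half-open byte range [start, end) standing in for one primary key range. */
    static final class Chunk {
        final long start;
        final long end;

        Chunk(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long size() {
            return end - start;
        }
    }

    /** Cuts a tablet of tabletBytes into chunks of at most splitBytes each. */
    static List<Chunk> plan(long tabletBytes, long splitBytes) {
        if (tabletBytes < 0 || splitBytes <= 0) {
            throw new IllegalArgumentException(
                "tabletBytes must be >= 0 and splitBytes must be > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long start = 0; start < tabletBytes; start += splitBytes) {
            // The last chunk may be smaller than splitBytes.
            chunks.add(new Chunk(start, Math.min(start + splitBytes, tabletBytes)));
        }
        return chunks;
    }
}
```

With a 10G tablet and a 1G split size, plan(10L << 30, 1L << 30) yields 10
chunks, so one tablet would map to 10 Spark tasks instead of one.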
I am not sure I fully understand the approach for #1 above. I understand you
want to look up a single row without the key. However, I am not sure sending a
ton of concurrent requests to Kudu is a good idea. It could result in the RPC
queue filling up with a spike of new requests. That said, I am not sure I have
a better answer off the top of my head. I will think about this though.
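One possible way to limit such a spike, sketched here with only the JDK, is to
cap how many scans are in flight at once on the client side. This is an
assumption for illustration and not the approach proposed in the issue:
BoundedScanRunner and maxInFlight are hypothetical names, and each Callable
stands in for one scanner over a key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration; not part of the Kudu client API. */
final class BoundedScanRunner {

    /**
     * Runs every scan task, never more than maxInFlight at once,
     * and returns all rows in task order.
     */
    static <T> List<T> runAll(List<Callable<List<T>>> scans, int maxInFlight)
            throws InterruptedException, ExecutionException {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        // A fixed pool caps concurrency: extra tasks wait in the client-side
        // queue instead of all being sent to the server at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        try {
            List<T> rows = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the input list.
            for (Future<List<T>> result : pool.invokeAll(scans)) {
                rows.addAll(result.get());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

The cap trades some latency for a bounded request rate. A suitable value for
maxInFlight would depend on the cluster and is not something this sketch can
determine.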
> Splitting more tasks for spark job, and add more concurrent for scan operation
> ------------------------------------------------------------------------------
>
> Key: KUDU-2670
> URL: https://issues.apache.org/jira/browse/KUDU-2670
> Project: Kudu
> Issue Type: Improvement
> Components: java, spark
> Affects Versions: 1.8.0
> Reporter: yangz
> Priority: Major
> Labels: performance
>
> Refer to KUDU-2437, "Split a tablet into primary key ranges by size".
> We need a Java client implementation that supports splitting a tablet scan
> operation.
> We suggest two new implementations for the Java client.
> # A ConcurrentKuduScanner that runs several scanners at the same time, so
> more data is read in parallel. This is useful in one case: we scan for only
> one row, but the predicate doesn't contain the primary key. In this case we
> send a lot of scanner requests but only one row is returned. Sending that
> many scanner requests one by one is slow, so we need a concurrent way. In
> our test of this case, for a 10G tablet, it saves a lot of time on one
> machine.
> # A way to split a Spark job into more tasks. To do so, we get scanner
> tokens in two steps: first we ask the tserver for key ranges, then with
> these ranges we get more scanner tokens. In our usage a tablet is 10G, but
> we split it so that each task processes only 1G of data, which gives us
> better performance.
> All of these features have run well for us for half a year. We hope they
> will be useful for the community.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)