[
https://issues.apache.org/jira/browse/KUDU-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830729#comment-15830729
]
Todd Lipcon commented on KUDU-1713:
-----------------------------------
Hey Matt. Curious to get to the next level of detail here... A couple questions:
- given a row, it's not always a synchronous process to know which tablet it's
in (the client may have a cold cache). Do you think it is better to try to
"prime the cache" up front and eagerly fetch all the locations, such that the
later queries are all able to be synchronous?
- it's relatively straightforward to return a tablet ID (or an equivalent
pointer or opaque int identifier or whatever) but the tserver/hostname is a bit
trickier since it may change over time if a leader election occurs. Again, do
you think a "snapshot at the start" is the right approach? If so, we also need
to worry about the case where different backends snapshot at slightly different
times and send batches to different locations (may or may not be a problem).
Seems like it may be worth writing a design doc on this before going too far
down the implementation path.
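
To make the "snapshot at the start" question concrete, here is a rough sketch
in Python. It is not the Kudu client API: LocationSnapshot, TabletLocation and
locate() are hypothetical names, and it assumes plain integer range
partitioning (real tables may also be hash partitioned). The locations are
fetched once up front, so every later lookup is synchronous, but the leader
host can be stale if an election happens after the snapshot was taken.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TabletLocation:
    # Hypothetical record, not a real Kudu client type.
    tablet_id: str
    leader_host: str  # leader at snapshot time; may be stale after an election


class LocationSnapshot:
    """Immutable view of tablet locations, built once ("prime the cache")."""

    def __init__(self, entries: List[Tuple[int, TabletLocation]]):
        # Each entry is (inclusive lower bound of the tablet's key range, location).
        ordered = sorted(entries, key=lambda e: e[0])
        self._lower_bounds = [lower for lower, _ in ordered]
        self._locations = [loc for _, loc in ordered]

    def locate(self, key: int) -> TabletLocation:
        # Synchronous lookup: find the last range whose lower bound is <= key.
        idx = bisect.bisect_right(self._lower_bounds, key) - 1
        if idx < 0:
            raise KeyError(f"no tablet covers key {key}")
        return self._locations[idx]


snapshot = LocationSnapshot([
    (0, TabletLocation("tablet-a", "ts1.example.com")),
    (100, TabletLocation("tablet-b", "ts2.example.com")),
])
print(snapshot.locate(42).tablet_id)  # prints "tablet-a"
```

If two backends build their snapshots at slightly different times, they can
disagree on leader_host for the same tablet_id, which is the case raised in the
second question above.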
> Client API to indicate the target tablet server for inserts
> -----------------------------------------------------------
>
> Key: KUDU-1713
> URL: https://issues.apache.org/jira/browse/KUDU-1713
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: 1.0.0
> Reporter: Matthew Jacobs
> Labels: impala
>
> Impala (and presumably other engines) can more efficiently insert if its own
> data shuffling can place data on the same nodes that the destination tablet
> lives on, to avoid another network transfer from the client. Ideally the
> client API can take a row and return the destination tablet/tserver.