[ https://issues.apache.org/jira/browse/KUDU-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230555#comment-15230555 ]
Dan Burkert commented on KUDU-1401:
-----------------------------------
Hi [~mrocklin]. I'm actually working on a new API right now that may meet your
needs. There is an associated [design doc|https://github.com/danburkert/kudu/blob/scan-api-design/docs/design-docs/scan-tokens.md]
that covers the motivation and gives a high-level overview. A Java implementation
is [already in review|http://gerrit.cloudera.org:8080/#/c/2592/]. The idea is to
let clients create 'scan tokens' in much the same way they would create a normal
scan. Each token corresponds to a contiguous physical portion of the table and
can later be turned into a scanner over that portion. The tokens carry locality
information and can be easily serialized and deserialized.
Would this suit your needs? Once the C++ implementation is done (I'm working on
it now), Python bindings can be created.
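To make the token idea concrete, here is a minimal Python sketch of what such a token might hold: a contiguous key range, the hosts that have a replica of that range, and a round trip to bytes so the token can be shipped to another process. Everything in it is an assumption for illustration: `ScanToken`, `build_tokens`, the integer keys and the JSON encoding are hypothetical and are not the Kudu Java, C++ or Python API described in the design doc.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ScanToken:
    """Hypothetical stand-in for a scan token (NOT the real Kudu API).

    Describes one contiguous physical portion of a table as a key range,
    plus the hosts that hold a replica of that portion (locality info).
    """
    table: str
    lower_key: Optional[int]  # inclusive; None means unbounded below
    upper_key: Optional[int]  # exclusive; None means unbounded above
    replica_hosts: List[str]

    def serialize(self) -> bytes:
        # JSON keeps the sketch dependency-free; it is an assumed encoding,
        # not necessarily the format real Kudu scan tokens use.
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def deserialize(data: bytes) -> "ScanToken":
        return ScanToken(**json.loads(data.decode("utf-8")))

    def contains(self, key: int) -> bool:
        # A scanner built from this token would only see keys in its range.
        above = self.lower_key is None or key >= self.lower_key
        below = self.upper_key is None or key < self.upper_key
        return above and below


def build_tokens(table: str, split_keys: List[int],
                 hosts_per_range: List[List[str]]) -> List[ScanToken]:
    """Hypothetical builder: one token per range between sorted split keys."""
    bounds: List[Optional[int]] = [None, *split_keys, None]
    if len(hosts_per_range) != len(bounds) - 1:
        raise ValueError("need one host list per key range")
    return [
        ScanToken(table, bounds[i], bounds[i + 1], hosts_per_range[i])
        for i in range(len(bounds) - 1)
    ]
```

For example, `build_tokens("metrics", [100, 200], [["ts-1"], ["ts-2", "ts-3"], ["ts-1"]])` yields three tokens covering `(-inf, 100)`, `[100, 200)` and `[200, +inf)`. A worker that receives the bytes from `serialize()` can rebuild its token with `deserialize()`, and a scheduler can read `replica_hosts` to decide where that worker should run.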
> Expose partition location information through Python API
> --------------------------------------------------------
>
> Key: KUDU-1401
> URL: https://issues.apache.org/jira/browse/KUDU-1401
> Project: Kudu
> Issue Type: New Feature
> Components: api, python
> Reporter: Matthew Rocklin
> Priority: Minor
>
> When building data-local parallel applications, it is often useful to know the
> physical location of blocks of data on the network, so that each block can be
> loaded into memory on a machine that already holds it on local disk.
> I think this API exists in the C++ layer but is not yet exposed through the
> Python API.
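The data-local placement the issue describes (load each block on a machine that already has it on local disk) can be sketched as a greedy assignment once location information is available. This is a minimal sketch under stated assumptions: locality is given as a map from a hypothetical block id to host names, `assign_local` is a hypothetical helper rather than part of Kudu's C++ or Python API, and the least-loaded tie-breaking is just one reasonable choice.

```python
from collections import defaultdict
from typing import Dict, Sequence


def assign_local(blocks: Dict[str, Sequence[str]],
                 workers: Sequence[str]) -> Dict[str, str]:
    """Greedy data-local placement (hypothetical helper, not a Kudu API).

    `blocks` maps a block id to the hosts holding a copy on local disk.
    Each block goes to the least-loaded worker that holds it locally; if
    no worker holds it, it goes to the least-loaded worker overall, which
    means that block will be read remotely.
    """
    if not workers:
        raise ValueError("need at least one worker")
    worker_set = set(workers)
    load: Dict[str, int] = defaultdict(int)
    placement: Dict[str, str] = {}
    # Place the most constrained blocks (fewest local replicas) first so
    # they are not crowded out; ties break on block id for determinism.
    for block_id in sorted(blocks, key=lambda b: (len(blocks[b]), b)):
        local = [h for h in blocks[block_id] if h in worker_set]
        candidates = local if local else list(workers)
        chosen = min(candidates, key=lambda h: (load[h], h))
        placement[block_id] = chosen
        load[chosen] += 1
    return placement
```

Placing the blocks with the fewest local replicas first is a common greedy heuristic; it does not guarantee an optimal balance, but it keeps every block that has a local replica on a worker that can read it from local disk.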
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)