[ 
https://issues.apache.org/jira/browse/KUDU-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230555#comment-15230555
 ] 

Dan Burkert commented on KUDU-1401:
-----------------------------------

Hi [~mrocklin].  I'm working on a new API right now that may meet your 
needs.  There is an associated [design 
doc|https://github.com/danburkert/kudu/blob/scan-api-design/docs/design-docs/scan-tokens.md]
 with the motivation and a high-level overview.  A Java implementation is 
[already in review|http://gerrit.cloudera.org:8080/#/c/2592/].  The idea is to 
allow clients to create 'scan tokens' in much the same way that a normal scan 
is created.  Each token corresponds to a contiguous physical portion of the 
table and can later be turned into a scanner over that portion.  The tokens 
contain locality information and can be easily serialized and deserialized.  
Hopefully this will suit your needs.  Once the C++ implementation is done 
(I'm working on it now), Python bindings can be created.
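
To make the scan-token idea concrete, here is a rough, purely illustrative sketch in plain Python.  It is not the Kudu API: the names ScanToken, build_tokens, and pick_worker, the field layout, and the JSON/base64 wire format are all hypothetical and chosen only to show a token that covers a contiguous key range, carries locality hints, and survives a serialize/deserialize round trip.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScanToken:
    """Hypothetical stand-in for a scan token; NOT the Kudu API."""
    table: str
    lower_bound: str      # inclusive start of the contiguous key range
    upper_bound: str      # exclusive end of the contiguous key range
    replica_hosts: tuple  # locality hint: hosts that hold this portion

    def serialize(self) -> bytes:
        # Assumed wire format (JSON wrapped in base64), for illustration only.
        return base64.b64encode(json.dumps(asdict(self)).encode("utf-8"))

    @staticmethod
    def deserialize(data: bytes) -> "ScanToken":
        fields = json.loads(base64.b64decode(data))
        # JSON turns tuples into lists; restore the tuple so equality holds.
        fields["replica_hosts"] = tuple(fields["replica_hosts"])
        return ScanToken(**fields)


def build_tokens(table, partitions):
    """Create one token per (lower, upper, hosts) partition description."""
    return [ScanToken(table, lo, hi, tuple(hosts)) for lo, hi, hosts in partitions]


def pick_worker(token, workers):
    """Prefer a worker that already holds the data; otherwise use the first."""
    for host in token.replica_hosts:
        if host in workers:
            return host
    return workers[0]


if __name__ == "__main__":
    tokens = build_tokens("events", [("a", "m", ["host1", "host2"]),
                                     ("m", "z", ["host3"])])
    for token in tokens:
        wire = token.serialize()                 # ship this to a remote worker
        restored = ScanToken.deserialize(wire)   # the worker rebuilds the token
        print(restored.lower_bound, restored.upper_bound,
              pick_worker(restored, ["host2", "host3"]))
```

In this sketch a scheduler would call pick_worker on each token to place work near the data, then send the serialized token to that worker, which would turn it back into a scan over just that portion of the table.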

> Expose partition location information through Python API
> --------------------------------------------------------
>
>                 Key: KUDU-1401
>                 URL: https://issues.apache.org/jira/browse/KUDU-1401
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, python
>            Reporter: Matthew Rocklin
>            Priority: Minor
>
> When building data-local parallel applications, it is often useful to know the 
> physical location of blocks of data on the network so that, for each 
> particular block of data, we can try to load it into memory on a machine where 
> it already exists on local disk.
> I think this API exists in the C++ layer but is not yet exposed through the 
> Python API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
