[ https://issues.apache.org/jira/browse/KUDU-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638230#comment-16638230 ]
Jordan Birdsell commented on KUDU-1276:
---------------------------------------
Yeah, this was not the ideal approach; it was meant as a quick stab at
getting the functionality in. [~wesmckinn], if you're interested in helping
out here, it would be much appreciated. I have not had much time for this
work in recent months.
> Add a vectorized read/write interface for pandas DataFrame objects
> ------------------------------------------------------------------
>
> Key: KUDU-1276
> URL: https://issues.apache.org/jira/browse/KUDU-1276
> Project: Kudu
> Issue Type: New Feature
> Components: client, python
> Reporter: Wes McKinney
> Assignee: Jordan Birdsell
> Priority: Major
>
> A pandas read/write interface would make Kudu significantly easier to use for
> average Python data users.
> The layering is as follows:
> - Writer: "Vectorized" insert that accepts a C/C++ array of values plus an
> array (either bits or bytes) indicating nullness for nullable slots
> - Reader: Converts a row batch to NumPy arrays with missing data
> representation suitable for use in pandas. Ideally should not create more
> than one PyString object for each observed string value. Binary can be
> encoded as a UTF-8 string, while Timestamp will need to be converted to
> nanoseconds for pandas.
> This would also give a very performant and relatively GIL-free data ingest
> path to Kudu (and Kudu consumers like Impala) without a great deal of
> Python+Cython coding.
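For illustration only, the data layout described in the two bullets above
could be sketched roughly as follows. This is a pure-Python stand-in, not the
Kudu client API and not a vectorized implementation: every function name here
is hypothetical, nulls are shown as None where pandas would use NaT/NaN, and
it assumes Kudu timestamps arrive as microseconds since the Unix epoch.

```python
from array import array


def pack_null_bitmap(is_null):
    """Pack per-row null flags into bytes, one bit per row (LSB first)."""
    bitmap = bytearray((len(is_null) + 7) // 8)
    for i, flag in enumerate(is_null):
        if flag:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_null_at(bitmap, i):
    """Return True if row i is marked null in the packed bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def micros_to_nanos(values, bitmap):
    """Convert microsecond timestamps to nanoseconds; null rows become None."""
    return [None if is_null_at(bitmap, i) else v * 1000
            for i, v in enumerate(values)]


def decode_strings(raw_values, bitmap):
    """Decode UTF-8 bytes, creating one str object per distinct value."""
    seen = {}
    out = []
    for i, raw in enumerate(raw_values):
        if is_null_at(bitmap, i):
            out.append(None)
            continue
        s = seen.get(raw)
        if s is None:
            s = seen[raw] = raw.decode("utf-8")
        out.append(s)
    return out


# Writer-side input: a typed array of values plus a bitmap of null slots.
timestamps = array("q", [1_500_000_000_000_000, 0, 1_500_000_000_000_001])
ts_nulls = pack_null_bitmap([False, True, False])

# Reader-side conversion for a timestamp column and a string column.
nanos = micros_to_nanos(timestamps, ts_nulls)
names = decode_strings([b"kudu", b"impala", b"kudu"],
                       pack_null_bitmap([False, False, False]))
```

A real implementation would presumably do the same work over whole columns in
C/C++ or Cython so that the per-row loops run without holding the GIL.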