Re: Simple solution for adding Pandas support in the Python API

Greg Kocunik Fri, 14 Oct 2016 09:36:41 -0700

Absolutely. My intention is to start with the read-all scan first, then add
scanner support, then the writer.

On 14 Oct. 2016 11:16 pm, "Jordan Birdsell" <[email protected]>
wrote:

I also agree with simple, particularly at this stage. I think we could
always go back and do the more efficient vectorized mapping at a later
point. Todd, to your point about it being simple: in my experience, Python
developers take many forms, and you will often find those who really
don't like to code and are just there for the capabilities of tools like
pandas. I think helper functions like this are good.

Back to the approach, I think this is fair for now, pyspark does the same
"less efficient" mapping:
https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1471
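
For illustration, the row-by-row mapping discussed above could look roughly
like the sketch below. It assumes the scan has already been materialized as a
list of tuples (the shape returned by scanner.read_all_tuples() in the
proposal); the helper name rows_to_frame and the sample data are hypothetical
and not part of the Kudu client.

```python
import pandas as pd


def rows_to_frame(rows, column_names, key_columns=None):
    """Build a DataFrame from row tuples, optionally indexed by key columns.

    Hypothetical helper: a sketch of the simple, non-vectorized mapping.
    """
    # from_records walks the rows one at a time; simple, but every value
    # passes through Python objects instead of columnar buffers.
    frame = pd.DataFrame.from_records(list(rows), columns=list(column_names))
    if key_columns:
        frame = frame.set_index(list(key_columns))
    return frame


# Stand-in data shaped like the output of a full table scan.
rows = [(1, "a", 0.5), (2, "b", 1.5)]
frame = rows_to_frame(rows, ["id", "name", "score"], key_columns=["id"])
```

With a real table, rows would come from the scanner and the column and key
names from table.schema, as in the original proposal.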

Greg, did you also intend to provide a mapping from Pandas -> Kudu? Also, I
would take a look at implementing this at the scanner level too; I
think this could be useful for folks using the Scan Token API.
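
As a rough sketch of the Pandas -> Kudu direction, one simple option is to
turn each DataFrame row into a dict and apply one insert per row. The helper
names frame_to_records and write_frame are hypothetical. The sketch assumes
that table.new_insert() accepts a dict of column values and that the session
exposes apply() and flush(); please check both against the client version in
use. Depending on the pandas version, values may also need casting from NumPy
scalars to native Python types.

```python
import pandas as pd


def frame_to_records(frame):
    """Return one dict per row, with index levels turned back into columns."""
    # reset_index() moves the (primary key) index into regular columns so
    # that each record carries its key values.
    return frame.reset_index().to_dict(orient="records")


def write_frame(frame, table, session):
    """Apply one insert per row, then flush. Returns the number of rows.

    Assumption: `table` and `session` behave like the Kudu Python client's
    table and session objects (new_insert(dict), apply(op), flush()).
    """
    records = frame_to_records(frame)
    for record in records:
        session.apply(table.new_insert(record))
    session.flush()
    return len(records)


# Example frame indexed by its key column, as the read path would produce.
frame = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).set_index("id")
records = frame_to_records(frame)
```

This is the same "less efficient" row-at-a-time approach as on the read side;
a vectorized writer could replace it later without changing the helper's
signature.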

Jordan

On Fri, Oct 14, 2016 at 2:12 AM Todd Lipcon <[email protected]> wrote:

> On Thu, Oct 13, 2016 at 11:01 PM, Greg Kocunik <[email protected]>
> wrote:
>
> > Hello,
> >
> > I would like to contribute pandas support in the python API.
> >
> > There is a jira ticket <https://issues.apache.org/jira/browse/KUDU-1276>
> > regarding this however the level is quite technical and beyond my
> > current abilities.
> >
> > I would like to get consensus if you are open to simpler solutions in
> > the interim.
> > To give you an idea, I was looking at doing something along the lines
> > of:
> >
> > import pandas as pd
> >
> > scanner = table.scanner()
> > scanner.open()
> > data = scanner.read_all_tuples()
> > pd.DataFrame(data, columns=table.schema.names).set_index(
> >     table.schema.primary_keys())
> >
> > Please let me know if such solutions are welcome.
> >
>
> I'm always in favor of simple, but one question: if it's that simple then
> what's the purpose of having the explicit support, versus asking people to
> write the simple snippet?
>
> Justin Birdsell probably has a good opinion here since he's way more
> active than I am on Python.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
