Absolutely, my intention is to start with the full-table read (scan all) first, then add
the scanner-level support, then the writer.
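A rough sketch of what the later writer step (pandas -> Kudu) could look like. It assumes kudu-python's table.new_insert(), session.apply() and session.flush(). The function name dataframe_to_kudu is hypothetical, and missing values are mapped to None so they would be written as nulls:

```python
import pandas as pd


def dataframe_to_kudu(df, table, session):
    """Hypothetical helper: insert every row of `df` into a Kudu table.

    Assumes `table` offers new_insert(record) and `session` offers
    apply(op) and flush(), as kudu-python's Table and Session do.
    Returns the number of rows applied.
    """
    # A frame indexed by the primary key (as in the read snippet) keeps the
    # key in its index; move it back into ordinary columns first.
    if any(name is not None for name in df.index.names):
        df = df.reset_index()
    count = 0
    for record in df.to_dict(orient="records"):
        # pandas marks missing values as NaN/None; send them as nulls.
        row = {col: (None if pd.isna(val) else val) for col, val in record.items()}
        session.apply(table.new_insert(row))
        count += 1
    session.flush()
    return count
```

Against a real cluster this would be called as dataframe_to_kudu(df, table, client.new_session()). Row-at-a-time inserts are the simple, slower path, in the same spirit as the read side.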
On 14 Oct. 2016 11:16 pm, "Jordan Birdsell" <jordantbirds...@gmail.com> wrote:
I also agree with keeping it simple, particularly at this stage. I think we could
always come back and do the more efficient vectorized mapping at a later
point. Todd, to your point about it being simple: in my experience, Python
developers take many forms, and oftentimes you will find those who really
don't like to code and are just there for the capabilities of tools like
pandas. I think helper functions like this are good.
Back to the approach, I think this is fair for now; PySpark does the same
"less efficient" mapping.
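To illustrate the distinction: the snippet further down builds the frame from per-row tuples, while a vectorized mapping would hand pandas one array per column. Both produce the same frame. The columnar dictionary here is hand-written sample data standing in for a hypothetical column-oriented reader, not something the Python client is assumed to provide:

```python
import pandas as pd

columns = ["id", "name", "score"]

# Row-wise mapping: one Python tuple per row, the shape that
# scanner.read_all_tuples() returns. pandas has to walk every tuple.
rows = [(1, "a", 0.5), (2, "b", 1.5), (3, "c", 2.5)]
df_rows = pd.DataFrame(rows, columns=columns)

# Columnar mapping: one sequence per column. A hypothetical vectorized
# reader would fill NumPy arrays straight from Kudu's column buffers,
# so no per-row Python tuples would be needed.
data_by_column = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 1.5, 2.5],
}
df_cols = pd.DataFrame(data_by_column, columns=columns)
```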
Greg, did you also intend to provide mapping from pandas -> Kudu? Also, I
would look at implementing this at the scanner level too; I
think this could be useful for folks using the Scan Token API.
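A scanner-level helper might look like the following minimal sketch. The name scanner_to_dataframe is hypothetical; it only assumes the open() and read_all_tuples() calls used in the snippet below, so it should work the same for a scanner from table.scanner() or one rebuilt from a scan token. The column names passed in must match the scanner's projection:

```python
import pandas as pd


def scanner_to_dataframe(scanner, columns, index=None):
    """Hypothetical helper: drain a Kudu scanner into a pandas DataFrame.

    `scanner` is assumed to expose open() and read_all_tuples();
    `columns` names the projected columns in scan order; `index`, if
    given, is a list of column names (e.g. the primary key) to index by.
    """
    scanner.open()
    df = pd.DataFrame(scanner.read_all_tuples(), columns=columns)
    if index:
        df = df.set_index(index)
    return df
```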
On Fri, Oct 14, 2016 at 2:12 AM Todd Lipcon <t...@cloudera.com> wrote:
> On Thu, Oct 13, 2016 at 11:01 PM, Greg Kocunik <g.kocu...@gmail.com> wrote:
> > Hello,
> > I would like to contribute pandas support in the Python API.
> > There is a JIRA ticket <https://issues.apache.org/jira/browse/KUDU-1276>
> > regarding this; however, it is quite technical and beyond my
> > abilities.
> > I would like to get consensus on whether you are open to simpler solutions
> > in the interim.
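> > To give you an idea, I was looking at doing something along the lines of:
> > import kudu
> > import pandas as pd
> > # Hypothetical connection details; substitute your own master and table.
> > client = kudu.connect(host='kudu-master', port=7051)
> > table = client.table('my_table')
> > scanner = table.scanner()
> > scanner.open()
> > data = scanner.read_all_tuples()  # one tuple per row
> > df = pd.DataFrame(data, columns=table.schema.names).set_index(
> >     table.schema.primary_keys())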
> > Please let me know if such solutions are welcome.
> I'm always in favor of simplicity, but one question: if it's that simple,
> then what's the purpose of having explicit support, versus asking people to
> write the simple snippet themselves?
> Jordan Birdsell probably has a good opinion here since he's way more
> experienced with Python than I am.
> Todd Lipcon
> Software Engineer, Cloudera