Thanks, Dian, for driving this. Definitely +1.

Here's my 2 cents:

1. I would pay more attention to to_pandas than to from_pandas, because I
believe to_pandas will be used more frequently.
2. I think ArrowTableSink may not be enough for to_pandas, because the pandas
DataFrame lives on the client side; it is not a table sink. We would still
need to convert the output of ArrowTableSink into a pandas DataFrame, if I
understand correctly.




Dian Fu <dian0511...@gmail.com> wrote on Wed, Apr 1, 2020 at 10:49 AM:

> Hi everyone,
>
> I'd like to start a discussion about supporting conversion between PyFlink
> Table and Pandas DataFrame.
>
> The Pandas DataFrame is the de facto standard for working with tabular
> data in the Python community. The PyFlink Table is Flink’s representation
> of tabular data in Python. It would be nice to provide functionality to
> convert between the PyFlink Table and the Pandas DataFrame in the PyFlink
> Table API. This would let users switch between PyFlink and Pandas
> seamlessly when processing data in Python, without extra intermediate
> connectors.
>
> Jincheng Sun and I have discussed offline and have drafted the
> FLIP-120[1]. Looking forward to your feedback!
>
> Regards,
> Dian
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame



-- 
Best Regards

Jeff Zhang
