[
https://issues.apache.org/jira/browse/FLINK-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dian Fu updated FLINK-14020:
----------------------------
Parent: FLINK-14500
Issue Type: Sub-task (was: Task)
> Use Apache Arrow as the serializer for data transmission between Java
> operator and Python harness
> --------------------------------------------------------------------------------------------------
>
> Key: FLINK-14020
> URL: https://issues.apache.org/jira/browse/FLINK-14020
> Project: Flink
> Issue Type: Sub-task
> Components: API / Python
> Reporter: Dian Fu
> Assignee: Dian Fu
> Priority: Major
> Fix For: 1.11.0
>
>
> Apache Arrow is "a cross-language development platform for in-memory data. It
> specifies a standardized language-independent columnar memory format for flat
> and hierarchical data, organized for efficient analytic operations on modern
> hardware". It is widely used in notable projects such as Spark, Parquet,
> and Pandas.
> We should first benchmark whether Arrow significantly improves performance
> for non-vectorized Python UDFs. If it does, we should use it for Java/Python
> communication. Otherwise, a record-by-record serializer will be used.
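A minimal sketch of the kind of comparison described above, timing a record-by-record round trip against a columnar batch round trip. It does not use Arrow or Flink: as an assumption for illustration, the standard-library `array` module stands in for a columnar buffer and `pickle` for a per-record serializer. The names `ROWS`, `roundtrip_per_record`, and `roundtrip_columnar` are hypothetical, and the timings say nothing about the actual Java/Python data path.

```python
import pickle
import timeit
from array import array

# Hypothetical sample data: (int, float) rows standing in for UDF inputs.
ROWS = [(i, i * 0.5) for i in range(10_000)]


def roundtrip_per_record(rows):
    """Serialize and deserialize every row on its own (record by record)."""
    return [pickle.loads(pickle.dumps(row)) for row in rows]


def roundtrip_columnar(rows):
    """Pack each column into one contiguous buffer, then rebuild the rows.

    Assumption: array('q') / array('d') stand in for columnar buffers;
    this is not the Arrow format.
    """
    ids = array("q", (r[0] for r in rows)).tobytes()
    vals = array("d", (r[1] for r in rows)).tobytes()
    ids_out, vals_out = array("q"), array("d")
    ids_out.frombytes(ids)
    vals_out.frombytes(vals)
    return list(zip(ids_out, vals_out))


if __name__ == "__main__":
    # Time 20 full round trips of the batch with each approach.
    for fn in (roundtrip_per_record, roundtrip_columnar):
        seconds = timeit.timeit(lambda: fn(ROWS), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Both functions return the original rows, so only the elapsed time differs. A real benchmark would need to measure the actual operator-to-harness path with representative non-vectorized UDFs before drawing any conclusion.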