[jira] [Updated] (FLINK-14020) Use Apache Arrow as the serializer for data transmission between Java operator and Python harness

Dian Fu (Jira) Wed, 23 Oct 2019 00:14:14 -0700


     [ https://issues.apache.org/jira/browse/FLINK-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dian Fu updated FLINK-14020:
----------------------------
        Parent: FLINK-14500
    Issue Type: Sub-task  (was: Task)

> Use Apache Arrow as the serializer for data transmission between Java 
> operator and Python harness
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-14020
>                 URL: https://issues.apache.org/jira/browse/FLINK-14020
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Apache Arrow is "a cross-language development platform for in-memory data. It 
> specifies a standardized language-independent columnar memory format for flat 
> and hierarchical data, organized for efficient analytic operations on modern 
> hardware". It is widely used in notable projects such as Spark, Parquet,
> and Pandas.
> We should first benchmark whether it significantly improves performance
> for non-vectorized Python UDFs. If it does, it would make sense to use it
> for Java/Python communication. Otherwise, a record-by-record serializer
> will be used.
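
The comparison proposed in the description could be set up roughly as in the
sketch below. This is not the Flink benchmark and does not use Arrow itself:
it relies only on the Python standard library, with per-column `array`
buffers standing in for a columnar batch and one `struct.pack` call per
record standing in for a record-by-record serializer. The two-column row
layout, the sample data, and all function names are assumptions made for
illustration.

```python
import struct
import timeit
from array import array

# Hypothetical sample data: rows of (int64 id, float64 score).
ROWS = [(i, i * 0.5) for i in range(10_000)]


def serialize_row_by_row(rows):
    """Encode every record on its own: one struct.pack call per row."""
    return b"".join(struct.pack("<qd", rid, score) for rid, score in rows)


def deserialize_row_by_row(payload):
    """Decode fixed-size 16-byte records back into a list of tuples."""
    return list(struct.iter_unpack("<qd", payload))


def serialize_columnar(rows):
    """Encode each column as one contiguous buffer (stand-in for a columnar batch).

    The array buffers use native byte order, which is fine for a
    same-machine round trip like the one measured here.
    """
    ids = array("q", (r[0] for r in rows))
    scores = array("d", (r[1] for r in rows))
    header = struct.pack("<I", len(rows))  # row count prefix
    return header + ids.tobytes() + scores.tobytes()


def deserialize_columnar(payload):
    """Read the row count, then rebuild both columns and zip them into rows."""
    (n,) = struct.unpack_from("<I", payload, 0)
    ids = array("q")
    ids.frombytes(payload[4:4 + 8 * n])
    scores = array("d")
    scores.frombytes(payload[4 + 8 * n:4 + 16 * n])
    return list(zip(ids, scores))


def benchmark(repeat=5):
    """Return the best-of-`repeat` round-trip time in seconds per approach."""
    row_time = min(timeit.repeat(
        lambda: deserialize_row_by_row(serialize_row_by_row(ROWS)),
        number=1, repeat=repeat))
    col_time = min(timeit.repeat(
        lambda: deserialize_columnar(serialize_columnar(ROWS)),
        number=1, repeat=repeat))
    return {"row_by_row": row_time, "columnar": col_time}


if __name__ == "__main__":
    print(benchmark())
```

The timings depend on the machine, the row width, and the batch size, so the
sketch only shows the shape of the measurement. A real evaluation would need
to run against the actual Java operator and Python harness with Arrow itself.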



--
This message was sent by Atlassian Jira
(v8.3.4#803005)