[
https://issues.apache.org/jira/browse/FLINK-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079084#comment-17079084
]
Dian Fu edited comment on FLINK-17062 at 4/9/20, 9:04 AM:
----------------------------------------------------------
[~f.pompermaier] Thanks a lot for the suggestions!
The conversion here refers to the conversion between Java data types and Python
data types, not the conversion between Java objects and Python objects.
This is needed because:
- Python type to Java type: the result type of a Python UDF needs to be
converted to a Java data type so that it fits into the existing
type system of the table module, e.g. type inference.
- Java type to Python type: it's currently only used to retrieve the schema of
a Table (via Table.get_schema().get_field_data_types()). For example, users may
check the schema of a table.
Regarding the Python/Java object conversion, you are right, and Arrow is
already used as the data exchange format between the Java process and the
Python process for [vectorized Python
UDF|https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink] (which
takes pandas.Series as the input and output).
> Fix the conversion from Java row type to Python row type
> --------------------------------------------------------
>
> Key: FLINK-17062
> URL: https://issues.apache.org/jira/browse/FLINK-17062
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.9.0
> Reporter: Dian Fu
> Assignee: Dian Fu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.3, 1.10.1, 1.11.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The code iterates over the result of FieldsDataType.getFieldDataTypes when
> converting a Java row type to a Python row type. The resulting field order is
> non-deterministic because FieldsDataType.getFieldDataTypes returns a map.
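
The ordering problem in the issue description can be illustrated with a small, self-contained Python sketch. It is not the actual PyFlink code or the patch for this issue. The field names, the type strings, and both helper functions are hypothetical stand-ins, and the dict is deliberately built in a scrambled order to mimic a map whose iteration order is unrelated to the declared field order.

```python
# Hypothetical stand-ins: field names in declared order, and a name -> type
# mapping whose iteration order differs from the declared order (as can
# happen when the mapping comes from a hash-based map).
field_names = ["id", "name", "score"]
field_types_by_name = {"score": "DOUBLE", "id": "BIGINT", "name": "STRING"}


def row_type_by_map_iteration(names, types_by_name):
    """Fragile pattern: pair declared names with types in map iteration order."""
    return list(zip(names, types_by_name.values()))


def row_type_by_name_lookup(names, types_by_name):
    """Order-safe pattern: look each type up by its field name."""
    return [(name, types_by_name[name]) for name in names]


fragile = row_type_by_map_iteration(field_names, field_types_by_name)
safe = row_type_by_name_lookup(field_names, field_types_by_name)
```

In this sketch, `fragile` pairs `id` with `DOUBLE` because it trusts the map's iteration order, while `safe` recovers the declared pairing regardless of how the map is ordered. Whether the real fix looks fields up by name or uses an ordered structure is not stated in this message.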