Jark Wu created FLINK-16296:
-------------------------------
Summary: Improve performance of BaseRowSerializer#serialize() for
GenericRow
Key: FLINK-16296
URL: https://issues.apache.org/jira/browse/FLINK-16296
Project: Flink
Issue Type: Improvement
Components: Table SQL / Runtime
Reporter: Jark Wu
Currently, serializing a {{GenericRow}} with
{{BaseRowSerializer#serialize()}} incurs two memory copies. The first is
GenericRow -> BinaryRow, the second is BinaryRow -> DataOutputView.
However, in theory, we can serialize GenericRow into DataOutputView directly,
because we already have all the column values and types. We can serialize the
null-bit part for all columns, then the fixed-length part for all columns, and
then the variable-length part.
For example, when the column is a BinaryString, we can serialize its pos and
length, calculate the new variable-length part size, and then serialize the
next column. If there is a generic type in the row, we fall back to the
previous approach. But generic types are rare in SQL.
This is a general improvement that can benefit every operator.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)