[
https://issues.apache.org/jira/browse/FLINK-32650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benchao Li reassigned FLINK-32650:
----------------------------------
Assignee: 李精卫
> Added the ability to split flink-protobuf codegen code
> ------------------------------------------------------
>
> Key: FLINK-32650
> URL: https://issues.apache.org/jira/browse/FLINK-32650
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: 李精卫
> Assignee: 李精卫
> Priority: Major
>
> h3. Background
> Flink serializes and deserializes protobuf data by calling the decode or
> encode method in the codegen-generated GeneratedProtoToRow_XXX.java to
> parse byte[] data into protobuf Java objects. The size of the generated
> decode/encode method body grows with the number of fields defined in the
> protobuf message. Once the number of fields passes a certain threshold and
> the compiled method body exceeds 8 KB of bytecode, the JIT no longer
> optimizes the decode/encode method, which seriously degrades serialization
> and deserialization performance. If the compiled method body exceeds 64 KB,
> the task fails to start altogether.
> h3. Solution
> I propose a codegen splitter for protobuf parsing that splits the
> encode/decode method to solve this problem.
> The idea is as follows. In the current decode/encode method, the code for
> every field defined in the protobuf message is placed in a single method
> body. The fields share no parameters with each other, so several fields can
> be grouped, parsed, and written in a separate split method. When the code
> generated for the current method body grows past the threshold, a split
> method is generated, those fields are parsed in the split method, and the
> decode/encode method calls the split method. Repeating this for the
> remaining fields yields the final decode/encode method together with its
> split methods.
> Example of the generated code after splitting:
>
> {code:java}
> // code placeholder
> public static RowData decode(
>     org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb message) {
>   RowData rowData = null;
>   org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb message1242 = message;
>   GenericRowData rowData1242 = new GenericRowData(5);
>   split2585(rowData1242, message1242);
>   rowData = rowData1242;
>   return rowData;
> }
> {code}
>
>
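The splitting idea described in the issue can be illustrated with a minimal sketch. This is not the actual flink-protobuf implementation: the class SplitSketch, the split and flush helpers, the character-count threshold, and the getters in the sample snippets are all hypothetical names chosen for illustration. It assumes each field's generated code is an independent string, and it closes the current chunk into a split method whenever adding the next field would pass the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the splitting idea; not the actual flink-protobuf code.
final class SplitSketch {

    /** Body of the main method plus the split methods it calls. */
    static final class Result {
        final String mainBody;
        final List<String> splitMethods;

        Result(String mainBody, List<String> splitMethods) {
            this.mainBody = mainBody;
            this.splitMethods = splitMethods;
        }
    }

    /** Groups per-field snippets into split methods of roughly maxChars characters each. */
    static Result split(List<String> fieldSnippets, int maxChars) {
        StringBuilder mainBody = new StringBuilder();
        List<String> splitMethods = new ArrayList<>();
        StringBuilder chunk = new StringBuilder();
        for (String snippet : fieldSnippets) {
            // Close the current chunk if adding this field would pass the threshold.
            if (chunk.length() > 0 && chunk.length() + snippet.length() > maxChars) {
                flush(chunk, mainBody, splitMethods);
            }
            chunk.append(snippet).append('\n');
        }
        if (chunk.length() > 0) {
            flush(chunk, mainBody, splitMethods);
        }
        return new Result(mainBody.toString(), splitMethods);
    }

    /** Emits one split method for the chunk and a call to it in the main body. */
    private static void flush(
            StringBuilder chunk, StringBuilder mainBody, List<String> splitMethods) {
        String name = "split" + splitMethods.size();
        // The parameter types below are placeholders inside generated text only.
        splitMethods.add("private static void " + name
                + "(GenericRowData rowData, AdProfilePb message) {\n" + chunk + "}\n");
        mainBody.append(name).append("(rowData, message);\n");
        chunk.setLength(0);
    }

    public static void main(String[] args) {
        // Hypothetical per-field snippets; getA() etc. are made-up getters.
        List<String> fields = Arrays.asList(
                "rowData.setField(0, message.getA());",
                "rowData.setField(1, message.getB());",
                "rowData.setField(2, message.getC());",
                "rowData.setField(3, message.getD());",
                "rowData.setField(4, message.getE());");
        Result result = split(fields, 80);
        System.out.print(result.mainBody);
        for (String method : result.splitMethods) {
            System.out.print(method);
        }
    }
}
```

The character count here is only a proxy for compiled bytecode size, so a real threshold would need to be chosen conservatively. A single snippet longer than the threshold still ends up in its own split method in this sketch.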
--
This message was sent by Atlassian Jira
(v8.20.10#820010)