[ 
https://issues.apache.org/jira/browse/FLINK-32650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benchao Li reassigned FLINK-32650:
----------------------------------

    Assignee: 李精卫

> Added the ability to split flink-protobuf codegen code
> ------------------------------------------------------
>
>                 Key: FLINK-32650
>                 URL: https://issues.apache.org/jira/browse/FLINK-32650
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: 李精卫
>            Assignee: 李精卫
>            Priority: Major
>
> h3. backgroud
> Flink serializes and deserializes protobuf format data by calling the decode 
> or encode method in GeneratedProtoToRow_XXX.java generated by codegen to 
> parse byte[] data into protobuf java objects. The size of the decode/encode 
> codegen method body is strongly related to the number of defined fields in 
> protobuf. When the number of fields exceeds a certain threshold and the 
> compiled method body exceeds 8k, the decode/encode method will not be 
> optimized by JIT, seriously affecting serialization or deserialization 
> performance. Even if the compiled method body exceeds 64k, it will directly 
> cause the task to fail to start.
> h3. solution
> So I proposed Codegen Splitter for protobuf parsing to split the 
> encode/decode method to solve this problem.
> The specific idea is as follows. In the current decode/encode method, each 
> field defined for the protobuf message is placed in the method body. In fact, 
> there are no shared parameters between the fields, so multiple fields can be 
> merged and parsed and written into the split method body. If the number of 
> strings in the current method body exceeds the threshold, a split method will 
> be generated, these fields will be parsed in the split method, and the split 
> method will be called in the decode/encode method. By analogy, the 
> decode/encode method including the split method is finally generated.
> after spilt code example
>  
> {code:java}
> //代码占位符
> public static RowData 
> decode(org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb 
> message){
> RowData rowData=null;
> org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb message1242 
> = message;
> GenericRowData rowData1242 = new GenericRowData(5);
> split2585(rowData1242, message1242);
> rowData = rowData1242;return rowData;
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to