[jira] [Resolved] (FLINK-33611) Support Large Protobuf Schemas

Benchao Li (Jira) Tue, 06 Feb 2024 21:06:34 -0800


     [ https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Benchao Li resolved FLINK-33611.
--------------------------------
    Fix Version/s: 1.20
       Resolution: Fixed

Fixed via df03ada10e226053780cb2e5e9742add4536289c (master)

[~dsaisharath] Thanks for your contribution!

> Support Large Protobuf Schemas
> ------------------------------
>
>                 Key: FLINK-33611
>                 URL: https://issues.apache.org/jira/browse/FLINK-33611
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: Sai Sharath Dandi
>            Assignee: Sai Sharath Dandi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.20
>
>
> h3. Background
> Flink serializes and deserializes Protobuf-format data by calling the decode
> or encode method in GeneratedProtoToRow_XXX.java, which is generated by
> codegen, to parse byte[] data into Protobuf Java objects. FLINK-32650
> introduced the ability to split the generated code to improve performance for
> large Protobuf schemas. However, this is still not sufficient for some larger
> Protobuf schemas: the generated code exceeds the Java constant pool size
> [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and
> compiling it fails with errors such as "Too many constants".
> h3. Solution
> Since the split-code functionality already exists, the main proposal is to
> reuse variable names across different split method scopes, which greatly
> reduces the constant pool size. A second optimization is to split off the
> last code segment only when its size exceeds the split threshold. Currently,
> the last segment of the generated code is always split into its own method,
> which can produce too many split methods and thus exceed the constant pool
> size limit.
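A rough way to see why the "Too many constants" error from the Background appears: the class-file format stores constant_pool_count as an unsigned 16-bit value equal to the number of entries plus one, and index 0 is unused, so a single class can address at most 65534 constant pool indices. The sketch below turns that limit into a budget of distinct generated identifiers. The class name ConstantPoolBudget and the per-identifier cost are assumptions for illustration only; as one example, a field name referenced through getfield/putfield typically needs roughly a Utf8, a NameAndType and a Fieldref entry, but the real cost depends on the compiler and on what the generated code looks like.

```java
public class ConstantPoolBudget {
    // constant_pool_count is a u2 equal to (number of entries + 1) and index 0
    // is never used, so valid indices run from 1 to 65534. Long and double
    // constants occupy two indices each.
    static final int MAX_POOL_INDEX = 65534;

    /**
     * Upper bound on distinct generated identifiers that still fit, assuming
     * (hypothetically) each one costs entriesPerName pool slots and the rest of
     * the class already uses otherEntries slots.
     */
    static int maxDistinctNames(int entriesPerName, int otherEntries) {
        if (entriesPerName <= 0) {
            throw new IllegalArgumentException("entriesPerName must be positive");
        }
        // Integer division rounds down: a partially fitting name does not fit.
        return Math.max(0, (MAX_POOL_INDEX - otherEntries) / entriesPerName);
    }

    public static void main(String[] args) {
        // Assumed: 3 slots per field-like name (Utf8 + NameAndType + Fieldref)
        // and 4000 slots already used by strings, methods and class references.
        System.out.println(maxDistinctNames(3, 4000)); // prints 20511
    }
}
```

Under these assumptions a class runs out of pool slots after about twenty thousand distinct names, regardless of how the code is divided into methods. That is why splitting alone does not help when every split method introduces fresh names.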

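The two proposals in the Solution section can be illustrated with a small, hypothetical generator. The class SplitSketch, its character-based threshold, and the emitted statement shape are invented for this example and are not Flink's codegen API. Each field becomes one schematic statement that needs one temporary variable. Proposal 1 restarts the name counter in every split method, so names repeat across method scopes. Proposal 2 keeps a last segment that fits under the threshold inline instead of always extracting it into another method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the two proposals; NOT Flink's actual codegen. */
public class SplitSketch {
    final int threshold;                 // hypothetical split threshold, in characters
    final boolean reuseNames;            // proposal 1: restart naming in every split method
    final boolean splitTailOnlyIfLarge;  // proposal 2: keep a small last segment inline
    final List<String> splitMethods = new ArrayList<>();
    final Set<String> distinctNames = new TreeSet<>(); // every identifier ever emitted
    String inlineCode = "";              // code left in the top-level decode method
    private int nameCounter = 0;

    SplitSketch(int threshold, boolean reuseNames, boolean splitTailOnlyIfLarge) {
        this.threshold = threshold;
        this.reuseNames = reuseNames;
        this.splitTailOnlyIfLarge = splitTailOnlyIfLarge;
    }

    private String nextName() {
        String name = "v" + nameCounter++;
        distinctNames.add(name);
        return name;
    }

    /** Moves the buffered statements into a new (schematic) split method. */
    private void flush(StringBuilder buffer) {
        splitMethods.add("private void split" + splitMethods.size() + "() {\n" + buffer + "}\n");
        buffer.setLength(0);
        if (reuseNames) {
            nameCounter = 0; // next method is a new scope, so v0, v1, ... are reused
        }
    }

    void generate(int fieldCount) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < fieldCount; i++) {
            String stmt = "Object $v = msg.getField(" + i + "); row.setField(" + i + ", $v);\n";
            // Start a new split method before the current one would exceed the threshold.
            if (buffer.length() > 0 && buffer.length() + stmt.length() > threshold) {
                flush(buffer);
            }
            buffer.append(stmt.replace("$v", nextName()));
        }
        if (buffer.length() > 0) {
            // Old behavior: always extract the tail. New behavior: only if it is too large.
            if (!splitTailOnlyIfLarge || buffer.length() > threshold) {
                flush(buffer);
            } else {
                inlineCode = buffer.toString();
            }
        }
    }

    public static void main(String[] args) {
        SplitSketch before = new SplitSketch(100, false, false);
        before.generate(6);
        SplitSketch after = new SplitSketch(100, true, true);
        after.generate(6);
        System.out.println(before.splitMethods.size() + " methods, " + before.distinctNames);
        System.out.println(after.splitMethods.size() + " methods, " + after.distinctNames);
    }
}
```

With a threshold of 100 characters and six fields (each statement is 50 characters here), the old behavior yields 3 split methods and the names v0 through v5, while the combined proposals yield 2 split methods, keep the last two statements inline, and use only v0 and v1. The number of distinct names then tracks the size of the largest single scope instead of the whole schema.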


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
