[
https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benchao Li resolved FLINK-33611.
--------------------------------
Fix Version/s: 1.20
Resolution: Fixed
Fixed via df03ada10e226053780cb2e5e9742add4536289c (master)
[~dsaisharath] Thanks for your contribution!
> Support Large Protobuf Schemas
> ------------------------------
>
> Key: FLINK-33611
> URL: https://issues.apache.org/jira/browse/FLINK-33611
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: Sai Sharath Dandi
> Assignee: Sai Sharath Dandi
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.20
>
>
> h3. Background
> Flink serializes and deserializes Protobuf data by calling the decode or
> encode method in GeneratedProtoToRow_XXX.java, which is generated by codegen
> and parses byte[] data into Protobuf Java objects. FLINK-32650 introduced the
> ability to split the generated code to improve performance for large
> Protobuf schemas. However, this is still not sufficient for some larger
> Protobuf schemas: the generated code exceeds the Java constant pool
> size [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool],
> and compiling it fails with errors like "Too many constants".
> h3. Solution
> Since the split-code functionality already exists, the main proposal is to
> reuse variable names across the scopes of different split methods, which
> greatly reduces the constant pool size. A second optimization is to split
> the last code segment only when its size exceeds the split threshold.
> Currently, the last segment of the generated code is always split, which can
> produce too many split methods and thus exceed the constant pool size limit.
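
For readers who want a concrete picture of the two proposals in the description, here is a minimal Java sketch. It is an illustration under stated assumptions, not the code from the fix commit: the class SplitNamingSketch, the NameAllocator helper, and the method names distinctNames and shouldSplit are all hypothetical. The sketch only counts how many distinct local variable names a generator would emit with and without per-method reuse, and shows a threshold check for deciding whether a segment gets its own method. It does not measure the constant pool of a compiled class.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; these are not Flink's codegen classes. */
public class SplitNamingSketch {

    /** Hands out local variable names; can restart numbering for each split method. */
    static final class NameAllocator {
        private final boolean reuseAcrossMethods;
        private int counter = 0;

        NameAllocator(boolean reuseAcrossMethods) {
            this.reuseAcrossMethods = reuseAcrossMethods;
        }

        /** Called when the generator starts the body of a new split method. */
        void startSplitMethod() {
            if (reuseAcrossMethods) {
                // Locals declared in different method scopes cannot clash,
                // so numbering may safely start over.
                counter = 0;
            }
        }

        String newName(String prefix) {
            return prefix + (counter++);
        }
    }

    /** Counts the distinct names emitted for the given number of split methods. */
    public static int distinctNames(int methods, int localsPerMethod, boolean reuse) {
        NameAllocator allocator = new NameAllocator(reuse);
        Set<String> names = new HashSet<>();
        for (int m = 0; m < methods; m++) {
            allocator.startSplitMethod();
            for (int i = 0; i < localsPerMethod; i++) {
                names.add(allocator.newName("fieldVar"));
            }
        }
        return names.size();
    }

    /** A segment is moved into its own method only when it exceeds the threshold. */
    public static boolean shouldSplit(String segment, int splitThreshold) {
        return segment.length() > splitThreshold;
    }

    public static void main(String[] args) {
        System.out.println(distinctNames(500, 40, false)); // prints 20000
        System.out.println(distinctNames(500, 40, true));  // prints 40
        System.out.println(shouldSplit("int a = 1;", 4000)); // prints false
    }
}
```

With globally unique names, the number of distinct identifiers grows with the total number of locals across all split methods; with per-method reuse it is bounded by the largest single method. Applying the threshold check to the last segment as well avoids creating an extra method for a short trailing segment.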