[https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802335#comment-17802335]
Sai Sharath Dandi commented on FLINK-33611:
-------------------------------------------
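[~libenchao] All identifier names in the code are part of the constant pool,
including local variable names. Local variable names are stored through the
LocalVariableTable debug attribute, which is present when the class is compiled
with debug information (for example {{javac -g}}), as in the output below. You
can run the javap tool ({{javap -v}}) on a simple class file to examine the
constant pool contents
([ref|https://blogs.oracle.com/javamagazine/post/java-class-file-constant-pool]).
Here's an example class and its constant pool content obtained with javap: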
{code:java}
public class Hello {
    public void sayHello1() {
        Integer a1;
        int b;
        String c;
    }

    public void sayHello2() {
        Integer a2;
        int b;
        String c;
    }
}
{code}
{code:java}
Constant pool:
#1 = Methodref #6.#25 // java/lang/Object."<init>":()V
#2 = Methodref #26.#27 // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
#3 = String #28 // hi
#4 = String #29 // hello
#5 = Class #30 // com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
#6 = Class #31 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 LocalVariableTable
#12 = Utf8 this
#13 = Utf8 Lcom/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello;
#14 = Utf8 sayHello1
#15 = Utf8 a1
#16 = Utf8 Ljava/lang/Integer;
#17 = Utf8 b
#18 = Utf8 I
#19 = Utf8 c
#20 = Utf8 Ljava/lang/String;
#21 = Utf8 sayHello2
#22 = Utf8 a2
#23 = Utf8 SourceFile
#24 = Utf8 Hello.java
#25 = NameAndType #7:#8 // "<init>":()V
#26 = Class #32 // java/lang/Integer
#27 = NameAndType #33:#34 // valueOf:(I)Ljava/lang/Integer;
#28 = Utf8 hi
#29 = Utf8 hello
#30 = Utf8 com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
#31 = Utf8 java/lang/Object
#32 = Utf8 java/lang/Integer
#33 = Utf8 valueOf
#34 = Utf8 (I)Ljava/lang/Integer; {code}
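As the example shows, local variable names are part of the constant pool.
Names declared in both methods ({{b}} and {{c}}) share a single Utf8 entry
(#17 and #19), while the distinct names {{a1}} and {{a2}} each add their own
entry (#15 and #22).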
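The same size can be checked programmatically on generated classes without running javap. A minimal sketch, assuming JDK 9+ (for {{InputStream.readAllBytes}}); the class and method names are hypothetical. Per the JVM specification, the class file starts with a u4 magic number and two u2 version fields, followed by the u2 {{constant_pool_count}}:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Minimal sketch: reads constant_pool_count from a class file header.
// Header layout (JVM spec, section 4.1): u4 magic, u2 minor_version,
// u2 major_version, u2 constant_pool_count.
public class ConstantPoolCount {

    static int constantPoolCount(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian, as in class files
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: 0x" + Integer.toHexString(magic));
        }
        buf.getShort(); // minor_version (ignored)
        buf.getShort(); // major_version (ignored)
        // u2 field, so the count can never exceed 65535; valid indices are 1..count-1.
        return Short.toUnsignedInt(buf.getShort());
    }

    public static void main(String[] args) throws IOException {
        // JDK class files are readable as resources (served from the runtime image on JDK 9+).
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            System.out.println("java.lang.String: " + constantPoolCount(in.readAllBytes()));
        }
    }
}
{code}

For the Hello class above, whose pool runs from #1 to #34, the field would read 35: the count is one more than the highest index, and long and double constants take two slots each.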
> Support Large Protobuf Schemas
> ------------------------------
>
> Key: FLINK-33611
> URL: https://issues.apache.org/jira/browse/FLINK-33611
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: Sai Sharath Dandi
> Assignee: Sai Sharath Dandi
> Priority: Major
> Labels: pull-request-available
>
> h3. Background
> Flink serializes and deserializes protobuf format data by calling the decode
> or encode method in GeneratedProtoToRow_XXX.java, generated by codegen, to
> parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the
> ability to split the generated code to improve performance for large
> Protobuf schemas. However, this is still not sufficient for some larger
> Protobuf schemas: the generated code exceeds the Java constant pool
> size [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool],
> and compiling it fails with errors like "Too many constants".
> *Solution*
> Since the split-code functionality already exists, the main proposal is to
> reuse variable names across the different split method scopes. This will
> greatly reduce the constant pool size. A further optimization is to split
> the last code segment only when its size exceeds the split threshold.
> Currently, the last segment of the generated code is always split, which can
> produce too many split methods and thus exceed the constant pool size limit.
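The two ideas in the proposal can be illustrated with a hypothetical sketch. {{SplitPlanSketch}}, its methods and the character-based flushing rule are assumptions for illustration, not Flink's protobuf codegen. It also assumes, as the javap output in the comment shows, that every distinct local variable name costs one Utf8 constant pool entry, so restarting the numbering in each split method scope caps that cost at the per-method count. Requires JDK 9+ for {{List.of}}.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two ideas in the proposal. Names and the splitting
// rule are illustrative assumptions, not Flink's protobuf codegen.
public class SplitPlanSketch {

    /** Result of splitting: bodies moved into split methods plus code left inline. */
    static final class SplitPlan {
        final List<String> splitMethodBodies = new ArrayList<>();
        String inlineRemainder = "";
    }

    /**
     * Appends statements to the current body and moves it into a new split method
     * as soon as it reaches splitThreshold characters. The trailing remainder
     * (always shorter than the threshold here) gets its own split method only when
     * alwaysSplitLast is true, modelling the "always split the last segment" behavior.
     */
    static SplitPlan split(List<String> statements, int splitThreshold, boolean alwaysSplitLast) {
        SplitPlan plan = new SplitPlan();
        StringBuilder current = new StringBuilder();
        for (String stmt : statements) {
            current.append(stmt).append('\n');
            if (current.length() >= splitThreshold) {
                plan.splitMethodBodies.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            if (alwaysSplitLast) {
                plan.splitMethodBodies.add(current.toString());
            } else {
                plan.inlineRemainder = current.toString(); // stays in the calling method
            }
        }
        return plan;
    }

    /**
     * Counts distinct local variable names over all split methods. With reuse, the
     * numbering restarts in every method scope (v0, v1, ...), so names repeat.
     */
    static int distinctLocalNames(int methods, int localsPerMethod, boolean reuseAcrossMethods) {
        Set<String> names = new HashSet<>();
        for (int m = 0; m < methods; m++) {
            for (int i = 0; i < localsPerMethod; i++) {
                int id = reuseAcrossMethods ? i : m * localsPerMethod + i;
                names.add("v" + id);
            }
        }
        return names.size();
    }

    public static void main(String[] args) {
        List<String> stmts = List.of("a = 1;", "b = 2;", "c = 3;", "d = 4;", "e = 5;");
        System.out.println(split(stmts, 14, true).splitMethodBodies.size());  // 3
        System.out.println(split(stmts, 14, false).splitMethodBodies.size()); // 2
        System.out.println(distinctLocalNames(100, 20, false)); // 2000
        System.out.println(distinctLocalNames(100, 20, true));  // 20
    }
}
{code}

With 100 split methods of 20 locals each, globally unique names need 2000 distinct names, while per-scope reuse needs only 20. For the five sample statements and a 14-character threshold, always splitting the last segment yields three split methods; splitting it only when needed yields two and leaves the short remainder inline.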
--
This message was sent by Atlassian Jira
(v8.20.10#820010)