[
https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JingWei Li updated FLINK-35529:
-------------------------------
Description:
The main bug occurs in the decode path. The decode method is generated at
runtime by Flink's codegen, and the generated code has to call getter and
setter methods of the protobuf object to construct the RowData. Currently,
these method names are built by string concatenation: the "get" prefix plus
the camelCase form of the field name. For some field names the resulting name
does not match the method that protoc actually generates, which causes bugs.
Examples:
- If the protobuf defines a field named "class", the derived getter is
getClass(), which collides with Object.getClass(), so the real value of the
"class" field cannot be accessed. The method generated by protoc is
getClass_().
- If the protobuf defines two fields "a_b_c" and "ab_c", the derived getters
are both getABC(), which is a naming conflict. The methods generated by protoc
are named getABC followed by the field number.
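The two failure modes can be reproduced with a small sketch. This is not Flink's actual codegen: the class NaiveGetterNames, its methods, and the simplified camel-case rule are assumptions made for illustration, and the field names "foo_bar" and "fooBar" in the usage note are hypothetical.

```java
/** Hypothetical illustration of "get" + camelCase naming; not Flink's code. */
final class NaiveGetterNames {

    /** Simplified rule (assumption): drop each '_' and upper-case the next character. */
    static String capitalizedCamelCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = true; // the first character is always capitalized
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    /** Builds the getter name purely by string concatenation. */
    static String naiveGetter(String fieldName) {
        return "get" + capitalizedCamelCase(fieldName);
    }

    /** True if java.lang.Object already declares a public no-arg method with this name. */
    static boolean shadowsObjectMethod(String getterName) {
        try {
            Object.class.getMethod(getterName);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Under this rule, naiveGetter("class") yields "getClass", which shadowsObjectMethod reports as already taken by Object, and the two distinct field names "foo_bar" and "fooBar" both map to "getFooBar".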
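Solution: apply the same conflict detection and renaming that protoc uses
when it generates the Java accessors (C++ excerpts):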
{code:cpp}
// Case a: an accessor derived from a repeated field ("Count" / "List" suffix)
// has the same name as the accessor of a singular field.
if (name1 + "Count" == name2) {
  *info = "both repeated field \"" + field1->name() + "\" and singular " +
          "field \"" + field2->name() + "\" generate the method \"" +
          "get" + name1 + "Count()\"";
  return true;
}
if (name1 + "List" == name2) {
  *info = "both repeated field \"" + field1->name() + "\" and singular " +
          "field \"" + field2->name() + "\" generate the method \"" +
          "get" + name1 + "List()\"";
  return true;
}

// Case b: two fields share the same capitalized name, or conflict per case a.
if (name == other_name) {
  is_conflict[i] = is_conflict[j] = true;
  conflict_reason[i] = conflict_reason[j] =
      "capitalized name of field \"" + field->name() +
      "\" conflicts with field \"" + other->name() + "\"";
} else if (IsConflicting(field, name, other, other_name,
                         &conflict_reason[j])) {
  is_conflict[i] = is_conflict[j] = true;
  conflict_reason[i] = conflict_reason[j];
}

// Solver: rename every field that was marked as conflicting.
for (int i = 0; i < fields.size(); ++i) {
  const FieldDescriptor* field = fields[i];
  FieldGeneratorInfo info;
  info.name = CamelCaseFieldName(field);
  info.capitalized_name = UnderscoresToCapitalizedCamelCase(field);
  // For fields conflicting with some other fields, we append the field
  // number to their field names in generated code to avoid conflicts.
  if (is_conflict[i]) {
    info.name += StrCat(field->number());
    info.capitalized_name += StrCat(field->number());
    info.disambiguated_reason = conflict_reason[i];
  }
  field_generator_info_map_[field] = info;
}
{code}
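As a rough idea of how the same rules might look on the Java side, here is a minimal sketch. It covers only case b and the "class" example. The class GetterNameResolver, its method names, the one-entry reserved set, and the simplified camel-case rule are all assumptions for illustration; this is not the actual Flink patch and not protoc's full rule set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of protoc-style getter disambiguation; not Flink's code. */
final class GetterNameResolver {

    // Assumption: only "class" is listed because it is the one case named in this issue.
    private static final Set<String> RESERVED = Collections.singleton("class");

    /** Simplified rule (assumption): drop each '_' and upper-case the next character. */
    static String capitalizedCamelCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = true;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    /** Maps each field name to a getter name, given field name -> field number. */
    static Map<String, String> resolve(Map<String, Integer> fieldNumbersByName) {
        // First pass: count how many fields share each capitalized name.
        Map<String, Integer> counts = new HashMap<>();
        for (String field : fieldNumbersByName.keySet()) {
            counts.merge(capitalizedCamelCase(field), 1, Integer::sum);
        }
        // Second pass: append the field number to every conflicting name,
        // and a trailing underscore to reserved names.
        Map<String, String> getters = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : fieldNumbersByName.entrySet()) {
            String capitalized = capitalizedCamelCase(e.getKey());
            if (counts.get(capitalized) > 1) {
                capitalized += e.getValue();
            }
            if (RESERVED.contains(e.getKey())) {
                capitalized += "_";
            }
            getters.put(e.getKey(), "get" + capitalized);
        }
        return getters;
    }
}
```

For hypothetical fields foo_bar = 1, fooBar = 2, class = 3 and id = 4, this sketch returns getFooBar1, getFooBar2, getClass_ and getId. Any real fix would need to match protoc's output exactly, including the repeated-field "Count" and "List" checks from case a.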
> protobuf-format compatible protobuf bad identifier
> --------------------------------------------------
>
> Key: FLINK-35529
> URL: https://issues.apache.org/jira/browse/FLINK-35529
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.17.2
> Reporter: JingWei Li
> Priority: Major
> Fix For: 2.0.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)