[
https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruan Hang updated FLINK-35529:
------------------------------
Fix Version/s: 2.3.0
(was: 2.2.0)
> protobuf-format compatible protobuf bad identifier
> --------------------------------------------------
>
> Key: FLINK-35529
> URL: https://issues.apache.org/jira/browse/FLINK-35529
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 2.1.0
> Reporter: JingWei Li
> Priority: Major
> Fix For: 2.3.0
>
>
> The bug occurs during decoding. The decode method is generated at runtime by
> Flink's codegen, and the generated code calls getter and setter methods of the
> protobuf object to construct the RowData. The names of these getters and
> setters are currently derived by string concatenation: a "get" prefix followed
> by the camelCase form of the field name. For certain field names the derived
> name differs from the method that protoc actually generated, so the decode
> code calls the wrong method or a method that does not exist.
> - Examples:
> - If the protobuf defines a field named "class", the derived getter is
> getClass(), which conflicts with Object.getClass(), so the real value of the
> "class" field cannot be accessed. The method generated by protoc is
> getClass_().
> - If the protobuf defines two fields "a_b_c" and "ab_c", the derived getters
> are both getABC(), causing a naming conflict. protoc resolves this by
> appending each field's number to the method name (getABC followed by the
> field number).
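
To make the failure mode concrete, here is a minimal Java sketch of getter names derived by plain string concatenation. The class NaiveGetterNames and its helpers are hypothetical and are not Flink's actual codegen. The camelCase conversion is a simplified assumption, and the field names foo_bar and fooBar were chosen only because they collide under that simplified rule.

```java
public class NaiveGetterNames {
    // Hypothetical simplified conversion: drop underscores and capitalize
    // the first letter and every letter that follows an underscore.
    static String toCapitalizedCamelCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = true;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Getter name built purely by concatenation, with no conflict handling.
    static String naiveGetter(String fieldName) {
        return "get" + toCapitalizedCamelCase(fieldName);
    }

    // True if a public no-argument method of this name already exists on
    // java.lang.Object, which every generated message class inherits.
    static boolean clashesWithObjectMethod(String methodName) {
        try {
            Object.class.getMethod(methodName);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String classGetter = naiveGetter("class");
        System.out.println(classGetter);                          // getClass
        System.out.println(clashesWithObjectMethod(classGetter)); // true
        // Two distinct field names that map to the same getter name.
        System.out.println(naiveGetter("foo_bar").equals(naiveGetter("fooBar"))); // true
    }
}
```

In both cases the concatenated name is not a safe way to reach the field, which is why the derived name has to be checked against the names protoc really emits.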
> Solution: apply the same conflict detection and renaming that protoc's Java
> code generator uses (excerpt):
> {code:java}
> // Case a: a repeated field's generated accessor (getXCount() / getXList())
> // has the same name as the getter of a singular field.
> if (name1 + "Count" == name2) {
>   *info = "both repeated field \"" + field1->name() + "\" and singular " +
>           "field \"" + field2->name() + "\" generate the method \"" +
>           "get" + name1 + "Count()\"";
>   return true;
> }
> if (name1 + "List" == name2) {
>   *info = "both repeated field \"" + field1->name() + "\" and singular " +
>           "field \"" + field2->name() + "\" generate the method \"" +
>           "get" + name1 + "List()\"";
>   return true;
> }
>
> // Case b: two fields have the same capitalized camelCase name.
> if (name == other_name) {
>   is_conflict[i] = is_conflict[j] = true;
>   conflict_reason[i] = conflict_reason[j] =
>       "capitalized name of field \"" + field->name() +
>       "\" conflicts with field \"" + other->name() + "\"";
> } else if (IsConflicting(field, name, other, other_name,
>                          &conflict_reason[j])) {
>   is_conflict[i] = is_conflict[j] = true;
>   conflict_reason[i] = conflict_reason[j];
> }
>
> // Solver: build the naming info for every field.
> for (int i = 0; i < fields.size(); ++i) {
>   const FieldDescriptor* field = fields[i];
>   FieldGeneratorInfo info;
>   info.name = CamelCaseFieldName(field);
>   info.capitalized_name = UnderscoresToCapitalizedCamelCase(field);
>   // For fields conflicting with some other fields, we append the field
>   // number to their field names in generated code to avoid conflicts.
>   if (is_conflict[i]) {
>     info.name += StrCat(field->number());
>     info.capitalized_name += StrCat(field->number());
>     info.disambiguated_reason = conflict_reason[i];
>   }
>   field_generator_info_map_[field] = info;
> }
> {code}
>
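
The detect-and-rename approach in the excerpt could be sketched in Java roughly as follows. This is an illustration under assumptions, not the proposed patch: Field, capitalize, and resolveGetterNames are hypothetical names, the camelCase conversion is simplified, and only the two conflict cases from the excerpt are handled (reserved names such as "class" are not).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GetterConflictSketch {
    // Hypothetical minimal stand-in for a protobuf field descriptor.
    static final class Field {
        final String name;
        final int number;
        final boolean repeated;

        Field(String name, int number, boolean repeated) {
            this.name = name;
            this.number = number;
            this.repeated = repeated;
        }
    }

    // Simplified assumption: capitalize the first letter and every letter
    // after an underscore, dropping the underscores.
    static String capitalize(String fieldName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = true;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    // Case a: a repeated field's Count/List accessor equals a singular getter.
    static boolean repeatedAccessorConflict(Field f1, String n1, Field f2, String n2) {
        if (f1.repeated && !f2.repeated) {
            return (n1 + "Count").equals(n2) || (n1 + "List").equals(n2);
        }
        if (!f1.repeated && f2.repeated) {
            return repeatedAccessorConflict(f2, n2, f1, n1);
        }
        return false;
    }

    // Returns one base getter name per field; conflicting fields get their
    // field number appended, as in the solver loop of the excerpt.
    static List<String> resolveGetterNames(List<Field> fields) {
        int n = fields.size();
        String[] names = new String[n];
        boolean[] conflict = new boolean[n];
        for (int i = 0; i < n; i++) {
            names[i] = capitalize(fields.get(i).name);
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Case b (identical capitalized names) or case a.
                if (names[i].equals(names[j])
                        || repeatedAccessorConflict(fields.get(i), names[i], fields.get(j), names[j])) {
                    conflict[i] = true;
                    conflict[j] = true;
                }
            }
        }
        List<String> getters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String name = names[i];
            if (conflict[i]) {
                name += fields.get(i).number;
            }
            getters.add("get" + name);
        }
        return getters;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("foo_bar", 1, false),
                new Field("fooBar", 2, false),
                new Field("baz", 3, false));
        System.out.println(resolveGetterNames(fields)); // [getFooBar1, getFooBar2, getBaz]
    }
}
```

Fields without a conflict keep their plain name, so existing schemas that never collide would be unaffected by such a change.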