[ https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruan Hang updated FLINK-35529:
------------------------------
    Fix Version/s: 2.3.0
                       (was: 2.2.0)

> protobuf-format compatible protobuf bad identifier
> --------------------------------------------------
>
>                 Key: FLINK-35529
>                 URL: https://issues.apache.org/jira/browse/FLINK-35529
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 2.1.0
>            Reporter: JingWei Li
>            Priority: Major
>             Fix For: 2.3.0
>
>
> The bug occurs during decoding. Flink generates the decode method at runtime
> through code generation (codegen), and the generated code calls getter and
> setter methods on the protobuf object to construct the RowData. Currently,
> these accessor names are built by string concatenation: the "get" prefix
> followed by the camel-cased field name. For certain field names this produces
> a wrong or ambiguous method name, so the generated code either cannot be
> compiled or calls the wrong method.
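A minimal sketch of the concatenation approach described above, written in C++ to match the protoc excerpt in this issue. The helper names CapitalizedCamelCase and NaiveGetterName are hypothetical. The camel-case rule used here (an underscore or a digit capitalizes the next letter) follows protoc's Java generator; Flink's own conversion may differ in details. Nothing in this sketch detects collisions.

```cpp
#include <string>

// Hypothetical helper: turns a protobuf field name such as "user_name" into
// the capitalized camel-case form "UserName". Assumption: same rule as
// protoc's Java generator (first letter capitalized; '_' or a digit
// capitalizes the next letter; existing capitals are kept).
std::string CapitalizedCamelCase(const std::string& field_name) {
  std::string result;
  bool cap_next = true;  // the first letter is always capitalized
  for (char ch : field_name) {
    if (ch >= 'a' && ch <= 'z') {
      result += cap_next ? static_cast<char>(ch - 'a' + 'A') : ch;
      cap_next = false;
    } else if (ch >= 'A' && ch <= 'Z') {
      result += ch;  // capitals are copied unchanged
      cap_next = false;
    } else if (ch >= '0' && ch <= '9') {
      result += ch;
      cap_next = true;  // a letter after a digit is capitalized
    } else {
      cap_next = true;  // '_' and other separators are dropped
    }
  }
  return result;
}

// Hypothetical helper: builds the getter name by plain string concatenation,
// as the issue describes. It performs no conflict checking at all.
std::string NaiveGetterName(const std::string& field_name) {
  return "get" + CapitalizedCamelCase(field_name);
}
```

With this sketch, NaiveGetterName("class") yields "getClass", and two distinct field names such as "a_b_c" and "a_bC" both yield "getABC": the two failure modes this issue is about.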
>  - Examples:
>   - If the protobuf message defines a field named "class", the generated
> getter is getClass(), which conflicts with Object.getClass(), so the actual
> value of the "class" field cannot be accessed. protoc generates getClass_()
> for this field instead.
>   - If the protobuf message defines two fields "a_b_c" and "a_bC", both
> getters become getABC(), causing a naming conflict. protoc instead appends
> each field's number to the name, e.g. getABC1() and getABC2() for fields
> numbered 1 and 2.
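The first example can be sketched as follows, assuming protoc's approach of decorating a reserved field name with a trailing underscore. As a stated assumption, only "class" is treated as reserved here; protoc's real list of reserved names is longer and is not reproduced. GetterName and IsForbiddenFieldName are hypothetical names, not part of protoc or Flink.

```cpp
#include <string>
#include <unordered_set>

// Capitalized camel case as in protoc's Java generator: '_' (or any other
// non-alphanumeric character) and digits capitalize the next letter.
std::string CapitalizedCamelCase(const std::string& name) {
  std::string result;
  bool cap_next = true;
  for (char ch : name) {
    if (ch >= 'a' && ch <= 'z') {
      result += cap_next ? static_cast<char>(ch - 'a' + 'A') : ch;
      cap_next = false;
    } else if (ch >= 'A' && ch <= 'Z') {
      result += ch;
      cap_next = false;
    } else if (ch >= '0' && ch <= '9') {
      result += ch;
      cap_next = true;
    } else {
      cap_next = true;
    }
  }
  return result;
}

// Assumption: only "class" is reserved in this sketch.
bool IsForbiddenFieldName(const std::string& name) {
  static const std::unordered_set<std::string> kForbidden = {"class"};
  return kForbidden.count(name) > 0;
}

// A reserved name gets a trailing '_' so that the field "class" is read via
// getClass_() instead of colliding with Object.getClass().
std::string GetterName(const std::string& field_name) {
  std::string getter = "get" + CapitalizedCamelCase(field_name);
  if (IsForbiddenFieldName(field_name)) {
    getter += '_';
  }
  return getter;
}
```

Generated decode code that used such a helper would call getClass_() on the message, matching the accessor protoc emits.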
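> Solution: follow the same name-disambiguation logic as protoc's Java code
> generator. The relevant excerpts from protoc (C++) are:
> {code:cpp}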
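> // case a (field1 is repeated, field2 is singular): the repeated field's extra
> // accessors getXxxCount() / getXxxList() collide with the singular field's getter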
> if (name1 + "Count" == name2) {
>   *info = "both repeated field \"" + field1->name() + "\" and singular " +
>           "field \"" + field2->name() + "\" generate the method \"" +
>           "get" + name1 + "Count()\"";
>   return true;
> }
> if (name1 + "List" == name2) {
>   *info = "both repeated field \"" + field1->name() + "\" and singular " +
>           "field \"" + field2->name() + "\" generate the method \"" +
>           "get" + name1 + "List()\"";
>   return true;
> }
> // case b (checked for every pair of fields i < j): two fields share the
> // same capitalized name
> if (name == other_name) {
>   is_conflict[i] = is_conflict[j] = true;
>   conflict_reason[i] = conflict_reason[j] =
>       "capitalized name of field \"" + field->name() +
>       "\" conflicts with field \"" + other->name() + "\"";
> } else if (IsConflicting(field, name, other, other_name,
>                          &conflict_reason[j])) {
>   is_conflict[i] = is_conflict[j] = true;
>   conflict_reason[i] = conflict_reason[j];
> }
> // resolution: append the field number to the name of every conflicting field
> for (int i = 0; i < fields.size(); ++i) {
>   const FieldDescriptor* field = fields[i];
>   FieldGeneratorInfo info;
>   info.name = CamelCaseFieldName(field);
>   info.capitalized_name = UnderscoresToCapitalizedCamelCase(field);
>   // For fields conflicting with some other fields, we append the field
>   // number to their field names in generated code to avoid conflicts.
>   if (is_conflict[i]) {
>     info.name += StrCat(field->number());
>     info.capitalized_name += StrCat(field->number());
>     info.disambiguated_reason = conflict_reason[i];
>   }
>   field_generator_info_map_[field] = info;
> } {code}
>  
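A self-contained C++ sketch of the resolution step in the protoc excerpt above, combining case a, case b, and the field-number suffix. Field is a hypothetical stand-in for protobuf's FieldDescriptor, and DisambiguatedNames and IsConflicting are illustrative names, not protoc's or Flink's API. Only the capitalized name is computed; protoc also derives a lower-camel name and a reason string, which are omitted here. A Flink fix would presumably need equivalent logic in Java so that the generated decode code calls the same accessor names protoc emits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for protobuf's FieldDescriptor.
struct Field {
  std::string name;
  int number;
  bool repeated;
};

// Capitalized camel case as in protoc's Java generator.
std::string CapitalizedCamelCase(const std::string& name) {
  std::string result;
  bool cap_next = true;
  for (char ch : name) {
    if (ch >= 'a' && ch <= 'z') {
      result += cap_next ? static_cast<char>(ch - 'a' + 'A') : ch;
      cap_next = false;
    } else if (ch >= 'A' && ch <= 'Z') {
      result += ch;
      cap_next = false;
    } else if (ch >= '0' && ch <= '9') {
      result += ch;
      cap_next = true;
    } else {
      cap_next = true;
    }
  }
  return result;
}

// Case a: a repeated field's getXxxCount()/getXxxList() accessors collide
// with a singular field's getter. Two repeated or two singular fields never
// conflict under this rule.
bool IsConflicting(const Field& f1, const std::string& n1,
                   const Field& f2, const std::string& n2) {
  if (f1.repeated == f2.repeated) return false;
  if (!f1.repeated) return IsConflicting(f2, n2, f1, n1);  // f1 = repeated one
  return n1 + "Count" == n2 || n1 + "List" == n2;
}

// Returns the capitalized accessor stem for each field, appending the field
// number to every field involved in a conflict.
std::vector<std::string> DisambiguatedNames(const std::vector<Field>& fields) {
  std::vector<std::string> names;
  for (const Field& f : fields) names.push_back(CapitalizedCamelCase(f.name));

  std::vector<bool> is_conflict(fields.size(), false);
  for (std::size_t i = 0; i < fields.size(); ++i) {
    for (std::size_t j = i + 1; j < fields.size(); ++j) {
      // Case b (identical capitalized names) or case a (Count/List).
      if (names[i] == names[j] ||
          IsConflicting(fields[i], names[i], fields[j], names[j])) {
        is_conflict[i] = is_conflict[j] = true;
      }
    }
  }

  std::vector<std::string> result = names;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (is_conflict[i]) result[i] += std::to_string(fields[i].number);
  }
  return result;
}
```

For fields a_b_c (number 1) and a_bC (number 2), this returns ABC1 and ABC2, so the getters would be getABC1() and getABC2(). For a repeated field foo (number 4) and a singular field foo_count (number 5), it returns Foo4 and FooCount5.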



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
