[ 
https://issues.apache.org/jira/browse/AVRO-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875762#comment-17875762
 ] 

ASF subversion and git services commented on AVRO-4033:
-------------------------------------------------------

Commit ea2c54b9d1bd8dd72faeacddcff430178aaf6441 in avro's branch 
refs/heads/main from hwse
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=ea2c54b9d ]

AVRO-4033: [C++] Filter out redundant union classes generated by avrogencpp 
(#3088)

* AVRO-4033: [C++] Filter out redundant union classes generated by avrogencpp. 
For a unique list of union branches only one class will be generated. This can 
reduce the header size in schemas with many unions.

* AVRO-4033: [C++] Align parameter names for 
UnionCodeTracker::setTraitsGenerated to be more consistent (#3088)

---------

Co-authored-by: hwse <[email protected]>

> [C++] Remove redundant union classes generated by avrogencpp
> ------------------------------------------------------------
>
>                 Key: AVRO-4033
>                 URL: https://issues.apache.org/jira/browse/AVRO-4033
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: c++
>            Reporter: Hagen Weiße
>            Priority: Major
>              Labels: c++, pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, avrogencpp generates a class for every union type it encounters 
> in the schema. A new type is generated even if an existing class already 
> represents the exact same union.
> Example Schema:
> {code:json}
> {
>     "type": "record",
>     "doc": "Top level Doc.",
>     "name": "RootRecord",
>     "fields": [
>         {
>             "name": "nullable_string_1",
>             "doc": "mylong field doc.",
>             "type": [
>                 "null",
>                 "string"
>             ]
>         },
>         {
>             "name": "nullable_string_2",
>             "doc": "mylong field doc.",
>             "type": [
>                 "null",
>                 "string"
>             ]
>         },
>         {
>             "name": "nullable_string_3",
>             "doc": "mylong field doc.",
>             "type": [
>                 "null",
>                 "string"
>             ]
>         }
>     ]
> }
> {code}
> The generated RootRecord will look like this:
> {code:c++}
> struct RootRecord {
>     typedef _union_test_json_Union__0__ nullable_string_1_t;
>     typedef _union_test_json_Union__1__ nullable_string_2_t;
>     typedef _union_test_json_Union__2__ nullable_string_3_t;
>     nullable_string_1_t nullable_string_1;
>     nullable_string_2_t nullable_string_2;
>     nullable_string_3_t nullable_string_3;
>     RootRecord() :
>         nullable_string_1(nullable_string_1_t()),
>         nullable_string_2(nullable_string_2_t()),
>         nullable_string_3(nullable_string_3_t())
>         { }
> };{code}
> Especially for common union types (e.g. a union of null and string), this 
> leads to a lot of redundant code. 
> To solve this, avrogencpp could track the names of the union types it 
> generates and filter out duplicates.
> The generated RootRecord would then look like this:
> {code:c++}
> struct RootRecord {
>     typedef _union_test_json_Union__0__ nullable_string_1_t;
>     typedef _union_test_json_Union__0__ nullable_string_2_t;
>     typedef _union_test_json_Union__0__ nullable_string_3_t;
>     nullable_string_1_t nullable_string_1;
>     nullable_string_2_t nullable_string_2;
>     nullable_string_3_t nullable_string_3;
>     RootRecord() :
>         nullable_string_1(nullable_string_1_t()),
>         nullable_string_2(nullable_string_2_t()),
>         nullable_string_3(nullable_string_3_t())
>         { }
> };{code}
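
The deduplication described in the issue can be sketched as follows. This is a minimal illustration, not the code from the commit: the class UnionNameRegistry, its methods, and the helper typedefLines are hypothetical names, and it assumes that two unions count as identical when their ordered lists of branch type names match.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the avrogencpp implementation): hands out one
// generated class name per unique, ordered list of union branches.
class UnionNameRegistry {
public:
    explicit UnionNameRegistry(std::string prefix) : prefix_(std::move(prefix)) {}

    // Returns the class name for the given branches, plus a flag that is
    // true only the first time this branch list is seen, i.e. when the
    // caller still has to emit the class definition.
    std::pair<std::string, bool> nameFor(const std::vector<std::string> &branches) {
        auto it = names_.find(branches);
        if (it != names_.end()) {
            return {it->second, false};  // duplicate: reuse the existing class
        }
        std::string name =
            prefix_ + "_Union__" + std::to_string(names_.size()) + "__";
        names_.emplace(branches, name);
        return {name, true};
    }

    // Number of distinct union classes handed out so far.
    std::size_t uniqueCount() const { return names_.size(); }

private:
    std::string prefix_;
    std::map<std::vector<std::string>, std::string> names_;
};

// Builds the typedef lines for a list of fields that all share one union.
std::vector<std::string> typedefLines(UnionNameRegistry &registry,
                                      const std::vector<std::string> &fields,
                                      const std::vector<std::string> &branches) {
    std::vector<std::string> lines;
    for (const std::string &field : fields) {
        lines.push_back("typedef " + registry.nameFor(branches).first + " " +
                        field + "_t;");
    }
    return lines;
}
```

Calling typedefLines with a registry built from the prefix "_union_test_json", the three nullable_string fields, and the branches {"null", "string"} yields three typedefs that all refer to _union_test_json_Union__0__, matching the second RootRecord listing in the issue.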



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
