[ https://issues.apache.org/jira/browse/AVRO-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875762#comment-17875762 ]
ASF subversion and git services commented on AVRO-4033:
-------------------------------------------------------
Commit ea2c54b9d1bd8dd72faeacddcff430178aaf6441 in avro's branch
refs/heads/main from hwse
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=ea2c54b9d ]
AVRO-4033: [C++] Filter out redundant union classes generated by avrogencpp
(#3088)
* AVRO-4033: [C++] Filter out redundant union classes generated by avrogencpp.
For each unique list of union branches, only one class is generated. This can
reduce the size of the generated header for schemas with many unions.
* AVRO-4033: [C++] Align parameter names for
UnionCodeTracker::setTraitsGenerated to be more consistent (#3088)
---------
Co-authored-by: hwse <[email protected]>
> [C++] Remove redundant union classes generated by avrogencpp
> ------------------------------------------------------------
>
> Key: AVRO-4033
> URL: https://issues.apache.org/jira/browse/AVRO-4033
> Project: Apache Avro
> Issue Type: Improvement
> Components: c++
> Reporter: Hagen Weiße
> Priority: Major
> Labels: c++, pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, avrogencpp generates a new class for every union type it
> encounters in the schema, even if a class representing the exact same union
> has already been generated.
> Example Schema:
> {code:json}
> {
>   "type": "record",
>   "doc": "Top level Doc.",
>   "name": "RootRecord",
>   "fields": [
>     {
>       "name": "nullable_string_1",
>       "doc": "mylong field doc.",
>       "type": [
>         "null",
>         "string"
>       ]
>     },
>     {
>       "name": "nullable_string_2",
>       "doc": "mylong field doc.",
>       "type": [
>         "null",
>         "string"
>       ]
>     },
>     {
>       "name": "nullable_string_3",
>       "doc": "mylong field doc.",
>       "type": [
>         "null",
>         "string"
>       ]
>     }
>   ]
> }
> {code}
> The generated RootRecord will look like this:
> {code:c++}
> struct RootRecord {
>     typedef _union_test_json_Union__0__ nullable_string_1_t;
>     typedef _union_test_json_Union__1__ nullable_string_2_t;
>     typedef _union_test_json_Union__2__ nullable_string_3_t;
>     nullable_string_1_t nullable_string_1;
>     nullable_string_2_t nullable_string_2;
>     nullable_string_3_t nullable_string_3;
>     RootRecord() :
>         nullable_string_1(nullable_string_1_t()),
>         nullable_string_2(nullable_string_2_t()),
>         nullable_string_3(nullable_string_3_t())
>         { }
> };
> {code}
> Especially for common union types (e.g. a union of null and string), this
> leads to a lot of redundant code.
> To solve this, avrogencpp could track the names of the union types it has
> already generated and filter out duplicates.
> The generated RootRecord would then look like this:
> {code:c++}
> struct RootRecord {
>     typedef _union_test_json_Union__0__ nullable_string_1_t;
>     typedef _union_test_json_Union__0__ nullable_string_2_t;
>     typedef _union_test_json_Union__0__ nullable_string_3_t;
>     nullable_string_1_t nullable_string_1;
>     nullable_string_2_t nullable_string_2;
>     nullable_string_3_t nullable_string_3;
>     RootRecord() :
>         nullable_string_1(nullable_string_1_t()),
>         nullable_string_2(nullable_string_2_t()),
>         nullable_string_3(nullable_string_3_t())
>         { }
> };
> {code}
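
The deduplication idea from the issue description can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
from the commit: UnionNameRegistry, nameFor and generatedCount are
hypothetical names and are not the UnionCodeTracker API in avrogencpp. The
sketch keys a map on the ordered list of branch type names and hands out one
class name per unique list, using the naming pattern seen in the example
above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; NOT the actual
// UnionCodeTracker class used by avrogencpp.
class UnionNameRegistry {
public:
    explicit UnionNameRegistry(std::string prefix)
        : prefix_(std::move(prefix)) {}

    // Returns the class name for an ordered list of union branch types.
    // isNew is set to true only the first time a given list is seen,
    // which is when a generator would emit the class definition.
    std::string nameFor(const std::vector<std::string> &branches,
                        bool &isNew) {
        auto it = names_.find(branches);
        if (it != names_.end()) {
            isNew = false;  // duplicate union: reuse the existing class
            return it->second;
        }
        // Assumed naming scheme, modelled on the names in the example.
        std::string name =
            prefix_ + "_Union__" + std::to_string(names_.size()) + "__";
        names_.emplace(branches, name);
        isNew = true;
        return name;
    }

    // Number of distinct union classes handed out so far.
    std::size_t generatedCount() const { return names_.size(); }

private:
    std::string prefix_;
    // Branch order is part of the key, because the branch index is
    // significant in an Avro union.
    std::map<std::vector<std::string>, std::string> names_;
};
```

With a registry like this, the three ["null", "string"] fields of RootRecord
would all resolve to the same name, and a class definition would be emitted
only for the first of them.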