[
https://issues.apache.org/jira/browse/AVRO-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qiang Zhao updated AVRO-4224:
-----------------------------
Description:
When using `avro-protobuf` to generate an Avro schema from a Protobuf class
that contains nested classes, the `ProtobufData.get().getSchema()` method
generates a schema that includes a `$` in the namespace of the nested class.
Starting with Avro 1.12.0, the `Schema.Parser` no longer allows `$` in
namespaces, leading to a `SchemaParseException` when trying to parse the
generated schema. This behaviour was not present in version 1.11.5 and
constitutes a breaking change.
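As a rough illustration of why the parser rejects the schema: the Avro specification restricts each dot-separated namespace part to `[A-Za-z_][A-Za-z0-9_]*`, which excludes `$`. The sketch below re-implements that rule with a plain regex. `NamespaceCheck` and `isValidNamespace` are hypothetical names, not Avro API, and the actual check inside `Schema.Parser` may differ in detail.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Avro): checks a namespace against the
// name grammar given in the Avro specification.
final class NamespaceCheck {
    // Each dot-separated part must start with a letter or underscore and
    // continue with letters, digits, or underscores only.
    private static final Pattern NAME_PART =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidNamespace(String namespace) {
        if (namespace.isEmpty()) {
            return true; // the empty string denotes the null namespace
        }
        // Limit -1 keeps trailing empty parts so "a." is rejected too.
        for (String part : namespace.split("\\.", -1)) {
            if (!NAME_PART.matcher(part).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The package from the report passes the check.
        System.out.println(isValidNamespace("io.github.mattison"));
        // The namespace part quoted in the stack trace contains '$' and fails.
        System.out.println(isValidNamespace("DataRecord$DataRecord"));
    }
}
```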
A sample project that reproduces this issue is available at:
`https://github.com/mattisonchao/avro-schema-breaking`
**Steps to Reproduce:**
1. Define a Protobuf message with a nested message, like the one below.
`data_record.proto`
```proto
syntax = "proto3";
package io.github.mattison;
option java_package = "io.github.mattison";
option java_outer_classname = "DataRecordOuterClass";
message DataRecord {
  string field1 = 1;
  int64 field2 = 2;
  NestedDataRecord field3 = 3;
  repeated NestedDataRecord fields4 = 4;
  message NestedDataRecord {
    string field1 = 1;
    int64 field2 = 2;
  }
}
```
2. In a Java application, use
`org.apache.avro.protobuf.ProtobufData.get().getSchema()` to generate an Avro
schema from the compiled Protobuf class.
3. Attempt to parse the generated schema string using `new
Schema.Parser().parse()`. The call throws a `SchemaParseException`.
`Application.java`
```java
package io.github.mattison;
import org.apache.avro.Schema;
import org.apache.avro.protobuf.ProtobufData;
public class Application {
  public static void main(String[] args) {
    // Generate an Avro schema from the compiled Protobuf class.
    final Schema schema =
        ProtobufData.get().getSchema(DataRecordOuterClass.DataRecord.class);
    final Schema.Parser parser = new Schema.Parser();
    // The following line will throw an exception with avro-protobuf >= 1.12.0
    parser.parse(schema.toString());
    System.out.println(parser);
  }
}
```
**Expected Behavior:**
The schema should be parsed successfully, as it was in Avro 1.11.5. The `$`
character in the namespace, which is automatically generated by `ProtobufData`
for nested classes, should either be handled gracefully by the parser or
avoided during schema generation.
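To make the "avoided during schema generation" option concrete, here is one possible sketch that maps a `$`-bearing namespace to a `$`-free one by turning each `$` into a `.`. The class `NamespaceSanitizer`, its `sanitize` method, and the sample namespace string are all hypothetical; this is not how `ProtobufData` behaves today, and which replacement (if any) Avro should adopt is an open question for the fix.

```java
import java.util.regex.Pattern;

// Hypothetical sketch (not Avro API): rewrites a namespace so it no longer
// contains '$', then checks the result against the Avro name grammar.
final class NamespaceSanitizer {
    // Dot-separated parts, each matching [A-Za-z_][A-Za-z0-9_]*.
    private static final Pattern VALID_NAMESPACE = Pattern.compile(
        "[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    // Replace every '$' with '.', so each enclosing class becomes its own
    // namespace part.
    static String sanitize(String namespace) {
        return namespace.replace('$', '.');
    }

    static boolean isValid(String namespace) {
        return VALID_NAMESPACE.matcher(namespace).matches();
    }

    public static void main(String[] args) {
        // Illustrative input only; the exact namespace emitted by
        // ProtobufData is an assumption here.
        String generated = "io.github.mattison.DataRecordOuterClass$DataRecord";
        String cleaned = sanitize(generated);
        System.out.println(generated + " valid: " + isValid(generated));
        System.out.println(cleaned + " valid: " + isValid(cleaned));
    }
}
```

Any such renaming changes the full names of the nested record types, so it could affect compatibility with schemas already stored or registered under the old names.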
**Actual Behavior:**
A `SchemaParseException` is thrown, preventing the schema from being parsed.
**Stack Trace:**
```
Exception in thread "main" org.apache.avro.SchemaParseException: Namespace part
"DataRecord$DataRecord" is invalid: Illegal character in: DataRecord$DataRecord
at org.apache.avro.ParseContext.validateName(ParseContext.java:241)
at org.apache.avro.ParseContext.requireValidFullName(ParseContext.java:232)
at org.apache.avro.ParseContext.put(ParseContext.java:213)
at org.apache.avro.Schema.parseRecord(Schema.java:1882)
at org.apache.avro.Schema.parse(Schema.java:1836)
at org.apache.avro.Schema.parseUnion(Schema.java:1972)
at org.apache.avro.Schema.parse(Schema.java:1849)
at org.apache.avro.Schema.parseField(Schema.java:1892)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1836)
at org.apache.avro.Schema$Parser.parse(Schema.java:1539)
at org.apache.avro.Schema$Parser.parse(Schema.java:1516)
at io.github.mattison.Application.main(Application.java:13)
```
This issue blocks the upgrade path for users who rely on Avro's Protobuf
compatibility and have nested message definitions.
> `SchemaParseException` when parsing a schema generated from Protobuf with
> nested classes due to `$` in the namespace.
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: AVRO-4224
> URL: https://issues.apache.org/jira/browse/AVRO-4224
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Qiang Zhao
> Priority: Major