[ https://issues.apache.org/jira/browse/AVRO-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qiang Zhao updated AVRO-4224:
-----------------------------
    Description: 
When using `avro-protobuf` to generate an Avro schema from a Protobuf class 
that contains nested classes, the `ProtobufData.get().getSchema()` method 
generates a schema that includes a `$` in the namespace of the nested class.
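
For context, `$` is the separator the JVM uses in the *binary* name of a nested class (`Class.getName()`), whereas the canonical name (`Class.getCanonicalName()`) joins the levels with `.`. The namespace in the error message further down uses the same `$` convention. The minimal sketch below uses only the JDK. `NestedNameDemo` and its nested classes are hypothetical stand-ins that mimic the nesting protoc produces for `data_record.proto`; no Avro or protobuf code is involved.

```java
// Hypothetical stand-ins for DataRecordOuterClass -> DataRecord -> NestedDataRecord,
// used only to show how the JVM names nested classes.
public class NestedNameDemo {
    public static class DataRecordOuterClass {
        public static class DataRecord {
            public static class NestedDataRecord {}
        }
    }

    /** Binary name: each nesting level is joined to its enclosing class with '$'. */
    public static String binaryName() {
        return DataRecordOuterClass.DataRecord.NestedDataRecord.class.getName();
    }

    /** Canonical (source-level) name: nesting levels are joined with '.'. */
    public static String canonicalName() {
        return DataRecordOuterClass.DataRecord.NestedDataRecord.class.getCanonicalName();
    }

    public static void main(String[] args) {
        // In the default package: NestedNameDemo$DataRecordOuterClass$DataRecord$NestedDataRecord
        System.out.println(binaryName());
        // In the default package: NestedNameDemo.DataRecordOuterClass.DataRecord.NestedDataRecord
        System.out.println(canonicalName());
    }
}
```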

Starting with Avro 1.12.0, `Schema.Parser` no longer accepts `$` in namespaces, 
so parsing the generated schema throws a `SchemaParseException`. Avro 1.11.5 
parses the same schema without error, which makes this a breaking change.

A sample project that reproduces this issue is available at: 
`https://github.com/mattisonchao/avro-schema-breaking`

**Steps to Reproduce:**

1. Define a Protobuf message with a nested message, like the one below.

`data_record.proto`


```proto
syntax = "proto3";

package io.github.mattison;

option java_package = "io.github.mattison";
option java_outer_classname = "DataRecordOuterClass";

message DataRecord {
  string field1 = 1;
  int64 field2 = 2;
  NestedDataRecord field3 = 3;
  repeated NestedDataRecord fields4 = 4;

  message NestedDataRecord {
    string field1 = 1;
    int64 field2 = 2;
  }
}
```

2. In a Java application, use 
`org.apache.avro.protobuf.ProtobufData.get().getSchema()` to generate an Avro 
schema from the compiled Protobuf class.

3. Attempt to parse the generated schema string using 
`new Schema.Parser().parse()`. With avro-protobuf 1.12.0 and later, this call fails.

`Application.java`


```java
package io.github.mattison;

import org.apache.avro.Schema;
import org.apache.avro.protobuf.ProtobufData;

public class Application {
    public static void main(String[] args) {
        final Schema schema =
            ProtobufData.get().getSchema(DataRecordOuterClass.DataRecord.class);
        final Schema.Parser parser = new Schema.Parser();
        // The following line throws a SchemaParseException with avro-protobuf >= 1.12.0
        final Schema parsed = parser.parse(schema.toString());
        // Print the re-parsed schema (only reached on versions that accept it, e.g. 1.11.5)
        System.out.println(parsed);
    }
}
```

**Expected Behavior:**
The schema should be parsed successfully, as it was in Avro 1.11.5. The `$` 
character in the namespace, which is automatically generated by `ProtobufData` 
for nested classes, should either be handled gracefully by the parser or 
avoided during schema generation.
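
The second option, avoiding `$` at generation time, could look roughly like the minimal sketch below. `NamespaceSanitizer`, its methods, and the sample namespace are hypothetical and not part of `avro-protobuf`. Replacing `$` with `.` is only one possible mapping (`_` would also yield valid parts); the validity check follows the Avro specification's rule that each dot-separated namespace part starts with `[A-Za-z_]` followed by `[A-Za-z0-9_]`, and that an empty namespace is allowed.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites a '$'-separated namespace into valid Avro namespace parts. */
public class NamespaceSanitizer {
    // Avro spec rule for a single name / namespace part.
    private static final Pattern PART = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Maps each '$' nesting separator to '.', so every nesting level becomes its own part. */
    public static String sanitizeNamespace(String namespace) {
        return namespace == null ? null : namespace.replace('$', '.');
    }

    /** True if every dot-separated part matches the spec rule; null/empty means "no namespace". */
    public static boolean isValidNamespace(String namespace) {
        if (namespace == null || namespace.isEmpty()) {
            return true;
        }
        // limit -1 keeps empty parts (e.g. from "a..b" or "a$"), which then fail the pattern
        for (String part : namespace.split("\\.", -1)) {
            if (!PART.matcher(part).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String generated = "io.github.mattison.DataRecordOuterClass$DataRecord"; // hypothetical input
        String fixed = sanitizeNamespace(generated);
        System.out.println(fixed + " valid=" + isValidNamespace(fixed));
    }
}
```

Whichever mapping is chosen, it changes the full names of nested records compared with schemas generated by 1.11.x, which may matter wherever those names are already stored or registered.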

**Actual Behavior:**
A `SchemaParseException` is thrown, preventing the schema from being parsed.

**Stack Trace:**


```
Exception in thread "main" org.apache.avro.SchemaParseException: Namespace part 
"DataRecord$DataRecord" is invalid: Illegal character in: DataRecord$DataRecord
at org.apache.avro.ParseContext.validateName(ParseContext.java:241)
at org.apache.avro.ParseContext.requireValidFullName(ParseContext.java:232)
at org.apache.avro.ParseContext.put(ParseContext.java:213)
at org.apache.avro.Schema.parseRecord(Schema.java:1882)
at org.apache.avro.Schema.parse(Schema.java:1836)
at org.apache.avro.Schema.parseUnion(Schema.java:1972)
at org.apache.avro.Schema.parse(Schema.java:1849)
at org.apache.avro.Schema.parseField(Schema.java:1892)
at org.apache.avro.Schema.parseRecord(Schema.java:1872)
at org.apache.avro.Schema.parse(Schema.java:1836)
at org.apache.avro.Schema$Parser.parse(Schema.java:1539)
at org.apache.avro.Schema$Parser.parse(Schema.java:1516)
at io.github.mattison.Application.main(Application.java:13)
```
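
The trace shows `ParseContext.requireValidFullName` calling `validateName`, and the message labels the offending piece a "Namespace part", which suggests each dot-separated part of the full name is checked on its own. The minimal sketch below re-creates such a per-part check from the Avro specification's naming rule (start with `[A-Za-z_]`, then only `[A-Za-z0-9_]`). `AvroNameCheck` and the sample full names are hypothetical; this is not Avro's implementation, only an illustration of why `$` is rejected.

```java
import java.util.regex.Pattern;

/** Hypothetical per-part name check based on the Avro specification's naming rule. */
public class AvroNameCheck {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if a single name (no dots) satisfies the spec rule. */
    public static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    /**
     * Splits a full name on '.', checks each namespace part and the final name,
     * and returns a message for the first invalid part, or null if all parts are valid.
     */
    public static String firstError(String fullName) {
        String[] parts = fullName.split("\\.", -1);
        for (int i = 0; i < parts.length; i++) {
            String kind = (i < parts.length - 1) ? "Namespace part" : "Name";
            if (!isValidName(parts[i])) {
                return kind + " \"" + parts[i] + "\" is invalid";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Prints: Namespace part "DataRecord$DataRecord" is invalid
        System.out.println(firstError("io.github.mattison.DataRecord$DataRecord.NestedDataRecord"));
        // Prints: null
        System.out.println(firstError("io.github.mattison.DataRecord_DataRecord.NestedDataRecord"));
    }
}
```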

This issue blocks the upgrade path for users who rely on Avro's Protobuf 
compatibility and have nested message definitions.



> `SchemaParseException` when parsing a schema generated from Protobuf with 
> nested classes due to `$` in the namespace.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-4224
>                 URL: https://issues.apache.org/jira/browse/AVRO-4224
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Qiang Zhao
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
