lsy created AVRO-3115:
-------------------------

             Summary: Avro builder only supports Digits/Letter/_ for column names
                 Key: AVRO-3115
                 URL: https://issues.apache.org/jira/browse/AVRO-3115
             Project: Apache Avro
          Issue Type: New Feature
    Affects Versions: 1.8.1
         Environment: Linux, mac
            Reporter: lsy


We use the Avro writer/reader to generate Parquet files and read them back.

According to [https://avro.apache.org/docs/current/spec.html#names], Avro
restricts names as follows:
 * start with [A-Za-z_]
 * subsequently contain only [A-Za-z0-9_]
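
The two rules above can be sketched as a single regular expression. This is only an illustration of the rule as quoted from the spec, not Avro's actual Schema.validateName implementation; the class name AvroNameCheck and the method isValidName are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Avro: checks a candidate name against
// the rule quoted above (first char [A-Za-z_], remaining chars [A-Za-z0-9_]).
final class AvroNameCheck {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidName(String name) {
        // matches() requires the whole string to fit the pattern,
        // so empty strings and names with any other character are rejected.
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // A plain ASCII column name.
        System.out.println(isValidName("d_dim_name_82"));
        // The column name from the stack trace in this report, written with
        // Unicode escapes for the three symbol characters.
        System.out.println(isValidName("d_dim_name\u2764 \u2600 a quick brown \u2618_82"));
    }
}
```

Under this rule the second name is rejected both for the symbol characters and for the embedded spaces, which is consistent with the SchemaParseException reported below.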

But it surprises me that the Parquet file can be written and read correctly
once name validation is disabled. I disabled the schema name validation by
turning off the flag here:
 
[https://github.com/rdblue/avro-java/blob/43e4a3c3de6fa5b0a083b2669a11e99949e01070/avro/src/main/java/org/apache/avro/Schema.java#L1141].

I wonder what the reason behind the name restriction is, since Parquet file
generation works fine with arbitrary Unicode characters in column names. Could
Avro support Unicode names natively, so that we don't need to patch it on our
side?

 

Example stacktrace when trying to build the Avro schema (TableSchema.buildAvroSchema()):
 org.apache.avro.SchemaParseException: Illegal character in: d_dim_name❤ ☀ a quick brown ☘_82
 at org.apache.avro.Schema.validateName(Schema.java:1151) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.Schema.access$200(Schema.java:81) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.Schema$Field.<init>(Schema.java:403) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2124) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2116) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$FieldBuilder.access$5300(SchemaBuilder.java:2034) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$OptionalCompletion.complete(SchemaBuilder.java:2467) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$OptionalCompletion.complete(SchemaBuilder.java:2458) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$PrimitiveBuilder.end(SchemaBuilder.java:499) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$PrimitiveBuilder.access$900(SchemaBuilder.java:482) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$StringBldr.endString(SchemaBuilder.java:655) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$BaseTypeBuilder.stringType(SchemaBuilder.java:1107) ~[avro-1.8.1.jar:1.8.1]
 at org.apache.avro.SchemaBuilder$FieldAssembler.optionalString(SchemaBuilder.java:1958) ~[avro-1.8.1.jar:1.8.1]

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
