lsy created AVRO-3115:
-------------------------
Summary: Avro builder only supports Digits/Letter/_ for column names
Key: AVRO-3115
URL: https://issues.apache.org/jira/browse/AVRO-3115
Project: Apache Avro
Issue Type: New Feature
Affects Versions: 1.8.1
Environment: Linux, mac
Reporter: lsy
We use the Avro writer/reader to generate Parquet files and read them back.
According to [https://avro.apache.org/docs/current/spec.html#names], Avro
restricts names as follows:
* they must start with [A-Za-z_]
* they may subsequently contain only [A-Za-z0-9_]
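For illustration, the two rules above can be expressed as a single regular expression. This is a minimal sketch, not Avro's own validateName implementation: the class name AvroNameCheck and the method isValidAvroName are hypothetical, and the pattern is simply a direct reading of the spec text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Avro: mirrors the naming rule quoted from the spec.
class AvroNameCheck {
    // First character: letter or underscore; remaining characters: letters, digits, underscore.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidAvroName(String name) {
        // matches() requires the whole string to fit the pattern.
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidAvroName("d_dim_name_82"));                      // true
        System.out.println(isValidAvroName("d_dim_name❤ ☀ a quick brown ☘_82")); // false
    }
}
```

Under this reading, the column name from the stack trace in this report is rejected because of the spaces and the non-ASCII symbols, not only because of the Unicode characters.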
To my surprise, the Parquet file can be written and read correctly once
name validation is disabled. I disabled the schema name validation by turning
off the flag here:
[https://github.com/rdblue/avro-java/blob/43e4a3c3de6fa5b0a083b2669a11e99949e01070/avro/src/main/java/org/apache/avro/Schema.java#L1141].
What is the reason behind the name restriction, given that Parquet file
generation works well with arbitrary Unicode characters? It would help if Avro
supported Unicode names natively, so that we don't need to patch it on our side.
Example stack trace when trying to build the Avro schema
(TableSchema.buildAvroSchema()):
org.apache.avro.SchemaParseException: Illegal character in: d_dim_name❤ ☀ a quick brown ☘_82
    at org.apache.avro.Schema.validateName(Schema.java:1151) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.Schema.access$200(Schema.java:81) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.Schema$Field.<init>(Schema.java:403) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2124) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2116) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$FieldBuilder.access$5300(SchemaBuilder.java:2034) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$OptionalCompletion.complete(SchemaBuilder.java:2467) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$OptionalCompletion.complete(SchemaBuilder.java:2458) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$PrimitiveBuilder.end(SchemaBuilder.java:499) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$PrimitiveBuilder.access$900(SchemaBuilder.java:482) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$StringBldr.endString(SchemaBuilder.java:655) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$BaseTypeBuilder.stringType(SchemaBuilder.java:1107) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.SchemaBuilder$FieldAssembler.optionalString(SchemaBuilder.java:1958) ~[avro-1.8.1.jar:1.8.1]