Kengo Seki created PARQUET-1598:
-----------------------------------

             Summary: Improve error message when convert-csv fails due to an invalid input file name
                 Key: PARQUET-1598
                 URL: https://issues.apache.org/jira/browse/PARQUET-1598
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cli
            Reporter: Kengo Seki


I ran parquet-cli's {{convert-csv}} on an input file whose name starts with a 
digit, without the {{--schema}} option, and got the following error:

{code}
$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main convert-csv 0sample.csv -o sample.parquet
Unknown error
shaded.parquet.org.apache.avro.SchemaParseException: Illegal initial character: 0sample
        at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1498)
        at shaded.parquet.org.apache.avro.Schema.access$200(Schema.java:86)
        at shaded.parquet.org.apache.avro.Schema$Name.<init>(Schema.java:645)
        at shaded.parquet.org.apache.avro.Schema.createRecord(Schema.java:182)
        at shaded.parquet.org.apache.avro.SchemaBuilder$RecordBuilder.fields(SchemaBuilder.java:1805)
        at org.apache.parquet.cli.csv.AvroCSV.inferSchemaInternal(AvroCSV.java:158)
        at org.apache.parquet.cli.csv.AvroCSV.inferNullableSchema(AvroCSV.java:78)
        at org.apache.parquet.cli.commands.ConvertCSVCommand.run(ConvertCSVCommand.java:160)
        at org.apache.parquet.cli.Main.run(Main.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.parquet.cli.Main.main(Main.java:177)
{code}

This happens because {{convert-csv}} derives the name of the output schema from 
the input file name (here {{0sample}}), while Avro requires schema names to 
match the regex pattern {{[A-Za-z_][A-Za-z0-9_]*}}.
Users therefore have to rename the input file or pass the {{--schema}} option 
explicitly, but the error message ("Unknown error" followed by a stack trace) 
gives no hint of this.
It would be nice if the message were improved, or if invalid characters in the 
schema name were automatically replaced with valid ones to avoid this problem.
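
To make the two options concrete, here is a minimal sketch. It is not the actual parquet-cli code: the class {{SchemaNames}} and its methods {{requireValid}} and {{sanitize}} are hypothetical names, and the underscore-based replacement rule is only one possible choice. The only thing taken from the report above is the Avro name pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of parquet-cli or Avro.
final class SchemaNames {
    // Avro name rule quoted in this issue: [A-Za-z_][A-Za-z0-9_]*
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private SchemaNames() {}

    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    // Option 1: fail early with a message that tells the user what to do.
    static String requireValid(String name) {
        if (!isValid(name)) {
            throw new IllegalArgumentException(
                "Cannot use \"" + name + "\" as an Avro schema name: it must match "
                + "[A-Za-z_][A-Za-z0-9_]*. Rename the input file or pass --schema.");
        }
        return name;
    }

    // Option 2: rewrite the name so that it always matches the pattern.
    static String sanitize(String name) {
        if (name == null || name.isEmpty()) {
            return "_";
        }
        // Replace every character outside [A-Za-z0-9_] with an underscore.
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "_");
        // A leading digit is still illegal, so prefix an underscore.
        char first = cleaned.charAt(0);
        if (first >= '0' && first <= '9') {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

With this sketch, {{SchemaNames.sanitize("0sample")}} returns {{_0sample}}, while {{SchemaNames.requireValid("0sample")}} throws an {{IllegalArgumentException}} whose message points to the file name and the {{--schema}} option.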



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
