Mourya1319 opened a new issue, #10553:
URL: https://github.com/apache/hudi/issues/10553
I am trying to create a hudi table that has a column starting with
numbers/digits. Getting the below error.
Steps to reproduce the behavior:
1. create database if not exists db location 's3://<>'
2. create table if not exists db.sample(id int, 360p string, 720p string)
using hudi location 's3://<>'
3. insert into db.sample values (1,'yes','yes'), (2,'no','no')
**Expected behavior**
Create a hudi table at specified location.
**Environment Description**
* Hudi version : 0.14.1
* Spark version : 3.4.1
* Hive version : 3.1.3
* Hadoop version : 3.3.6
* Running on Docker? (yes/no) : no
**Additional context**
Add any other context about the problem here.
**Stacktrace**
``` org.apache.avro.SchemaParseException: Illegal initial character: 360p
at org.apache.avro.Schema.validateName(Schema.java:1603)
at org.apache.avro.Schema.access$400(Schema.java:92)
at org.apache.avro.Schema$Field.<init>(Schema.java:556)
at
org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2258)
at
org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2254)
at
org.apache.avro.SchemaBuilder$FieldBuilder.access$5100(SchemaBuilder.java:2150)
at
org.apache.avro.SchemaBuilder$GenericDefault.noDefault(SchemaBuilder.java:2557)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$2(SchemaConverters.scala:205)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:105)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:202)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$2(SchemaConverters.scala:204)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:105)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:202)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$2(SchemaConverters.scala:204)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:105)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:202)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:186)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$2(SchemaConverters.scala:204)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:105)
at
org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:202)
at
org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.initHoodieTable(HoodieCatalogTable.scala:217)
at
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand.run(CreateHoodieTableCommand.scala:71)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
at
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
at
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
at
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
at
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
at
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
at
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
at
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:221)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
... 44 elided```
My 2 cents about the problem:
I believe Hudi uses Avro as data serialization framework. It writes data
files as Parquets and Metadata Files as Avro Files. Avro Files rely on JSON
Schema for schema validation during the serialization and de-serialization
process. And the problem with JSON is that in a key-value pair, KEY should
always be a STRING and if you put a digit in double quotes it still doesn't
recognize that as a STRING. I think the problem somehow related to this.
I wanted to know if we can change any configuration for Hudi which allows us
to use PARQUET for Metadata Files as well. I have searched the Hudi
Documentation but I couldn't find anything useful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]