bvaradar commented on issue #2063:
URL: https://github.com/apache/hudi/issues/2063#issuecomment-688588841


   Context from dev email thread:
   
   
   
   Jl Liu (cadl) <[email protected]>
   To: [email protected]
   
   Mon, Sep 7 at 10:11 AM
   
   Thanks~
   
   I got another question about schema evolution. I couldn't find any
   documentation on the homepage or wiki. If I change a column's type from INT
   to LONG, will Hudi rewrite all the parquet files of the partition?
   
   I disabled the schema compatibility check and successfully wrote LONG-typed
   data to an existing INT-typed hudi table, but I got a “Parquet column cannot
   be converted in file xxx.parquet. Column: [xxx], Expected: int, Found:
   INT64” error on read. It seems that parquet files with different schemas are
   stored in the same directory, and I can't read them together.
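
   The read failure above is reproducible with plain Spark and Parquet, no Hudi
   involved. A minimal sketch, assuming a hypothetical local path (the exact
   error text matches Spark 2.4's vectorized parquet reader):

      // Two parquet files in one directory end up storing column "a" with
      // different physical types (int32 vs int64).
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._

      val dir = "/tmp/mixed_schema_demo"  // hypothetical path
      Seq(1, 2).toDF("a").write.parquet(dir)                  // writes int32
      Seq(3L, 4L).toDF("a").write.mode("append").parquet(dir) // writes int64

      // Spark infers one read schema from a single footer, so files whose
      // physical type disagrees with it fail with an error like:
      // "Parquet column cannot be converted in file ... Expected: int, Found: INT64"
      spark.read.parquet(dir).show()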
   
   
   
   
   > On Sep 8, 2020 at 12:30 AM, Sivabalan <[email protected]> wrote:
   >
   > Actually, I guess it is a bug in hudi: the reader and writer schema
   > arguments are passed in the wrong order (the reader schema is sent as the
   > writer, and the writer schema as the reader). Will file a bug. Then, as
   > you expect, INT should be evolvable to LONG, whereas vice versa is
   > incompatible.
   >
   >
   > On Mon, Sep 7, 2020 at 12:17 PM Sivabalan <[email protected]> wrote:
   >
   >> Hudi relies on avro's schema compatibility check. Looks like as per avro's
   >> SchemaCompatibility, INT can't be evolved to a LONG, but LONG to INT is
   >> allowed.
   >>
   >> Check line no 339 here:
   >> <https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaCompatibility.java>
   >> Also, check their test case here, at line 44:
   >> <https://github.com/apache/avro/blob/master/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCompatibilityTypeMismatch.java>
   >>
   >>
   >>
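
   A minimal sketch of the directionality at play, using plain Avro and
   hypothetical single-field versions of this table's schemas: with reader =
   evolved schema and writer = existing schema, int-to-long is compatible;
   with the arguments swapped (the suspected bug above), it is rejected:

      import org.apache.avro.{Schema, SchemaCompatibility}

      object IntToLongCompat extends App {
        val intSchema = new Schema.Parser().parse(
          """{"type":"record","name":"foo_record","fields":[{"name":"a","type":"int"}]}""")
        val longSchema = new Schema.Parser().parse(
          """{"type":"record","name":"foo_record","fields":[{"name":"a","type":"long"}]}""")

        // Correct order: reader = evolved (long), writer = existing (int).
        // Avro promotes int to long when reading, so this prints COMPATIBLE.
        println(SchemaCompatibility
          .checkReaderWriterCompatibility(longSchema, intSchema).getType)

        // Swapped order: reading long-written data with an int reader schema
        // is not allowed, so this prints INCOMPATIBLE.
        println(SchemaCompatibility
          .checkReaderWriterCompatibility(intSchema, longSchema).getType)
      }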
   >> On Mon, Sep 7, 2020 at 12:02 PM Prashant Wason <[email protected]>
   >> wrote:
   >>
   >>> Yes, the schema change looks fine. That would mean it's an issue with the
   >>> schema compatibility checker. There are explicit checks for such cases, so
   >>> I can't say where the issue lies.
   >>>
   >>> I am out on vacation this week. I will look into this as soon as I am
   >>> back.
   >>>
   >>> Thanks
   >>> Prashant
   >>>
   >>> On Sun, Sep 6, 2020, 11:18 AM Vinoth Chandar <[email protected]> wrote:
   >>>
   >>>> That does sound like a backwards compatible change.
   >>>> @prashant, any ideas here? (since you have the best context on the
   >>>> schema validation checks)
   >>>>
   >>>> On Thu, Sep 3, 2020 at 8:12 PM cadl <[email protected]> wrote:
   >>>>
   >>>>> Hi All,
   >>>>>
   >>>>> I want to change the type of one column in my COW table, from int to
   >>>>> long. When I set “hoodie.avro.schema.validate = true” and upsert new
   >>>>> data with long type, I got a “Failed upsert schema compatibility check”
   >>>>> error. Does it break backwards compatibility? If I disable
   >>>>> hoodie.avro.schema.validate, I can upsert and read normally.
   >>>>>
   >>>>> code demo (a minimal sketch also appears after this thread):
   >>>>> https://gist.github.com/cadl/be433079747aeea88c9c1f45321cc2eb
   >>>>>
   >>>>> stacktrace:
   >>>>>
   >>>>>
   >>>>> org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check.
   >>>>>   at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:572)
   >>>>>   at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:190)
   >>>>>   at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
   >>>>>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
   >>>>>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
   >>>>>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   >>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   >>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   >>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   >>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   >>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   >>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   >>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   >>>>>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   >>>>>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   >>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   >>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   >>>>>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   >>>>>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   >>>>>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   >>>>>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   >>>>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   >>>>>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   >>>>>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   >>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   >>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   >>>>>   ... 69 elided
   >>>>> Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema
   >>>>> :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"long"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]},
   >>>>> table schema
   >>>>> :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"int"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]},
   >>>>> base path :file:///jfs/cadl/hudi_data/schema/foo
   >>>>>   at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:564)
   >>>>>   at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:570)
   >>>>>   ... 94 more
   >>>>
   >>>
   >>
   >>
   >> --
   >> Regards,
   >> -Sivabalan
   >>
   >
   >
   > --
   > Regards,
   > -Sivabalan
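
   To summarize the repro from the thread above: a minimal sketch of the failing
   upsert. The field names come from the schemas in the stacktrace; the table
   name, paths, and Spark session setup are assumptions, and the complete demo
   is in the gist linked above.

      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._

      // New batch: column "a" is now long (it was int in the existing table).
      val df = Seq((10L, "x", 1, 1)).toDF("a", "b", "__row_key", "__row_version")

      df.write.format("hudi").
        option("hoodie.table.name", "foo").
        option("hoodie.datasource.write.recordkey.field", "__row_key").
        option("hoodie.datasource.write.precombine.field", "__row_version").
        option("hoodie.avro.schema.validate", "true"). // triggers the failed check above
        mode("append").
        save("file:///jfs/cadl/hudi_data/schema/foo")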
   
   
   
   
   

