jamb2024 opened a new issue, #11144:
URL: https://github.com/apache/hudi/issues/11144

   Hi.
   I am developing a process to ingest data from HDFS using Hudi. I want to 
partition the data using a custom key generator class, where the partition key 
will be a tuple columnName@numPartitions. My custom key generator then uses 
the modulo function to route each row to one partition or another.
   
   The initial load is the following:
   
   
   spark.read.option("mergeSchema", "true").parquet("PATH").
     withColumn("_hoodie_is_deleted", lit(false)).
     write.format("hudi").
     option(OPERATION_OPT_KEY, "upsert").
     option(CDC_ENABLED.key(), "true").
     option(TABLE_NAME, tableName).
     option("hoodie.datasource.write.payload.class", "CustomOverwriteWithLatestAvroPayload").
     option("hoodie.avro.schema.validate", "false").
     option("hoodie.datasource.write.recordkey.field", "CID").
     option("hoodie.datasource.write.precombine.field", "sequential_total").
     option("hoodie.datasource.write.new.columns.nullable", "true").
     option("hoodie.datasource.write.reconcile.schema", "true").
     option("hoodie.metadata.enable", "false").
     option("hoodie.index.type", "SIMPLE").
     option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
     option("hoodie.datasource.write.keygenerator.class", "CustomKeyGenerator").
     option("hoodie.datasource.write.partitionpath.field", "CID@12").
     option("hoodie.datasource.write.drop.partition.columns", "true").
     mode(Overwrite).
     save("/tmp/hudi2")
   
   I added the property hoodie.datasource.write.drop.partition.columns 
because when I read the final path, Hudi throws the error: Cannot find 
columns: 'CID@12' in the schema.
   But with this property it does not work either. The error that appears is 
the following:
   
   org.apache.hudi.internal.schema.HoodieSchemaException: Failed to fetch 
schema from the table
     at 
org.apache.hudi.HoodieBaseRelation.$anonfun$x$2$10(HoodieBaseRelation.scala:179)
     at scala.Option.getOrElse(Option.scala:189)
     at 
org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:175)
     at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:151)
     at 
org.apache.hudi.HoodieBaseRelation.internalSchemaOpt$lzycompute(HoodieBaseRelation.scala:151)
     at 
org.apache.hudi.HoodieBaseRelation.internalSchemaOpt(HoodieBaseRelation.scala:151)
     at 
org.apache.hudi.BaseFileOnlyRelation.<init>(BaseFileOnlyRelation.scala:69)
     at 
org.apache.hudi.DefaultSource$.resolveBaseFileOnlyRelation(DefaultSource.scala:321)
     at org.apache.hudi.DefaultSource$.createRelation(DefaultSource.scala:262)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:118)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:74)
     at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
     at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
     at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
     at scala.Option.getOrElse(Option.scala:189)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
     ... 63 elided
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to