easonwood opened a new issue, #5290:
URL: https://github.com/apache/hudi/issues/5290

   
   I have a Spark job with the following pipeline:
   MySQL binlog data -> parquet files on S3 -> Hudi (external tables on S3).
   
   The code looks like this:
   import org.apache.spark.sql.SaveMode

   // Merge schemas across the input parquet files to handle schema evolution.
   val dataDF = spark.read.option("mergeSchema", "true").parquet(parquetPaths: _*)
   val hudiOptions = Map(
     "hoodie.metadata.enable" -> "true",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE"
   )
   dataDF
     .write
     .format("org.apache.hudi")
     .options(hudiOptions)
     .mode(SaveMode.Append)
     .save(tablePath)
   
   The job ran well until it hit the problem of a MySQL column deletion. Then I got this error:
   Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'col1' not found
   
   To handle it, I applied the workaround from the troubleshooting guide:
   
https://hudi.apache.org/docs/troubleshooting/#caused-by-orgapacheparquetioinvalidrecordexception-parquetavro-schema-mismatch-avro-field-col1-not-found
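   For reference, here is a minimal sketch of one way to build such a tableSchema (an illustrative assumption, not necessarily the only way): reuse the schema of the existing Hudi table, which still contains the deleted columns, stripping the Hudi meta columns because the raw parquet files on S3 do not contain them.

   import org.apache.spark.sql.types.StructType

   // Illustrative sketch: derive the read schema from the existing Hudi table.
   // The standard Hudi meta columns are dropped so they do not clash with the
   // incoming parquet data.
   val hoodieMetaFields = Set(
     "_hoodie_commit_time", "_hoodie_commit_seqno",
     "_hoodie_record_key", "_hoodie_partition_path", "_hoodie_file_name")

   val tableSchema: StructType = StructType(
     spark.read.format("org.apache.hudi").load(tablePath)
       .schema.fields.filterNot(f => hoodieMetaFields.contains(f.name)))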
   
   And changed the code to:
   // tableSchema still contains the deleted old columns.
   val dataDF = spark.read.schema(tableSchema).parquet(pathsNeedConsume: _*)
   // dataDF.show here confirms the schema and data are compatible;
   // the deleted columns are read as null.
   val hudiOptions = Map(
     "hoodie.metadata.enable" -> "true",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE"
   )
   dataDF
     .write
     .format("org.apache.hudi")
     .options(hudiOptions)
     .mode(SaveMode.Append)
     .save(tablePath)

   But save now fails with this error:
   22/04/11 10:32:17 ERROR BaseTableMetadata: Failed to retrieve files in partition (..... tablePath .....) from metadata
   org.apache.hudi.exception.HoodieMetadataException: Metadata record for partition db_cluster=qa01 is inconsistent: HoodieMetadataPayload {key=db_cluster=qa01, type=2, creations=[5969c7b9-a1f5-4bcb-8382-8809ec0cd067-0_0-85-1188_20220411102336.parquet, 5969c7b9-a1f5-4bcb-8382-8809ec0cd067-0_0-85-1195_20220411101502.parquet, 5969c7b9-a1f5-4bcb-8382-8809ec0cd067-0_0-87-537_20220411092820.parquet ....... many more parquet files in tablePath], deletions=[5969c7b9-a1f5-4bcb-8382-8809ec0cd067-0_0-29-245_20220331071910.parquet, 5969c7b9-a1f5-4bcb-8382-8809ec0cd067-0_0-29-246_20220331071910.parquet .......... many other parquet files in tablePath], }
        at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:212)
        at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:130)
        at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
        at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:280)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
        at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:269)
        at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:455)
        at org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:57)
        at org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:239)
        at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:430)
        at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
        at io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:116)
        at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45)
        at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24)
        at io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:123)
        at io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40)
        at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:75)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
        at org.apache.hudi.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)
        at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at io.javalin.core.JavalinServer$start$httpHandler$1.doHandle(JavalinServer.kt:53)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)
        at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)
        at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:173)
        at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:494)
        at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
        at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
        at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
        at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
        at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
        at java.lang.Thread.run(Thread.java:750)
   
   
   
   * Hudi version : 0.8
   
   * Spark version : 3.1.2
   
   * Storage: S3
   
   
   
   

