asheeshgarg opened a new issue #1787:
URL: https://github.com/apache/hudi/issues/1787


   **Setup**
   Packages: org.apache.hudi:hudi-spark-bundle_2.11:0.5.3, org.apache.spark:spark-avro_2.11:2.4.4
   Client: PySpark
   Storage: S3
   hudi_options = {
       'hoodie.table.name': self.table_name,
       'hoodie.datasource.write.recordkey.field': 'column',
       'hoodie.datasource.write.table.name': self.table_name,
       'hoodie.datasource.write.precombine.field': 'column',
       'hoodie.datasource.write.partitionpath.field': 'dl_snapshot_date',
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2
   }
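
For context, here is a minimal sketch of how options like these are typically passed to a PySpark write. The helper names `build_hudi_options` and `write_hudi`, the `base_path` argument, and the `append` save mode are assumptions for illustration; they are not taken from the actual job.

```python
from typing import Any, Dict


def build_hudi_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    partition_field: str,
    parallelism: int = 2,
) -> Dict[str, Any]:
    """Assemble the write options shown above (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.upsert.shuffle.parallelism": parallelism,
        "hoodie.insert.shuffle.parallelism": parallelism,
    }


def write_hudi(df, options: Dict[str, Any], base_path: str, mode: str = "append") -> None:
    """Write a Spark DataFrame through the Hudi datasource (hypothetical helper).

    `df` is expected to be a pyspark.sql.DataFrame; `base_path` is an
    assumed table location such as an S3 prefix.
    """
    df.write.format("org.apache.hudi").options(**options).mode(mode).save(base_path)
```

Calling `write_hudi(df, build_hudi_options(...), base_path)` needs a Spark session started with the packages listed under Setup.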
   **Data is written and can be loaded with Spark, but the write produces the following exception:**
   
   20/07/02 21:53:36 ERROR PriorityBasedFileSystemView: Got error running preferred function. Trying secondary
   org.apache.hudi.exception.HoodieRemoteException: 10.34.184.84:38937 failed to respond
        at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:376)
        at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:66)
        at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getPendingCompactionOperations(PriorityBasedFileSystemView.java:199)
        at org.apache.hudi.table.CleanHelper.<init>(CleanHelper.java:78)
        at org.apache.hudi.table.HoodieCopyOnWriteTable.scheduleClean(HoodieCopyOnWriteTable.java:288)
        at org.apache.hudi.client.HoodieCleanClient.scheduleClean(HoodieCleanClient.java:118)
        at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:95)
        at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:835)
        at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:512)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
        at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)