VIKASPATID opened a new issue #4635:
URL: https://github.com/apache/hudi/issues/4635


   We are seeing a repeated error when bulk-writing to a COW table, and the error message is not very clear. Please note that we were able to write a number of batches to Hudi successfully before this error started appearing.
   
   **Configuration**
   ```python
   {
       'className': 'org.apache.hudi',
       'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
       'hoodie.cleaner.policy.failed.writes': 'LAZY',
       'hoodie.write.lock.zookeeper.lock_key': f"{table_name}",
       'hoodie.datasource.write.row.writer.enable': 'false',
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.recordkey.field': 'TICKER,ORDER_NUM',
       'hoodie.datasource.write.partitionpath.field': 'ISO,DATE',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
       'hoodie.datasource.write.precombine.field': 'DATE',
       'hoodie.datasource.hive_sync.use_jdbc': 'false',
       'hoodie.datasource.hive_sync.enable': 'false',
       'hoodie.compaction.payload.class': 'org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload',
       'hoodie.datasource.hive_sync.table': f"{table_name}",
       'hoodie.datasource.hive_sync.partition_fields': 'ISO,DATE',
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.copyonwrite.record.size.estimate': 256,
       'hoodie.write.lock.client.wait_time_ms': 1000,
       'hoodie.write.lock.client.num_retries': 50,
       'hoodie.parquet.max.file.size': 1024 * 1024 * 1024,
       'hoodie.bulkinsert.shuffle.parallelism': 10,
       'compactionSmallFileSize': 100 * 1024 * 1024,
       'hoodie.datasource.write.operation': 'bulk_insert'
   }
   ```
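   
   For context, here is a minimal sketch of how options like these are passed from PySpark. The sample DataFrame, the S3 path, and the ZooKeeper lock-provider settings (`hoodie.write.lock.provider` and the `hoodie.write.lock.zookeeper.url/port/base_path` endpoints) are illustrative assumptions, not values copied from our actual job:
   
   ```python
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.appName('hudi-bulk-insert-repro').getOrCreate()
   
   # Toy DataFrame with the record-key (TICKER, ORDER_NUM) and partition
   # (ISO, DATE) columns from the configuration above; PRICE is a placeholder.
   df = spark.createDataFrame(
       [('AAPL', 1, 'US', '2022-01-17', 100.0)],
       ['TICKER', 'ORDER_NUM', 'ISO', 'DATE', 'PRICE'],
   )
   
   table_name = 'my_table'  # placeholder
   hudi_options = {
       # ... all of the options listed above, plus the lock provider that
       # optimistic_concurrency_control requires (assumed, not shown above):
       'hoodie.write.lock.provider':
           'org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider',
       'hoodie.write.lock.zookeeper.url': 'zk-host',            # placeholder
       'hoodie.write.lock.zookeeper.port': '2181',              # placeholder
       'hoodie.write.lock.zookeeper.base_path': '/hudi/locks',  # placeholder
   }
   
   # This save() call is what eventually reaches HoodieSparkSqlWriter in the
   # stacktrace further down.
   (df.write
       .format('hudi')
       .options(**hudi_options)
       .mode('append')
       .save(f's3://my-bucket/{table_name}/'))  # placeholder path
   ```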
   
   **Environment Description**
   
   * Running on EMR 6.5.0
   
   * Hudi version : 0.9
   
   * Spark version : 3.1.2
   
   * ZooKeeper version : 3.5.7
   
   * Hive version : 3.1.2
   
   * Hadoop version : 
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   
   **Stacktrace**
   
   ```
   java.lang.NullPointerException
        at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$8(HoodieTimelineArchiveLog.java:225)
        at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
        at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
        at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
        at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
        at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:122)
        at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:191)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:617)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:274)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:169)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   

