[ 
https://issues.apache.org/jira/browse/HUDI-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3622:
-----------------------------
    Sprint: Hudi-Sprint-Mar-21

> Insert overwrite table with MOR table is failing w/ integ tests
> ---------------------------------------------------------------
>
>                 Key: HUDI-3622
>                 URL: https://issues.apache.org/jira/browse/HUDI-3622
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: tests-ci
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> We recently added an insert-overwrite-table yaml to the integ test suite. It 
> fails for MOR tables but succeeds with COW tables. Validation fails even when 
> metadata is disabled. 
>  
> yaml of interest:
> insert-overwrite-table.yaml
>  
> Command to reproduce:
> {code:java}
> spark-submit \
>   --packages org.apache.spark:spark-avro_2.11:2.4.0 \
>   --conf spark.task.cpus=1 \
>   --conf spark.executor.cores=1 \
>   --conf spark.task.maxFailures=100 \
>   --conf spark.memory.fraction=0.4 \
>   --conf spark.rdd.compress=true \
>   --conf spark.kryoserializer.buffer.max=2000m \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   --conf spark.memory.storageFraction=0.1 \
>   --conf spark.shuffle.service.enabled=true \
>   --conf spark.sql.hive.convertMetastoreParquet=false \
>   --conf spark.driver.maxResultSize=12g \
>   --conf spark.executor.heartbeatInterval=120s \
>   --conf spark.network.timeout=600s \
>   --conf spark.yarn.max.executor.failures=10 \
>   --conf spark.sql.catalogImplementation=hive \
>   --conf spark.driver.extraClassPath=/var/demo/jars/* \
>   --conf spark.executor.extraClassPath=/var/demo/jars/* \
>   --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
>   /opt/hudi-integ-test-bundle-0.11.0-SNAPSHOT.jar \
>   --source-ordering-field test_suite_source_ordering_field \
>   --use-deltastreamer \
>   --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
>   --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
>   --target-table table1 \
>   --props test.properties \
>   --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
>   --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
>   --input-file-size 125829120 \
>   --workload-yaml-path file:/opt/insert-overwrite-table.yaml \
>   --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
>   --table-type MERGE_ON_READ \
>   --compact-scheduling-minshare 1 \
>   --clean-input \
>   --clean-output
> {code}
>  
> Stack trace for insert-overwrite-table:
> {code:java}
> 22/03/14 21:41:19 ERROR DagNode: Data set validation failed. Total count in hudi 110060, input df count 110010. InputDf except hudi df = 0, Hudi df except Input df 50
> 22/03/14 21:41:19 INFO DagScheduler: Forcing shutdown of executor service, this might kill running tasks
> 22/03/14 21:41:19 ERROR HoodieTestSuiteJob: Failed to run Test Suite
> java.util.concurrent.ExecutionException: java.lang.AssertionError: Hudi contents does not match contents input data.
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
>       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
>       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.AssertionError: Hudi contents does not match contents input data.
>       at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:109)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to run Test Suite
>       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:208)
>       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: Hudi contents does not match contents input data.
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
>       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
>       ... 13 more
> Caused by: java.lang.AssertionError: Hudi contents does not match contents input data.
>       at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:109)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
>       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
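
Note on reading the validation message: the first ERROR line reports 110060 rows in Hudi against 110010 in the input, with nothing missing from Hudi and 50 rows present only in Hudi (110060 - 110010 = 50). The sketch below reproduces those four numbers with plain Java sets. It is an illustration only, not the code in BaseValidateDatasetNode: the class name ValidationSketch, the helper names, and the "new-"/"stale-" row keys are all made up here, and the assumption that the 50 surplus rows are leftovers is a guess, since the ticket does not state the root cause.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a two-way set difference over placeholder row keys.
public class ValidationSketch {

    // Rows in `left` that are absent from `right` (distinct difference, like SQL EXCEPT).
    static Set<String> except(Set<String> left, Set<String> right) {
        Set<String> diff = new HashSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    // Builds n placeholder row keys: prefix0, prefix1, ...
    static Set<String> rows(String prefix, int n) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i < n; i++) {
            out.add(prefix + i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed scenario: the table holds every input row plus 50 extra rows.
        Set<String> input = rows("new-", 110010);
        Set<String> hudi = new HashSet<>(input);
        hudi.addAll(rows("stale-", 50));

        System.out.println("Total count in hudi " + hudi.size()
                + ", input df count " + input.size()
                + ". InputDf except hudi df = " + except(input, hudi).size()
                + ", Hudi df except Input df " + except(hudi, input).size());
    }
}
```

A zero on the "InputDf except hudi" side together with a non-zero "Hudi except Input" side means every expected row was written and the table additionally exposes rows the input does not contain.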



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
