koochiswathiTR commented on issue #8984:
URL: https://github.com/apache/hudi/issues/8984#issuecomment-1596933153
[hadoop@ip-100-66-69-75 a206760-PowerUser2]$ spark-submit \
    --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 \
    --verbose --driver-memory 4g --executor-memory 16g --num-executors 8 \
    --driver-cores 10 --executor-cores 10 \
    --class org.apache.hudi.utilities.HoodieCompactor \
    /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar \
    --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc \
    --mode scheduleandexecute --spark-memory 2g \
    --hoodie-conf hoodie.metadata.enable=false \
    --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS \
    --hoodie-conf hoodie.compact.inline.max.delta.commits=5
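A note on the invocation above: `spark-submit` takes a single application jar as the primary resource, so the comma-joined `/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar` is parsed as one (non-existent) local path; the `Local jar ... does not exist, skipping` warning later in this log reflects that. Additional jars belong in `--jars`, before the primary resource. The `--packages` list also mixes Scala lines (`spark-avro_2.11:2.4.4` alongside `_2.12` Hudi bundles, on Spark 3.2.1). A sketch of a cleaner invocation, assuming the utilities bundle is the intended application jar and using a Scala-2.12 `spark-avro` matching the cluster's Spark version (not verified on this cluster):

```shell
# Sketch only: one primary resource, extra jars via --jars,
# spark-avro_2.12 aligned with Spark 3.2.1 and the _2.12 Hudi bundles.
spark-submit \
    --packages org.apache.spark:spark-avro_2.12:3.2.1 \
    --jars /usr/lib/hudi/hudi-spark-bundle.jar \
    --class org.apache.hudi.utilities.HoodieCompactor \
    --verbose --driver-memory 4g --executor-memory 16g --num-executors 8 \
    --driver-cores 10 --executor-cores 10 \
    /usr/lib/hudi/hudi-utilities-bundle.jar \
    --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc \
    --mode scheduleandexecute --spark-memory 2g \
    --hoodie-conf hoodie.metadata.enable=false \
    --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS \
    --hoodie-conf hoodie.compact.inline.max.delta.commits=5
```

Everything after the primary resource is passed through to `HoodieCompactor` as application arguments, which is why the `--table-name`/`--hoodie-conf` options come last.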
2023-06-19T10:26:47.109+0000: [GC pause (G1 Evacuation Pause) (young), 0.0037454 secs]
   [Parallel Time: 1.6 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 418.9, Avg: 419.0, Max: 419.0, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 1.8]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.6]
      [Object Copy (ms): Min: 0.9, Avg: 1.0, Max: 1.1, Diff: 0.3, Sum: 8.1]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 6.9, Max: 12, Diff: 11, Sum: 55]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.3, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 10.9]
      [GC Worker End (ms): Min: 420.3, Avg: 420.3, Max: 420.3, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.7 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 24576.0K(24576.0K)->0.0B(34816.0K) Survivors: 0.0B->3072.0K Heap: 24576.0K(496.0M)->4071.5K(496.0M)]
 [Times: user=0.01 sys=0.00, real=0.00 secs]
2023-06-19T10:26:47.455+0000: [GC pause (G1 Evacuation Pause) (young), 0.0053984 secs]
   [Parallel Time: 2.8 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 764.9, Avg: 765.1, Max: 766.4, Diff: 1.5]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 0.9, Diff: 0.9, Sum: 2.4]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Processed Buffers: Min: 0, Avg: 0.1, Max: 1, Diff: 1, Sum: 1]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 0.7]
      [Object Copy (ms): Min: 0.9, Avg: 1.9, Max: 2.4, Diff: 1.5, Sum: 15.2]
      [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.5]
         [Termination Attempts: Min: 1, Avg: 15.1, Max: 28, Diff: 27, Sum: 121]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.2, Avg: 2.5, Max: 2.7, Diff: 1.5, Sum: 19.9]
      [GC Worker End (ms): Min: 767.6, Avg: 767.6, Max: 767.6, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 2.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.0 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 34816.0K(34816.0K)->0.0B(292.0M) Survivors: 3072.0K->5120.0K Heap: 39486.1K(496.0M)->7351.0K(496.0M)]
 [Times: user=0.02 sys=0.01, real=0.01 secs]
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
spark.yarn.appMasterEnv.bigdataEnv=bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801
Adding default property: spark.sql.warehouse.dir=hdfs:///user/spark/warehouse
Adding default property:
spark.yarn.dist.files=/etc/hudi/conf/hudi-defaults.conf
Adding default property:
spark.sql.parquet.fs.optimized.committer.optimization-enabled=true
Adding default property: spark.executorEnv.regionShortName=use1
Adding default property:
spark.executor.extraJavaOptions=-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M
Adding default property:
spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property:
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2
Adding default property:
spark.hadoop.mapreduce.output.fs.optimized.committer.enabled=true
Adding default property: spark.yarn.appMasterEnv.assetId=a206760
Adding default property: spark.sql.autoBroadcastJoinThreshold=104857600
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=false
Adding default property:
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.emr.default.executor.memory=18971M
Adding default property:
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
Adding default property: spark.kryoserializer.buffer.max=1024m
Adding default property:
spark.yarn.historyServer.address=ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080
Adding default property:
spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.yarn.appMasterEnv.regionFullName=us-east-1
Adding default property: spark.yarn.appMasterEnv.regionShortName=use1
Adding default property:
spark.storage.decommission.shuffleBlocks.enabled=true
Adding default property: spark.executorEnv.regionFullName=us-east-1
Adding default property: spark.rpc.askTimeout=480
Adding default property: spark.sql.streaming.metricsEnabled=true
Adding default property: spark.locality.wait=6s
Adding default property: spark.driver.memory=2048M
Adding default property: spark.decommission.enabled=true
Adding default property: spark.files.fetchFailure.unRegisterOutputOnHost=true
Adding default property: spark.executorEnv.assetId=a206760
Adding default property: spark.executor.defaultJavaOptions=-verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
-Dfile.encoding=UTF-8
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname
-f)
Adding default property:
spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions
Adding default property: spark.emr.default.executor.cores=4
Adding default property:
spark.driver.extraJavaOptions=-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M
Adding default property:
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=2000
Adding default property: spark.deploy.mode=cluster
Adding default property: spark.master=yarn
Adding default property:
spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
Adding default property: spark.rpc.message.maxSize=416
Adding default property: spark.driver.defaultJavaOptions=-verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Dfile.encoding=UTF-8
Adding default property:
spark.executorEnv.correlationId=offline_compaction_schedule
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property:
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: fs.s3.maxRetries=1000000
Adding default property:
spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=18971M
Adding default property:
spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property:
spark.executorEnv.bigdataEnv=bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801
Adding default property: spark.dynamicAllocation.enabled=false
Adding default property:
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar
Adding default property: spark.executor.cores=4
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property:
spark.yarn.appMasterEnv.correlationId=offline_compaction_schedule
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.yarn.heterogeneousExecutors.enabled=false
Adding default property:
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Adding default property: spark.yarn.executor.memoryOverheadFactor=0.1875
Warning: Ignoring non-Spark config property: fs.s3.maxRetries
Parsed arguments:
master yarn
deployMode null
executorMemory 16g
executorCores 10
totalExecutorCores null
propertiesFile /usr/lib/spark/conf/spark-defaults.conf
driverMemory 4g
driverCores 10
driverExtraClassPath
/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar
driverExtraLibraryPath
/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
driverExtraJavaOptions -Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M
supervise false
queue null
numExecutors 8
files null
pyFiles null
archives null
mainClass org.apache.hudi.utilities.HoodieCompactor
primaryResource
file:/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar
name org.apache.hudi.utilities.HoodieCompactor
childArgs [--table-name novusdoc --base-path
s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute
--spark-memory 2g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf
hoodie.compact.inline.trigger.strategy=NUM_COMMITS --hoodie-conf
hoodie.compact.inline.max.delta.commits=5]
jars null
packages
org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file
/usr/lib/spark/conf/spark-defaults.conf:
(spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
(spark.executor.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
-Dfile.encoding=UTF-8)
(spark.blacklist.decommissioning.timeout,1h)
(spark.yarn.appMasterEnv.correlationId,offline_compaction_schedule)
(spark.yarn.executor.memoryOverheadFactor,0.1875)
(spark.executorEnv.correlationId,offline_compaction_schedule)
(spark.executorEnv.regionShortName,use1)
(spark.blacklist.decommissioning.enabled,true)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.executorEnv.assetId,a206760)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,4g)
(spark.executor.memory,18971M)
(spark.executorEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801)
(spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.yarn.historyServer.address,ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080)
(spark.yarn.heterogeneousExecutors.enabled,false)
(spark.rpc.message.maxSize,416)
(spark.eventLog.enabled,true)
(spark.storage.decommission.shuffleBlocks.enabled,true)
(spark.yarn.dist.files,/etc/hudi/conf/hudi-defaults.conf)
(spark.files.fetchFailure.unRegisterOutputOnHost,true)
(spark.history.ui.port,18080)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,2000)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.rpc.askTimeout,480)
(spark.sql.streaming.metricsEnabled,true)
(spark.driver.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Dfile.encoding=UTF-8)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.executor.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.deploy.mode,cluster)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.shuffle.service.enabled,false)
(spark.yarn.appMasterEnv.regionFullName,us-east-1)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version,2)
(spark.locality.wait,6s)
(spark.emr.default.executor.cores,4)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
(spark.driver.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M)
(spark.kryoserializer.buffer.max,1024m)
(spark.hadoop.mapreduce.output.fs.optimized.committer.enabled,true)
(spark.yarn.appMasterEnv.regionShortName,use1)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.executorEnv.regionFullName,us-east-1)
(spark.master,yarn)
(spark.emr.default.executor.memory,18971M)
(spark.decommission.enabled,true)
(spark.dynamicAllocation.enabled,false)
(spark.yarn.appMasterEnv.assetId,a206760)
(spark.sql.autoBroadcastJoinThreshold,104857600)
(spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
(spark.yarn.appMasterEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801)
(spark.executor.cores,4)
(spark.decommissioning.timeout.threshold,20)
(spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar)
:: loading settings :: url =
jar:file:/usr/lib/spark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
org.apache.hudi#hudi-utilities-bundle_2.12 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
org.apache.hudi#hudi-spark3-bundle_2.12 added as a dependency
:: resolving dependencies ::
org.apache.spark#spark-submit-parent-1341569f-530d-4afe-a08e-cc9ee2167f5c;1.0
confs: [default]
found org.apache.hudi#hudi-utilities-bundle_2.12;0.11.1 in central
found org.apache.htrace#htrace-core;3.1.0-incubating in central
found org.apache.spark#spark-avro_2.11;2.4.4 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.hudi#hudi-spark3-bundle_2.12;0.11.1 in central
:: resolution report :: resolve 257ms :: artifacts dl 13ms
:: modules in use:
org.apache.htrace#htrace-core;3.1.0-incubating from central in
[default]
org.apache.hudi#hudi-spark3-bundle_2.12;0.11.1 from central in
[default]
org.apache.hudi#hudi-utilities-bundle_2.12;0.11.1 from central in
[default]
org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 5 | 0 | 0 | 0 || 5 | 0 |
---------------------------------------------------------------------
:: retrieving ::
org.apache.spark#spark-submit-parent-1341569f-530d-4afe-a08e-cc9ee2167f5c
confs: [default]
0 artifacts copied, 5 already retrieved (0kB/12ms)
2023-06-19T10:26:48.356+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Adding
shutdown hook
Main class:
org.apache.hudi.utilities.HoodieCompactor
Arguments:
--table-name
novusdoc
--base-path
s3://a206760-novusdoc-s3-dev-use1/novusdoc
--mode
scheduleandexecute
--spark-memory
2g
--hoodie-conf
hoodie.metadata.enable=false
--hoodie-conf
hoodie.compact.inline.trigger.strategy=NUM_COMMITS
--hoodie-conf
hoodie.compact.inline.max.delta.commits=5
Spark config:
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.yarn.appMasterEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.yarn.dist.files,file:/etc/hudi/conf.dist/hudi-defaults.conf)
(spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
(spark.executorEnv.regionShortName,use1)
(spark.executor.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
(spark.hadoop.mapreduce.output.fs.optimized.committer.enabled,true)
(spark.yarn.appMasterEnv.assetId,a206760)
(spark.sql.autoBroadcastJoinThreshold,104857600)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,false)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.emr.default.executor.memory,18971M)
(spark.jars,file:/usr/lib/hudi/hudi-utilities-bundle.jar,file:/usr/lib/hudi/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version,2)
(spark.kryoserializer.buffer.max,1024m)
(spark.yarn.historyServer.address,ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.yarn.appMasterEnv.regionFullName,us-east-1)
(spark.yarn.appMasterEnv.regionShortName,use1)
(spark.app.name,org.apache.hudi.utilities.HoodieCompactor)
(spark.storage.decommission.shuffleBlocks.enabled,true)
(spark.executorEnv.regionFullName,us-east-1)
(spark.rpc.askTimeout,480)
(spark.sql.streaming.metricsEnabled,true)
(spark.locality.wait,6s)
(spark.driver.memory,4g)
(spark.executor.instances,8)
(spark.decommission.enabled,true)
(spark.files.fetchFailure.unRegisterOutputOnHost,true)
(spark.submit.pyFiles,)
(spark.executorEnv.assetId,a206760)
(spark.executor.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
-Dfile.encoding=UTF-8)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
(spark.emr.default.executor.cores,4)
(spark.driver.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true
-Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:MetaspaceSize=300M)
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,2000)
(spark.submit.deployMode,client)
(spark.deploy.mode,cluster)
(spark.master,yarn)
(spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
(spark.rpc.message.maxSize,416)
(spark.driver.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Dfile.encoding=UTF-8)
(spark.executorEnv.correlationId,offline_compaction_schedule)
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.executor.memory,16g)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.executorEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801)
(spark.dynamicAllocation.enabled,false)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar)
(spark.executor.cores,10)
(spark.history.ui.port,18080)
(spark.repl.local.jars,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar,file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar)
(spark.blacklist.decommissioning.enabled,true)
(spark.yarn.appMasterEnv.correlationId,offline_compaction_schedule)
(spark.decommissioning.timeout.threshold,20)
(spark.yarn.heterogeneousExecutors.enabled,false)
(spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)
(spark.yarn.dist.jars,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar,file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.yarn.executor.memoryOverheadFactor,0.1875)
Classpath elements:
file:/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar
file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar
file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar
file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar
file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar
file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar
2023-06-19T10:26:48.653+0000 [WARN] [offline_compaction_schedule]
[org.apache.spark.util.DependencyUtils] [DependencyUtils]: Local jar
/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar
does not exist, skipping.
2023-06-19T10:26:48.759+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Running Spark version
3.2.1-amzn-0
2023-06-19T10:26:48.783+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceUtils] [ResourceUtils]:
==============================================================
2023-06-19T10:26:48.783+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceUtils] [ResourceUtils]: No custom resources
configured for spark.driver.
2023-06-19T10:26:48.784+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceUtils] [ResourceUtils]:
==============================================================
2023-06-19T10:26:48.784+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Submitted application:
compactor-novusdoc
2023-06-19T10:26:48.810+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceProfile] [ResourceProfile]: Default
ResourceProfile created, executor resources: Map(cores -> name: cores, amount:
10, script: , vendor: , memory -> name: memory, amount: 2048, script: , vendor:
, offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources:
Map(cpus -> name: cpus, amount: 1.0)
2023-06-19T10:26:48.824+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceProfile] [ResourceProfile]: Limiting
resource is cpus at 10 tasks per executor
2023-06-19T10:26:48.826+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.resource.ResourceProfileManager] [ResourceProfileManager]:
Added ResourceProfile id: 0
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls to:
hadoop
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls to:
hadoop
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls groups
to:
2023-06-19T10:26:48.885+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls
groups to:
2023-06-19T10:26:48.885+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(hadoop); groups with view permissions: Set(); users with modify
permissions: Set(hadoop); groups with modify permissions: Set()
2023-06-19T10:26:48.918+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.conf.Configuration.deprecation] [deprecation]:
mapred.output.compression.codec is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress.codec
2023-06-19T10:26:48.918+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.conf.Configuration.deprecation] [deprecation]:
mapred.output.compression.type is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress.type
2023-06-19T10:26:48.919+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.conf.Configuration.deprecation] [deprecation]:
mapred.output.compress is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress
2023-06-19T10:26:49.159+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.network.server.TransportServer] [TransportServer]: Shuffle
server started on port: 35007
2023-06-19T10:26:49.168+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.Utils] [Utils]: Successfully started service
'sparkDriver' on port 35007.
2023-06-19T10:26:49.177+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.SparkEnv] [SparkEnv]: Using serializer: class
org.apache.spark.serializer.KryoSerializer
2023-06-19T10:26:49.196+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkEnv] [SparkEnv]: Registering MapOutputTracker
2023-06-19T10:26:49.197+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.MapOutputTrackerMasterEndpoint]
[MapOutputTrackerMasterEndpoint]: init
2023-06-19T10:26:49.235+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkEnv] [SparkEnv]: Registering BlockManagerMaster
2023-06-19T10:26:49.300+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkEnv] [SparkEnv]: Registering BlockManagerMasterHeartbeat
2023-06-19T10:26:49.400+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkEnv] [SparkEnv]: Registering OutputCommitCoordinator
2023-06-19T10:26:49.404+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.subresultcache.SubResultCacheManager]
[SubResultCacheManager]: Sub-result caches config to enable false.
2023-06-19T10:26:49.404+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.subresultcache.SubResultCacheManager]
[SubResultCacheManager]: Sub-result caches are disabled.
2023-06-19T10:26:49.423+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Created SSL options for
ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None,
trustStore=None, trustStorePassword=None, protocol=None,
enabledAlgorithms=Set()}
2023-06-19T10:26:49.504+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.util.log] [log]: Logging initialized @2813ms to
org.sparkproject.jetty.util.log.Slf4jLog
2023-06-19T10:26:49.581+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.Server] [Server]: jetty-9.4.43.v20210629; built:
2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm
1.8.0_372-b07
2023-06-19T10:26:49.606+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.Server] [Server]: Started @2915ms
2023-06-19T10:26:49.608+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.ui.JettyUtils] [JettyUtils]: Using requestHeaderSize: 8192
2023-06-19T10:26:49.645+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Started
ServerConnector@34dc85a{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2023-06-19T10:26:49.646+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.Utils] [Utils]: Successfully started service 'SparkUI'
on port 8090.
2023-06-19T10:26:49.671+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@b8a7e43{/jobs,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.674+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@719843e5{/jobs/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.675+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@58112bc4{/jobs/job,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.676+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@2f5c1332{/jobs/job/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.677+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@7cab1508{/stages,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.678+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@258ee7de{/stages/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.679+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@6d171ce0{/stages/stage,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.680+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@6e1d4137{/stages/stage/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.681+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@29a4f594{/stages/pool,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.682+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@5327a06e{/stages/pool/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.683+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@287f7811{/storage,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.684+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@2b556bb2{/storage/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.684+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@17271176{/storage/rdd,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.685+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@2e34384c{/storage/rdd/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.686+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@1f52eb6f{/environment,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.687+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@58294867{/environment/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.688+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@6fc3e1a4{/executors,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.689+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@2d5f7182{/executors/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.690+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@29ea78b1{/executors/threadDump,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.691+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@7baf6acf{/executors/threadDump/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.701+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@7b3315a5{/static,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.702+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@629ae7e{/,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.703+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started o.s.j.s.ServletContextHandler@de88ac6{/api,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.704+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@42fcc7e6{/jobs/job/kill,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.705+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@5da7cee2{/stages/stage/kill,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.707+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.ui.SparkUI] [SparkUI]: Bound SparkUI to 0.0.0.0, and started
at http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8090
2023-06-19T10:26:49.729+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Added JAR
file:/usr/lib/hudi/hudi-utilities-bundle.jar at
spark://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:35007/jars/hudi-utilities-bundle.jar
with timestamp 1687170408750
2023-06-19T10:26:49.730+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Added JAR
file:/usr/lib/hudi/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar at
spark://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:35007/jars/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar
with timestamp 1687170408750
2023-06-19T10:26:49.849+0000: [GC pause (G1 Evacuation Pause) (young),
0.0244707 secs]
[Parallel Time: 11.2 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 3159.6, Avg: 3159.7, Max: 3159.7, Diff:
0.1]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.5, Max: 4.4, Diff: 3.7, Sum:
11.7]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 1.0, Max: 2, Diff: 2, Sum: 8]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 1.3, Diff: 1.3,
Sum: 4.3]
[Object Copy (ms): Min: 6.6, Avg: 8.9, Max: 9.7, Diff: 3.1, Sum: 71.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Termination Attempts: Min: 1, Avg: 128.1, Max: 158, Diff: 157,
Sum: 1025]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.3]
[GC Worker Total (ms): Min: 11.0, Avg: 11.0, Max: 11.1, Diff: 0.1,
Sum: 88.3]
[GC Worker End (ms): Min: 3170.7, Avg: 3170.7, Max: 3170.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 13.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 12.3 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 292.0M(292.0M)->0.0B(262.0M) Survivors: 5120.0K->35840.0K Heap:
299.2M(496.0M)->37864.7K(496.0M)]
[Times: user=0.09 sys=0.01, real=0.02 secs]
2023-06-19T10:26:49.974+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.yarn.client.RMProxy] [RMProxy]: Connecting to
ResourceManager at
ip-100-66-69-75.3175.aws-int.thomsonreuters.com/100.66.69.75:8032
2023-06-19T10:26:50.132+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Requesting a new application
from cluster with 2 NodeManagers
2023-06-19T10:26:50.432+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.conf.Configuration] [Configuration]: resource-types.xml not
found
2023-06-19T10:26:50.432+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.yarn.util.resource.ResourceUtils] [ResourceUtils]: Unable to
find 'resource-types.xml'.
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Verifying our application has
not requested more than the maximum memory capability of the cluster (122880 MB
per container)
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Will allocate AM container,
with 896 MB memory including 384 MB overhead
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Setting up container launch
context for our AM
2023-06-19T10:26:50.446+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Setting up the launch
environment for our AM container
2023-06-19T10:26:50.452+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Preparing resources for our AM
container
2023-06-19T10:26:50.478+0000 [WARN] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Neither spark.yarn.jars nor
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
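This WARN is why the ~300 MB `__spark_libs__*.zip` upload appears in the log that follows: with neither `spark.yarn.jars` nor `spark.yarn.archive` set, every submit re-uploads the jars under `SPARK_HOME` to the staging directory. A hedged sketch of the usual workaround (the HDFS path `/apps/spark/jars` is an assumed location, not from this report; adjust for your environment):

```shell
# One-time setup (assumed paths): stage the Spark jars on HDFS so
# each spark-submit can reference them instead of re-uploading.
hdfs dfs -mkdir -p /apps/spark/jars
hdfs dfs -put /usr/lib/spark/jars/*.jar /apps/spark/jars/

# Then add to spark-submit (or spark-defaults.conf):
#   --conf spark.yarn.jars=hdfs:///apps/spark/jars/*.jar
```

This only shortens submission time; it is unrelated to the missing-jar WARN earlier in the log.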
2023-06-19T10:26:54.119+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/__spark_libs__4987513252404456461.zip
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/__spark_libs__4987513252404456461.zip
2023-06-19T10:26:54.546+0000: [GC pause (G1 Evacuation Pause) (young),
0.0166820 secs]
[Parallel Time: 11.6 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 7856.4, Avg: 7856.7, Max: 7857.8, Diff:
1.4]
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.1, Max: 4.5, Diff: 4.5, Sum:
8.5]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.6, Max: 3, Diff: 3, Sum: 5]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.7, Max: 1.6, Diff: 1.6,
Sum: 5.3]
[Object Copy (ms): Min: 7.0, Avg: 9.3, Max: 10.5, Diff: 3.5, Sum: 74.6]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
[Termination Attempts: Min: 1, Avg: 154.9, Max: 198, Diff: 197,
Sum: 1239]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.3]
[GC Worker Total (ms): Min: 10.1, Avg: 11.2, Max: 11.5, Diff: 1.4,
Sum: 89.9]
[GC Worker End (ms): Min: 7867.9, Avg: 7867.9, Max: 7867.9, Diff: 0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 4.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 262.0M(262.0M)->0.0B(262.0M) Survivors: 35840.0K->35840.0K Heap:
299.0M(496.0M)->37559.0K(496.0M)]
[Times: user=0.09 sys=0.01, real=0.02 secs]
2023-06-19T10:26:55.069+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar
2023-06-19T10:26:55.222+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar ->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.spark_spark-avro_2.11-2.4.4.jar
2023-06-19T10:26:55.238+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar
2023-06-19T10:26:55.239+0000: [GC pause (G1 Evacuation Pause) (young),
0.0122827 secs]
[Parallel Time: 11.0 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 8548.8, Avg: 8548.9, Max: 8548.9, Diff:
0.1]
[Ext Root Scanning (ms): Min: 0.3, Avg: 0.8, Max: 3.8, Diff: 3.5, Sum:
6.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.4, Max: 1, Diff: 1, Sum: 3]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.2, Diff: 1.2,
Sum: 4.8]
[Object Copy (ms): Min: 7.0, Avg: 9.3, Max: 10.3, Diff: 3.3, Sum: 74.2]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Termination Attempts: Min: 1, Avg: 137.4, Max: 175, Diff: 174,
Sum: 1099]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
0.4]
[GC Worker Total (ms): Min: 10.8, Avg: 10.9, Max: 10.9, Diff: 0.1,
Sum: 86.9]
[GC Worker End (ms): Min: 8559.7, Avg: 8559.7, Max: 8559.8, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 1.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.5 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 262.0M(262.0M)->0.0B(280.0M) Survivors: 35840.0K->17408.0K Heap:
298.7M(496.0M)->19127.0K(496.0M)]
[Times: user=0.09 sys=0.00, real=0.01 secs]
2023-06-19T10:26:55.407+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.htrace_htrace-core-3.1.0-incubating.jar
2023-06-19T10:26:55.426+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar ->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.spark-project.spark_unused-1.0.0.jar
2023-06-19T10:26:55.438+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/etc/hudi/conf.dist/hudi-defaults.conf ->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/hudi-defaults.conf
2023-06-19T10:26:55.858+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Creating an archive with the
config files for distribution at
/mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/__spark_conf__7322044392243776097.zip.
2023-06-19T10:26:55.946+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource
file:/mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/__spark_conf__7322044392243776097.zip
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/__spark_conf__.zip
2023-06-19T10:26:56.009+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
===============================================================================
2023-06-19T10:26:56.009+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: YARN AM launch context:
2023-06-19T10:26:56.010+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: user class: N/A
2023-06-19T10:26:56.010+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: env:
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: regionShortName -> use1
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: CLASSPATH ->
/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/
sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: correlationId ->
offline_compaction_schedule
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: SPARK_YARN_STAGING_DIR
->
hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: SPARK_USER -> hadoop
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: regionFullName ->
us-east-1
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: bigdataEnv ->
bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: assetId -> a206760
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: SPARK_PUBLIC_DNS ->
$(hostname -f)
2023-06-19T10:26:56.012+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: resources:
2023-06-19T10:26:56.063+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar -> resource { scheme:
"hdfs" host: "ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar"
} size: 62863152 timestamp: 1687170415216 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.064+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar -> resource { scheme: "hdfs"
host: "ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar"
} size: 61591563 timestamp: 1687170415401 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.064+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: hudi-defaults.conf ->
resource { scheme: "hdfs" host:
"ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/hudi-defaults.conf"
} size: 1410 timestamp: 1687170415845 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.064+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: __spark_libs__ ->
resource { scheme: "hdfs" host:
"ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/__spark_libs__4987513252404456461.zip"
} size: 313860902 timestamp: 1687170415000 type: ARCHIVE visibility: PRIVATE
2023-06-19T10:26:56.064+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: __spark_conf__ ->
resource { scheme: "hdfs" host:
"ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/__spark_conf__.zip"
} size: 304187 timestamp: 1687170415994 type: ARCHIVE visibility: PRIVATE
2023-06-19T10:26:56.065+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
org.apache.spark_spark-avro_2.11-2.4.4.jar -> resource { scheme: "hdfs" host:
"ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.spark_spark-avro_2.11-2.4.4.jar"
} size: 187318 timestamp: 1687170415232 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.065+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
org.apache.htrace_htrace-core-3.1.0-incubating.jar -> resource { scheme: "hdfs"
host: "ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.htrace_htrace-core-3.1.0-incubating.jar"
} size: 1475955 timestamp: 1687170415420 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.065+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host:
"ip-100-66-69-75.3175.aws-int.thomsonreuters.com" port: 8020 file:
"/user/hadoop/.sparkStaging/application_1687146322573_0047/org.spark-project.spark_unused-1.0.0.jar"
} size: 2777 timestamp: 1687170415433 type: FILE visibility: PRIVATE
2023-06-19T10:26:56.065+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: command:
2023-06-19T10:26:56.066+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: {{JAVA_HOME}}/bin/java
-server -Xmx512m -Djava.io.tmpdir={{PWD}}/tmp
-Dspark.yarn.app.container.log.dir=<LOG_DIR>
org.apache.spark.deploy.yarn.ExecutorLauncher --arg
'ip-100-66-69-75.3175.aws-int.thomsonreuters.com:35007' --properties-file
{{PWD}}/__spark_conf__/__spark_conf__.properties --dist-cache-conf
{{PWD}}/__spark_conf__/__spark_dist_cache__.properties 1> <LOG_DIR>/stdout 2>
<LOG_DIR>/stderr
2023-06-19T10:26:56.066+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
===============================================================================
2023-06-19T10:26:56.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls to:
hadoop
2023-06-19T10:26:56.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls to:
hadoop
2023-06-19T10:26:56.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls groups
to:
2023-06-19T10:26:56.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls
groups to:
2023-06-19T10:26:56.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(hadoop); groups with view permissions: Set(); users with modify
permissions: Set(hadoop); groups with modify permissions: Set()
2023-06-19T10:26:56.090+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: AM resources: Map()
2023-06-19T10:26:56.091+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: spark.yarn.maxAppAttempts is
not set. Cluster's default value will be used.
2023-06-19T10:26:56.092+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Created resource capability for
AM request: <memory:896, max memory:9223372036854775807, vCores:1, max
vCores:2147483647>
2023-06-19T10:26:56.093+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Submitting application
application_1687146322573_0047 to ResourceManager
2023-06-19T10:26:56.124+0000 [INFO] [offline_compaction_schedule]
[org.apache.hadoop.yarn.client.api.impl.YarnClientImpl] [YarnClientImpl]:
Submitted application application_1687146322573_0047
2023-06-19T10:26:57.127+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Application report for
application_1687146322573_0047 (state: ACCEPTED)
2023-06-19T10:26:57.130+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to
Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1687170416103
final status: UNDEFINED
tracking URL:
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:20888/proxy/application_1687146322573_0047/
user: hadoop
[... the same application report (state: ACCEPTED, diagnostics and tracking URL unchanged) repeated once per second from 10:26:58 through 10:27:02 ...]
2023-06-19T10:27:03.139+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Application report for
application_1687146322573_0047 (state: ACCEPTED)
2023-06-19T10:27:03.139+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to
Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1687170416103
final status: UNDEFINED
tracking URL:
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:20888/proxy/application_1687146322573_0047/
user: hadoop
2023-06-19T10:27:04.140+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Application report for
application_1687146322573_0047 (state: ACCEPTED)
2023-06-19T10:27:04.140+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to
Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1687170416103
final status: UNDEFINED
tracking URL:
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:20888/proxy/application_1687146322573_0047/
user: hadoop
2023-06-19T10:27:05.142+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Application report for
application_1687146322573_0047 (state: ACCEPTED)
2023-06-19T10:27:05.142+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to
Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1687170416103
final status: UNDEFINED
tracking URL:
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:20888/proxy/application_1687146322573_0047/
user: hadoop
2023-06-19T10:27:05.989+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.network.server.TransportServer] [TransportServer]: New
connection accepted for remote address /100.66.95.167:57800.
2023-06-19T10:27:06.143+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]: Application report for
application_1687146322573_0047 (state: RUNNING)
2023-06-19T10:27:06.143+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.deploy.yarn.Client] [Client]:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 100.66.95.167
ApplicationMaster RPC port: -1
queue: default
start time: 1687170416103
final status: UNDEFINED
tracking URL:
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:20888/proxy/application_1687146322573_0047/
user: hadoop
2023-06-19T10:27:06.152+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.network.server.TransportServer] [TransportServer]: Shuffle
server started on port: 32849
2023-06-19T10:27:06.152+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.Utils] [Utils]: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 32849.
2023-06-19T10:27:06.152+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.network.netty.NettyBlockTransferService]
[NettyBlockTransferService]: Server created on
ip-100-66-69-75.3175.aws-int.thomsonreuters.com:32849
2023-06-19T10:27:06.301+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.ui.ServerInfo] [ServerInfo]: Adding filter to /metrics/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2023-06-19T10:27:06.303+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]:
Started
o.s.j.s.ServletContextHandler@5c134052{/metrics/json,null,AVAILABLE,@Spark}
2023-06-19T10:27:06.323+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.deploy.history.SingleEventLogFileWriter]
[SingleEventLogFileWriter]: Logging events to
hdfs:/var/log/spark/apps/application_1687146322573_0047.inprogress
2023-06-19T10:27:06.519+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Adding shutdown hook
2023-06-19T10:27:06.553+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Loading HoodieTableMetaClient from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:06.772+0000: [GC pause (G1 Evacuation Pause) (young),
0.0194145 secs]
[Parallel Time: 13.6 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 20082.0, Avg: 20082.1, Max: 20082.2, Diff:
0.1]
[Ext Root Scanning (ms): Min: 0.5, Avg: 1.1, Max: 5.2, Diff: 4.7, Sum:
9.0]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.4, Max: 1, Diff: 1, Sum: 3]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.7, Max: 1.5, Diff: 1.5,
Sum: 5.3]
[Object Copy (ms): Min: 8.3, Avg: 11.5, Max: 12.8, Diff: 4.4, Sum:
91.9]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
[Termination Attempts: Min: 1, Avg: 243.8, Max: 313, Diff: 312,
Sum: 1950]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.4]
[GC Worker Total (ms): Min: 13.4, Avg: 13.5, Max: 13.6, Diff: 0.2,
Sum: 108.1]
[GC Worker End (ms): Min: 20095.6, Avg: 20095.6, Max: 20095.6, Diff:
0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 5.4 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.8 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 280.0M(280.0M)->0.0B(262.0M) Survivors: 17408.0K->35840.0K Heap:
298.7M(496.0M)->37559.0K(496.0M)]
[Times: user=0.10 sys=0.00, real=0.02 secs]
2023-06-19T10:27:07.272+0000 [INFO] [offline_compaction_schedule]
[com.amazon.ws.emr.hadoop.fs.util.ClientConfigurationFactory]
[ClientConfigurationFactory]: Set initial getObject socket timeout to 2000 ms.
2023-06-19T10:27:07.546+0000: [GC pause (G1 Evacuation Pause) (young),
0.0190510 secs]
[Parallel Time: 15.1 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 20856.5, Avg: 20856.5, Max: 20856.6, Diff:
0.1]
[Ext Root Scanning (ms): Min: 0.3, Avg: 1.1, Max: 5.7, Diff: 5.4, Sum:
9.1]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.4, Max: 1, Diff: 1, Sum: 3]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.9, Max: 2.1, Diff: 2.1,
Sum: 7.6]
[Object Copy (ms): Min: 9.2, Avg: 12.7, Max: 14.2, Diff: 5.0, Sum:
101.5]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.6]
[Termination Attempts: Min: 1, Avg: 191.4, Max: 236, Diff: 235,
Sum: 1531]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.4]
[GC Worker Total (ms): Min: 14.9, Avg: 15.0, Max: 15.0, Diff: 0.1,
Sum: 119.7]
[GC Worker End (ms): Min: 20871.5, Avg: 20871.5, Max: 20871.5, Diff:
0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 3.6 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 3.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 262.0M(262.0M)->0.0B(266.0M) Survivors: 35840.0K->31744.0K Heap:
298.7M(496.0M)->33975.0K(496.0M)]
[Times: user=0.12 sys=0.00, real=0.02 secs]
2023-06-19T10:27:08.214+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableConfig] [HoodieTableConfig]: Loading
table properties from
s3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties
2023-06-19T10:27:08.231+0000 [INFO] [offline_compaction_schedule]
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem] [S3NativeFileSystem]:
Opening 's3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties'
for reading
2023-06-19T10:27:08.367+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET)
from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:08.367+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Loading Active commit timeline for s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:08.460+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.timeline.HoodieActiveTimeline]
[HoodieActiveTimeline]: Loaded instants upto :
Option{val=[20230619102516597__deltacommit__COMPLETED]}
2023-06-19T10:27:08.473+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]:
HoodieCompactorConfig {
--base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc,
--table-name novusdoc,
--instant-time null,
--parallelism 200,
--schema-file null,
--spark-master null,
--spark-memory 2g,
--retry 0,
--schedule false,
--mode scheduleandexecute,
--strategy
org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy,
--props null,
--hoodie-conf [hoodie.metadata.enable=false,
hoodie.compact.inline.trigger.strategy=NUM_COMMITS,
hoodie.compact.inline.max.delta.commits=5]
}
2023-06-19T10:27:08.474+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: Running Mode:
[scheduleandexecute]
2023-06-19T10:27:08.474+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: Step 1: Do
schedule
2023-06-19T10:27:08.651+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.embedded.EmbeddedTimelineService]
[EmbeddedTimelineService]: Starting Timeline service !!
2023-06-19T10:27:08.652+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.embedded.EmbeddedTimelineService]
[EmbeddedTimelineService]: Overriding hostIp to
(ip-100-66-69-75.3175.aws-int.thomsonreuters.com) found in spark-conf. It was
null
2023-06-19T10:27:08.661+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.view.FileSystemViewManager]
[FileSystemViewManager]: Creating View Manager with storage type :MEMORY
2023-06-19T10:27:08.661+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.view.FileSystemViewManager]
[FileSystemViewManager]: Creating in-memory based Table View
2023-06-19T10:27:08.671+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.hudi.org.eclipse.jetty.util.log] [log]: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.apache.hudi.org.eclipse.jetty.util.log)
via org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
2023-06-19T10:27:08.672+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.org.eclipse.jetty.util.log] [log]: Logging initialized
@21982ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
2023-06-19T10:27:08.723+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.hudi.timeline.service.handlers.MarkerHandler] [MarkerHandler]:
MarkerHandler FileSystem: s3
2023-06-19T10:27:08.723+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.hudi.timeline.service.handlers.MarkerHandler] [MarkerHandler]:
MarkerHandler batching params: batchNumThreads=20 batchIntervalMs=50ms
2023-06-19T10:27:08.767+0000: [GC pause (G1 Evacuation Pause) (young),
0.0285504 secs]
[Parallel Time: 21.1 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 22077.6, Avg: 22077.7, Max: 22078.6, Diff:
1.0]
[Ext Root Scanning (ms): Min: 0.2, Avg: 1.7, Max: 7.8, Diff: 7.6, Sum:
13.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.4, Max: 1, Diff: 1, Sum: 3]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 1.0, Max: 2.2, Diff: 2.2,
Sum: 7.9]
[Object Copy (ms): Min: 13.1, Avg: 18.0, Max: 19.9, Diff: 6.8, Sum:
143.7]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
[Termination Attempts: Min: 1, Avg: 229.8, Max: 299, Diff: 298,
Sum: 1838]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.3]
[GC Worker Total (ms): Min: 20.0, Avg: 20.8, Max: 21.0, Diff: 1.0,
Sum: 166.5]
[GC Worker End (ms): Min: 22098.5, Avg: 22098.6, Max: 22098.6, Diff:
0.0]
[Code Root Fixup: 0.3 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 7.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 6.3 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 266.0M(266.0M)->0.0B(259.0M) Survivors: 31744.0K->38912.0K Heap:
299.2M(496.0M)->44866.0K(496.0M)]
[Times: user=0.16 sys=0.01, real=0.04 secs]
2023-06-19T10:27:08.818+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
2023-06-19T10:27:08.819+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]: Starting Javalin ...
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]: Listening on http://localhost:42997/
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]: Javalin started in 142ms \o/
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Starting
Timeline server on port :42997
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.embedded.EmbeddedTimelineService]
[EmbeddedTimelineService]: Started embedded timeline server at
ip-100-66-69-75.3175.aws-int.thomsonreuters.com:42997
2023-06-19T10:27:08.970+0000 [WARN] [offline_compaction_schedule]
[org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: No instant time
is provided for scheduling compaction.
2023-06-19T10:27:08.973+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.BaseHoodieWriteClient] [BaseHoodieWriteClient]:
Scheduling table service COMPACT
2023-06-19T10:27:08.974+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.BaseHoodieWriteClient] [BaseHoodieWriteClient]:
Scheduling compaction at instant time :20230619102708972
2023-06-19T10:27:08.978+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Loading HoodieTableMetaClient from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:08.990+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableConfig] [HoodieTableConfig]: Loading
table properties from
s3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties
2023-06-19T10:27:08.990+0000 [INFO] [offline_compaction_schedule]
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem] [S3NativeFileSystem]:
Opening 's3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties'
for reading
2023-06-19T10:27:09.067+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET)
from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.068+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]:
Loading Active commit timeline for s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.070+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: user=dr.who
aclsEnabled=false viewAcls=hadoop viewAclsGroups=
2023-06-19T10:27:09.113+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.timeline.HoodieActiveTimeline]
[HoodieActiveTimeline]: Loaded instants upto :
Option{val=[20230619102516597__deltacommit__COMPLETED]}
2023-06-19T10:27:09.121+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.view.FileSystemViewManager]
[FileSystemViewManager]: Creating View Manager with storage type :REMOTE_FIRST
2023-06-19T10:27:09.121+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.common.table.view.FileSystemViewManager]
[FileSystemViewManager]: Creating remote first table view
2023-06-19T10:27:09.128+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor]
[ScheduleCompactionActionExecutor]: Checking if compaction needs to be run on
s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.137+0000 [DEBUG] [offline_compaction_schedule]
[org.apache.spark.SecurityManager] [SecurityManager]: user=dr.who
aclsEnabled=false viewAcls=hadoop viewAclsGroups=
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.BaseHoodieClient] [BaseHoodieClient]: Stopping Timeline
service !!
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.embedded.EmbeddedTimelineService]
[EmbeddedTimelineService]: Closing Timeline server
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Closing
Timeline Service
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]: Stopping Javalin ...
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule]
[io.javalin.Javalin] [Javalin]: Javalin has stopped
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Closed
Timeline Service
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule]
[org.apache.hudi.client.embedded.EmbeddedTimelineService]
[EmbeddedTimelineService]: Closed Timeline server
2023-06-19T10:27:09.196+0000 [WARN] [offline_compaction_schedule]
[org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: Couldn't do
schedule
2023-06-19T10:27:09.211+0000 [INFO] [offline_compaction_schedule]
[org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped
Spark@34dc85a{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2023-06-19T10:27:09.238+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.ui.SparkUI] [SparkUI]: Stopped Spark web UI at
http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8090
2023-06-19T10:27:09.708+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.MapOutputTrackerMasterEndpoint]
[MapOutputTrackerMasterEndpoint]: MapOutputTrackerMasterEndpoint stopped!
2023-06-19T10:27:09.749+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.SparkContext] [SparkContext]: Successfully stopped
SparkContext
2023-06-19T10:27:09.751+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Shutdown
hook called
2023-06-19T10:27:09.751+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Deleting
directory /mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435
2023-06-19T10:27:09.756+0000 [INFO] [offline_compaction_schedule]
[org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Deleting
directory /mnt/tmp/spark-f72ca80c-54af-4f64-bcaa-176fe9cc27e4
Heap
garbage-first heap total 507904K, used 192322K [0x00000006c0000000,
0x00000006c0100f80, 0x00000007c0000000)
region size 1024K, 183 young (187392K), 38 survivors (38912K)
Metaspace used 102404K, capacity 108290K, committed 108544K, reserved
1144832K
class space used 13406K, capacity 14036K, committed 14080K, reserved
1048576K
[hadoop@ip-100-66-69-75 a206760-PowerUser2]$