devanshguptatrepp commented on issue #8777:
URL: https://github.com/apache/hudi/issues/8777#issuecomment-1557221338
Full driver log from the failing run (Spark 3.2.1-amzn-0 on YARN, hudi-spark-bundle_2.12-0.11.0):

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/tez/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/05/22 13:07:37 INFO TreppClient$: creating spark context and parameter loadingcom.trepp.TreppClient$
23/05/22 13:07:37 INFO ApplicationFactory: Application has been found with name dataloadfor classcom.trepp.dataload.EtlImpl@36bed37a
23/05/22 13:07:38 INFO HiveConf: Found configuration file file:/etc/spark/conf.dist/hive-site.xml
23/05/22 13:07:38 INFO SparkContext: Running Spark version 3.2.1-amzn-0
23/05/22 13:07:38 INFO ResourceUtils: ==============================================================
23/05/22 13:07:38 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/22 13:07:38 INFO ResourceUtils: ==============================================================
23/05/22 13:07:38 INFO SparkContext: Submitted application: com.trepp.TreppClient
23/05/22 13:07:38 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 6144, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/22 13:07:38 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
23/05/22 13:07:38 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/22 13:07:38 INFO SecurityManager: Changing view acls to: hadoop
23/05/22 13:07:38 INFO SecurityManager: Changing modify acls to: hadoop
23/05/22 13:07:38 INFO SecurityManager: Changing view acls groups to:
23/05/22 13:07:38 INFO SecurityManager: Changing modify acls groups to:
23/05/22 13:07:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/05/22 13:07:39 INFO Utils: Successfully started service 'sparkDriver' on port 40847.
23/05/22 13:07:39 INFO SparkEnv: Registering MapOutputTracker
23/05/22 13:07:39 INFO SparkEnv: Registering BlockManagerMaster
23/05/22 13:07:39 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/22 13:07:39 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/22 13:07:39 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/22 13:07:39 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-0dbb342c-968b-492d-8427-5fecf7411ac7
23/05/22 13:07:39 INFO MemoryStore: MemoryStore started with capacity 3.0 GiB
23/05/22 13:07:39 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/22 13:07:39 INFO SubResultCacheManager: Sub-result caches are disabled.
23/05/22 13:07:39 INFO log: Logging initialized @19094ms to org.sparkproject.jetty.util.log.Slf4jLog
23/05/22 13:07:39 INFO Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_372-b07
23/05/22 13:07:39 INFO Server: Started @19207ms
23/05/22 13:07:39 INFO AbstractConnector: Started ServerConnector@53f7a906{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
23/05/22 13:07:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@184751f3{/jobs,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2cd3fc29{/jobs/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3513d214{/jobs/job,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@46b5f061{/jobs/job/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@108b121f{/stages,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ff498b0{/stages/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4300e240{/stages/stage,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@37a67cf{/stages/stage/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5908e6d6{/stages/pool,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2a6fb62f{/stages/pool/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7b44bfb8{/storage,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@98637a2{/storage/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@141aba65{/storage/rdd,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b55f5b7{/storage/rdd/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6b2ef50e{/environment,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4b5ad306{/environment/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@48a46b0f{/executors,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@9f9146d{/executors/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@45e7bb79{/executors/threadDump,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21c75084{/executors/threadDump/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@75527e36{/static,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@be6d228{/,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7eee6c13{/api,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ae5bd34{/jobs/job/kill,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3a16984c{/stages/stage/kill,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-10-73-103-194.ec2.internal:4040
23/05/22 13:07:39 INFO SparkContext: Added JAR s3://treppsamplebucket/mdm/etl-2.0-SNAPSHOT-jar-with-dependencies.jar at s3://treppsamplebucket/mdm/etl-2.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1684760858447
23/05/22 13:07:39 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
23/05/22 13:07:39 INFO FairSchedulableBuilder: Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
23/05/22 13:07:40 INFO Utils: Using 50 preallocated executors (minExecutors: 0). Set spark.dynamicAllocation.preallocateExecutors to `false` disable executor preallocation.
23/05/22 13:07:40 INFO RMProxy: Connecting to ResourceManager at ip-10-73-103-194.ec2.internal/10.73.103.194:8032
23/05/22 13:07:40 INFO Client: Requesting a new application from cluster with 6 NodeManagers
23/05/22 13:07:40 INFO Configuration: resource-types.xml not found
23/05/22 13:07:40 INFO ResourceUtils: Unable to find 'resource-types.xml'.
23/05/22 13:07:40 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11712 MB per container)
23/05/22 13:07:40 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
23/05/22 13:07:40 INFO Client: Setting up container launch context for our AM
23/05/22 13:07:40 INFO Client: Setting up the launch environment for our AM container
23/05/22 13:07:40 INFO Client: Preparing resources for our AM container
23/05/22 13:07:40 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
23/05/22 13:07:56 INFO Client: Uploading resource file:/mnt/tmp/spark-a2bd79dd-2460-4f21-aa8a-9e30bd7e24dc/__spark_libs__7560258025190807014.zip -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/__spark_libs__7560258025190807014.zip
23/05/22 13:07:57 INFO Client: Uploading resource s3a://trepp-developmentservices-lake-workspace/binaries/hudi/etl/hudi-spark-bundle_2.12-0.11.0.jar -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hudi-spark-bundle_2.12-0.11.0.jar
23/05/22 13:07:58 INFO Client: Uploading resource file:/etc/spark/conf.dist/hive-site.xml -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hive-site.xml
23/05/22 13:07:58 INFO Client: Uploading resource file:/etc/hudi/conf.dist/hudi-defaults.conf -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hudi-defaults.conf
23/05/22 13:07:58 INFO Client: Uploading resource file:/mnt/tmp/spark-a2bd79dd-2460-4f21-aa8a-9e30bd7e24dc/__spark_conf__4894426805724937859.zip -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/__spark_conf__.zip
23/05/22 13:07:58 INFO SecurityManager: Changing view acls to: hadoop
23/05/22 13:07:58 INFO SecurityManager: Changing modify acls to: hadoop
23/05/22 13:07:58 INFO SecurityManager: Changing view acls groups to:
23/05/22 13:07:58 INFO SecurityManager: Changing modify acls groups to:
23/05/22 13:07:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/05/22 13:07:58 INFO Client: Submitting application application_1684760745521_0001 to ResourceManager
23/05/22 13:07:59 INFO YarnClientImpl: Submitted application application_1684760745521_0001
23/05/22 13:08:00 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:00 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1684760878895
     final status: UNDEFINED
     tracking URL: http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001/
     user: hadoop
23/05/22 13:08:01 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:02 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:03 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:04 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-10-73-103-194.ec2.internal, PROXY_URI_BASES -> http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001), /proxy/application_1684760745521_0001
23/05/22 13:08:05 INFO Client: Application report for application_1684760745521_0001 (state: RUNNING)
23/05/22 13:08:05 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.73.102.168
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1684760878895
     final status: UNDEFINED
     tracking URL: http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001/
     user: hadoop
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: Application application_1684760745521_0001 has started running.
23/05/22 13:08:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39825.
23/05/22 13:08:05 INFO NettyBlockTransferService: Server created on ip-10-73-103-194.ec2.internal:39825
23/05/22 13:08:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/22 13:08:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-73-103-194.ec2.internal:39825 with 3.0 GiB RAM, BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManager: external shuffle service port = 7337
23/05/22 13:08:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/05/22 13:08:05 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@29592929{/metrics/json,null,AVAILABLE,@Spark}
23/05/22 13:08:05 INFO SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/application_1684760745521_0001.inprogress
23/05/22 13:08:05 INFO Utils: Using 50 preallocated executors (minExecutors: 0). Set spark.dynamicAllocation.preallocateExecutors to `false` disable executor preallocation.
23/05/22 13:08:05 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/05/22 13:08:05 INFO SparkSessionWrapper: Spark Session created sucessfully for environment
23/05/22 13:08:36 WARN HoodieSparkSqlWriter$: hoodie table at s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings already exists. Deleting existing data & overwriting with new data.
23/05/22 13:08:45 WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings/.hoodie/metadata
java.lang.Exception: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at com.trepp.zone.ZoneExecutionHelper.upsert(ZoneExecutionHelper.scala:122)
    at com.trepp.zone.Presentation.$anonfun$writeHudiObject$1(Presentation.scala:92)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.zone.Presentation.writeHudiObject(Presentation.scala:81)
    at com.trepp.process.Executor.$anonfun$writeObject$2(Executor.scala:136)
    at com.trepp.process.Executor.$anonfun$writeObject$2$adapted(Executor.scala:133)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at com.trepp.process.Executor.$anonfun$writeObject$1(Executor.scala:133)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.process.Executor.writeObject(Executor.scala:133)
    at com.trepp.process.Executor$$anon$2.$anonfun$accept$2(Executor.scala:118)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.process.Executor$$anon$2.accept(Executor.scala:113)
    at com.trepp.process.Executor$$anon$2.accept(Executor.scala:111)
    at java.util.TreeMap.forEach(TreeMap.java:1005)
    at com.trepp.process.Executor.executeQuery(Executor.scala:111)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$3(EtlImpl.scala:43)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1(EtlImpl.scala:37)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1$adapted(EtlImpl.scala:23)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at com.trepp.dataload.EtlImpl.executeProcess(EtlImpl.scala:23)
    at com.trepp.TreppClient$.$anonfun$main$1(TreppClient.scala:46)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.TreppClient$.main(TreppClient.scala:40)
    at com.trepp.TreppClient.main(TreppClient.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/22 13:09:35 ERROR Presentation: Failed in writing data to locations3://trepp-developmentservices-lake/presentationZone/clo/()
```
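The trace stops at the application's own wrapper exception rethrown from `ZoneExecutionHelper.upsert`; the nested cause from `HiveSyncTool` (the actual reason the sync failed) is not in the paste. For context, below is a minimal sketch of the kind of Hudi 0.11.x write with Hive sync enabled that exercises this code path. It is not the reporter's actual job: the input path, record key, precombine field, and database name are placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal sketch (Hudi 0.11.x): a Spark datasource write with Hive sync on.
// Placeholders: input path, record key, precombine field, database name.
val spark = SparkSession.builder().appName("hudi-hive-sync-sketch").getOrCreate()
val df = spark.read.parquet("s3://some-bucket/input/") // placeholder input

df.write.format("hudi")
  .option("hoodie.table.name", "clogoldenSetHoldings")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "id")          // placeholder
  .option("hoodie.datasource.write.precombine.field", "updated_at") // placeholder
  // Meta sync runs after the write commits; a failure in this step surfaces as
  // "Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool".
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.mode", "hms") // sync via the metastore instead of HiveServer2 JDBC
  .option("hoodie.datasource.hive_sync.database", "clo") // placeholder
  .option("hoodie.datasource.hive_sync.table", "clogoldenSetHoldings")
  .mode(SaveMode.Overwrite)
  .save("s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings")
```

Logging the wrapped exception's cause chain (e.g. `e.getCause`, if attached) at the catch site in `ZoneExecutionHelper.upsert` would show why `HiveSyncTool` itself failed.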
