devanshguptatrepp commented on issue #8777:
URL: https://github.com/apache/hudi/issues/8777#issuecomment-1557221338
Full driver log from the failing run (Spark 3.2.1-amzn-0 on YARN, hudi-spark-bundle_2.12-0.11.0):

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/tez/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/05/22 13:07:37 INFO TreppClient$: creating spark context and parameter loadingcom.trepp.TreppClient$
23/05/22 13:07:37 INFO ApplicationFactory: Application has been found with name dataloadfor classcom.trepp.dataload.EtlImpl@36bed37a
23/05/22 13:07:38 INFO HiveConf: Found configuration file file:/etc/spark/conf.dist/hive-site.xml
23/05/22 13:07:38 INFO SparkContext: Running Spark version 3.2.1-amzn-0
23/05/22 13:07:38 INFO ResourceUtils: ==============================================================
23/05/22 13:07:38 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/22 13:07:38 INFO ResourceUtils: ==============================================================
23/05/22 13:07:38 INFO SparkContext: Submitted application: com.trepp.TreppClient
23/05/22 13:07:38 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 6144, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/22 13:07:38 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
23/05/22 13:07:38 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/22 13:07:38 INFO SecurityManager: Changing view acls to: hadoop
23/05/22 13:07:38 INFO SecurityManager: Changing modify acls to: hadoop
23/05/22 13:07:38 INFO SecurityManager: Changing view acls groups to:
23/05/22 13:07:38 INFO SecurityManager: Changing modify acls groups to:
23/05/22 13:07:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/05/22 13:07:39 INFO Utils: Successfully started service 'sparkDriver' on port 40847.
23/05/22 13:07:39 INFO SparkEnv: Registering MapOutputTracker
23/05/22 13:07:39 INFO SparkEnv: Registering BlockManagerMaster
23/05/22 13:07:39 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/22 13:07:39 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/22 13:07:39 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/22 13:07:39 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-0dbb342c-968b-492d-8427-5fecf7411ac7
23/05/22 13:07:39 INFO MemoryStore: MemoryStore started with capacity 3.0 GiB
23/05/22 13:07:39 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/22 13:07:39 INFO SubResultCacheManager: Sub-result caches are disabled.
23/05/22 13:07:39 INFO log: Logging initialized @19094ms to org.sparkproject.jetty.util.log.Slf4jLog
23/05/22 13:07:39 INFO Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_372-b07
23/05/22 13:07:39 INFO Server: Started @19207ms
23/05/22 13:07:39 INFO AbstractConnector: Started ServerConnector@53f7a906{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
23/05/22 13:07:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@184751f3{/jobs,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2cd3fc29{/jobs/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3513d214{/jobs/job,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@46b5f061{/jobs/job/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@108b121f{/stages,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ff498b0{/stages/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4300e240{/stages/stage,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@37a67cf{/stages/stage/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5908e6d6{/stages/pool,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2a6fb62f{/stages/pool/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7b44bfb8{/storage,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@98637a2{/storage/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@141aba65{/storage/rdd,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b55f5b7{/storage/rdd/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6b2ef50e{/environment,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4b5ad306{/environment/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@48a46b0f{/executors,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@9f9146d{/executors/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@45e7bb79{/executors/threadDump,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21c75084{/executors/threadDump/json,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@75527e36{/static,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@be6d228{/,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7eee6c13{/api,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ae5bd34{/jobs/job/kill,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3a16984c{/stages/stage/kill,null,AVAILABLE,@Spark}
23/05/22 13:07:39 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-10-73-103-194.ec2.internal:4040
23/05/22 13:07:39 INFO SparkContext: Added JAR s3://treppsamplebucket/mdm/etl-2.0-SNAPSHOT-jar-with-dependencies.jar at s3://treppsamplebucket/mdm/etl-2.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1684760858447
23/05/22 13:07:39 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
23/05/22 13:07:39 INFO FairSchedulableBuilder: Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
23/05/22 13:07:40 INFO Utils: Using 50 preallocated executors (minExecutors: 0). Set spark.dynamicAllocation.preallocateExecutors to `false` disable executor preallocation.
23/05/22 13:07:40 INFO RMProxy: Connecting to ResourceManager at ip-10-73-103-194.ec2.internal/10.73.103.194:8032
23/05/22 13:07:40 INFO Client: Requesting a new application from cluster with 6 NodeManagers
23/05/22 13:07:40 INFO Configuration: resource-types.xml not found
23/05/22 13:07:40 INFO ResourceUtils: Unable to find 'resource-types.xml'.
23/05/22 13:07:40 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11712 MB per container)
23/05/22 13:07:40 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
23/05/22 13:07:40 INFO Client: Setting up container launch context for our AM
23/05/22 13:07:40 INFO Client: Setting up the launch environment for our AM container
23/05/22 13:07:40 INFO Client: Preparing resources for our AM container
23/05/22 13:07:40 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
23/05/22 13:07:56 INFO Client: Uploading resource file:/mnt/tmp/spark-a2bd79dd-2460-4f21-aa8a-9e30bd7e24dc/__spark_libs__7560258025190807014.zip -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/__spark_libs__7560258025190807014.zip
23/05/22 13:07:57 INFO Client: Uploading resource s3a://trepp-developmentservices-lake-workspace/binaries/hudi/etl/hudi-spark-bundle_2.12-0.11.0.jar -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hudi-spark-bundle_2.12-0.11.0.jar
23/05/22 13:07:58 INFO Client: Uploading resource file:/etc/spark/conf.dist/hive-site.xml -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hive-site.xml
23/05/22 13:07:58 INFO Client: Uploading resource file:/etc/hudi/conf.dist/hudi-defaults.conf -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/hudi-defaults.conf
23/05/22 13:07:58 INFO Client: Uploading resource file:/mnt/tmp/spark-a2bd79dd-2460-4f21-aa8a-9e30bd7e24dc/__spark_conf__4894426805724937859.zip -> hdfs://ip-10-73-103-194.ec2.internal:8020/user/hadoop/.sparkStaging/application_1684760745521_0001/__spark_conf__.zip
23/05/22 13:07:58 INFO SecurityManager: Changing view acls to: hadoop
23/05/22 13:07:58 INFO SecurityManager: Changing modify acls to: hadoop
23/05/22 13:07:58 INFO SecurityManager: Changing view acls groups to:
23/05/22 13:07:58 INFO SecurityManager: Changing modify acls groups to:
23/05/22 13:07:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/05/22 13:07:58 INFO Client: Submitting application application_1684760745521_0001 to ResourceManager
23/05/22 13:07:59 INFO YarnClientImpl: Submitted application application_1684760745521_0001
23/05/22 13:08:00 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:00 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1684760878895
     final status: UNDEFINED
     tracking URL: http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001/
     user: hadoop
23/05/22 13:08:01 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:02 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:03 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:04 INFO Client: Application report for application_1684760745521_0001 (state: ACCEPTED)
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-10-73-103-194.ec2.internal, PROXY_URI_BASES -> http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001), /proxy/application_1684760745521_0001
23/05/22 13:08:05 INFO Client: Application report for application_1684760745521_0001 (state: RUNNING)
23/05/22 13:08:05 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.73.102.168
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1684760878895
     final status: UNDEFINED
     tracking URL: http://ip-10-73-103-194.ec2.internal:20888/proxy/application_1684760745521_0001/
     user: hadoop
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: Application application_1684760745521_0001 has started running.
23/05/22 13:08:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39825.
23/05/22 13:08:05 INFO NettyBlockTransferService: Server created on ip-10-73-103-194.ec2.internal:39825
23/05/22 13:08:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/22 13:08:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-73-103-194.ec2.internal:39825 with 3.0 GiB RAM, BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO BlockManager: external shuffle service port = 7337
23/05/22 13:08:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-10-73-103-194.ec2.internal, 39825, None)
23/05/22 13:08:05 INFO ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/05/22 13:08:05 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@29592929{/metrics/json,null,AVAILABLE,@Spark}
23/05/22 13:08:05 INFO SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/application_1684760745521_0001.inprogress
23/05/22 13:08:05 INFO Utils: Using 50 preallocated executors (minExecutors: 0). Set spark.dynamicAllocation.preallocateExecutors to `false` disable executor preallocation.
23/05/22 13:08:05 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
23/05/22 13:08:05 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/05/22 13:08:05 INFO SparkSessionWrapper: Spark Session created sucessfully for environment
23/05/22 13:08:36 WARN HoodieSparkSqlWriter$: hoodie table at s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings already exists. Deleting existing data & overwriting with new data.
23/05/22 13:08:45 WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings/.hoodie/metadata
java.lang.Exception: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at com.trepp.zone.ZoneExecutionHelper.upsert(ZoneExecutionHelper.scala:122)
    at com.trepp.zone.Presentation.$anonfun$writeHudiObject$1(Presentation.scala:92)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.zone.Presentation.writeHudiObject(Presentation.scala:81)
    at com.trepp.process.Executor.$anonfun$writeObject$2(Executor.scala:136)
    at com.trepp.process.Executor.$anonfun$writeObject$2$adapted(Executor.scala:133)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at com.trepp.process.Executor.$anonfun$writeObject$1(Executor.scala:133)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.process.Executor.writeObject(Executor.scala:133)
    at com.trepp.process.Executor$$anon$2.$anonfun$accept$2(Executor.scala:118)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.process.Executor$$anon$2.accept(Executor.scala:113)
    at com.trepp.process.Executor$$anon$2.accept(Executor.scala:111)
    at java.util.TreeMap.forEach(TreeMap.java:1005)
    at com.trepp.process.Executor.executeQuery(Executor.scala:111)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$3(EtlImpl.scala:43)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1(EtlImpl.scala:37)
    at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1$adapted(EtlImpl.scala:23)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at com.trepp.dataload.EtlImpl.executeProcess(EtlImpl.scala:23)
    at com.trepp.TreppClient$.$anonfun$main$1(TreppClient.scala:46)
    at scala.util.Try$.apply(Try.scala:213)
    at com.trepp.TreppClient$.main(TreppClient.scala:40)
    at com.trepp.TreppClient.main(TreppClient.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/22 13:09:35 ERROR Presentation: Failed in writing data to locations3://trepp-developmentservices-lake/presentationZone/clo/()
```
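The trace stops at the application's own wrapper exception rethrown from `ZoneExecutionHelper.upsert`; the nested cause from `HiveSyncTool` (the actual reason the sync failed) is not in the paste. For context, below is a minimal sketch of the kind of Hudi 0.11.x write with Hive sync enabled that exercises this code path. It is not the reporter's actual job: the input path, record key, precombine field, and database name are placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal sketch (Hudi 0.11.x): a Spark datasource write with Hive sync on.
// Placeholders: input path, record key, precombine field, database name.
val spark = SparkSession.builder().appName("hudi-hive-sync-sketch").getOrCreate()
val df = spark.read.parquet("s3://some-bucket/input/") // placeholder input

df.write.format("hudi")
  .option("hoodie.table.name", "clogoldenSetHoldings")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "id")          // placeholder
  .option("hoodie.datasource.write.precombine.field", "updated_at") // placeholder
  // Meta sync runs after the write commits; a failure in this step surfaces as
  // "Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool".
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.mode", "hms") // sync via the metastore instead of HiveServer2 JDBC
  .option("hoodie.datasource.hive_sync.database", "clo") // placeholder
  .option("hoodie.datasource.hive_sync.table", "clogoldenSetHoldings")
  .mode(SaveMode.Overwrite)
  .save("s3://trepp-developmentservices-lake/presentationZone/clo/clogoldenSetHoldings")
```

Logging the wrapped exception's cause chain (e.g. `e.getCause`, if attached) at the catch site in `ZoneExecutionHelper.upsert` would show why `HiveSyncTool` itself failed.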
