[https://issues.apache.org/jira/browse/HUDI-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835052#comment-17835052]

Jonathan Vexler commented on HUDI-6787:
---------------------------------------

{code:java}
root@adhoc-2:/opt# spark-submit \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts \
>   --target-base-path /user/hive/warehouse/stock_ticks_cow \
>   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
2024-04-08 21:13:35,067 WARN streamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
2024-04-08 21:13:35,211 INFO spark.SparkContext: Running Spark version 3.2.1
2024-04-08 21:13:35,247 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-04-08 21:13:35,346 INFO resource.ResourceUtils: ==============================================================
2024-04-08 21:13:35,347 INFO resource.ResourceUtils: No custom resources configured for spark.driver.
2024-04-08 21:13:35,347 INFO resource.ResourceUtils: ==============================================================
2024-04-08 21:13:35,348 INFO spark.SparkContext: Submitted application: streamer-stock_ticks_cow
2024-04-08 21:13:35,383 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2024-04-08 21:13:35,396 INFO resource.ResourceProfile: Limiting resource is cpu
2024-04-08 21:13:35,396 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0
2024-04-08 21:13:35,461 INFO spark.SecurityManager: Changing view acls to: root
2024-04-08 21:13:35,461 INFO spark.SecurityManager: Changing modify acls to: root
2024-04-08 21:13:35,462 INFO spark.SecurityManager: Changing view acls groups to: 
2024-04-08 21:13:35,462 INFO spark.SecurityManager: Changing modify acls groups to: 
2024-04-08 21:13:35,463 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2024-04-08 21:13:35,512 INFO Configuration.deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2024-04-08 21:13:35,513 INFO Configuration.deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2024-04-08 21:13:35,513 INFO Configuration.deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2024-04-08 21:13:35,750 INFO util.Utils: Successfully started service 'sparkDriver' on port 42169.
2024-04-08 21:13:35,789 INFO spark.SparkEnv: Registering MapOutputTracker
2024-04-08 21:13:35,826 INFO spark.SparkEnv: Registering BlockManagerMaster
2024-04-08 21:13:35,848 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2024-04-08 21:13:35,850 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2024-04-08 21:13:35,856 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
2024-04-08 21:13:35,879 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2e2fda2c-c1b4-4198-b790-58c00db5af27
2024-04-08 21:13:35,900 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MiB
2024-04-08 21:13:35,915 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2024-04-08 21:13:36,009 INFO util.log: Logging initialized @2972ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-04-08 21:13:36,135 INFO server.Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_212-b04
2024-04-08 21:13:36,162 INFO server.Server: Started @3125ms
2024-04-08 21:13:36,198 INFO server.AbstractConnector: Started ServerConnector@3e681bc{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2024-04-08 21:13:36,199 INFO util.Utils: Successfully started service 'SparkUI' on port 8090.
2024-04-08 21:13:36,241 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55b62629{/jobs,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,244 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15f193b8{/jobs/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,245 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@304a9d7b{/jobs/job,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,246 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@606fc505{/jobs/job/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,248 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d140a7{/stages,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,248 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aa27288{/stages/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,249 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77e80a5e{/stages/stage,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,254 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49298ce7{/stages/stage/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,256 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@8dfe921{/stages/pool,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,257 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55f45b92{/stages/pool/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,258 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67fe380b{/storage,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,261 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3dedb4a6{/storage/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,262 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@415e0bcb{/storage/rdd,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,264 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49d98dc5{/storage/rdd/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,265 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1d81e101{/environment,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,266 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bf71cec{/environment/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,267 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30cdae70{/executors,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,268 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2577d6c8{/executors/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,270 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c000e0c{/executors/threadDump,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,270 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44f9779c{/executors/threadDump/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,284 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e8a459{/static,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,287 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ae66c85{/,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,288 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4604b900{/api,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,290 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@367795c7{/jobs/job/kill,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,292 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3956b302{/stages/stage/kill,null,AVAILABLE,@Spark}
2024-04-08 21:13:36,295 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://adhoc-2:8090
2024-04-08 21:13:36,327 INFO spark.SparkContext: Added JAR file:/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar at spark://adhoc-2:42169/jars/hoodie-utilities.jar with timestamp 1712610815114
2024-04-08 21:13:36,642 INFO executor.Executor: Starting executor ID driver on host adhoc-2
2024-04-08 21:13:36,671 INFO executor.Executor: Fetching spark://adhoc-2:42169/jars/hoodie-utilities.jar with timestamp 1712610815114
2024-04-08 21:13:36,738 INFO client.TransportClientFactory: Successfully created connection to adhoc-2/172.19.0.13:42169 after 30 ms (0 ms spent in bootstraps)
2024-04-08 21:13:36,744 INFO util.Utils: Fetching spark://adhoc-2:42169/jars/hoodie-utilities.jar to /tmp/spark-5afa6c3f-d184-474b-b77d-172432bd1301/userFiles-380b19e8-ad33-4d55-9cba-21fc7bd327cf/fetchFileTemp2239860127836317155.tmp
2024-04-08 21:13:37,096 INFO executor.Executor: Adding file:/tmp/spark-5afa6c3f-d184-474b-b77d-172432bd1301/userFiles-380b19e8-ad33-4d55-9cba-21fc7bd327cf/hoodie-utilities.jar to class loader
2024-04-08 21:13:37,110 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35967.
2024-04-08 21:13:37,112 INFO netty.NettyBlockTransferService: Server created on adhoc-2:35967
2024-04-08 21:13:37,114 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2024-04-08 21:13:37,121 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, adhoc-2, 35967, None)
2024-04-08 21:13:37,124 INFO storage.BlockManagerMasterEndpoint: Registering block manager adhoc-2:35967 with 366.3 MiB RAM, BlockManagerId(driver, adhoc-2, 35967, None)
2024-04-08 21:13:37,127 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, adhoc-2, 35967, None)
2024-04-08 21:13:37,128 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, adhoc-2, 35967, None)
2024-04-08 21:13:37,274 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cbf4f53{/metrics/json,null,AVAILABLE,@Spark}
2024-04-08 21:13:37,949 WARN config.DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
2024-04-08 21:13:37,964 WARN config.DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
2024-04-08 21:13:38,453 INFO server.AbstractConnector: Stopped Spark@3e681bc{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2024-04-08 21:13:38,460 INFO ui.SparkUI: Stopped Spark web UI at http://adhoc-2:8090
2024-04-08 21:13:38,495 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2024-04-08 21:13:38,536 INFO memory.MemoryStore: MemoryStore cleared
2024-04-08 21:13:38,536 INFO storage.BlockManager: BlockManager stopped
2024-04-08 21:13:38,544 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2024-04-08 21:13:38,552 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2024-04-08 21:13:38,585 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function1$class
        at org.apache.spark.sql.hudi.HoodieSparkSessionExtension.<init>(HoodieSparkSessionExtension.scala:28)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1195)
        at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$applyExtensions(SparkSession.scala:1192)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:956)
        at org.apache.spark.sql.SQLContext$.getOrCreate(SQLContext.scala:1023)
        at org.apache.spark.sql.SQLContext.getOrCreate(SQLContext.scala)
        at org.apache.hudi.client.common.HoodieSparkEngineContext.<init>(HoodieSparkEngineContext.java:72)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.<init>(HoodieStreamer.java:163)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.<init>(HoodieStreamer.java:147)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.<init>(HoodieStreamer.java:133)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:596)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scala.Function1$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 31 more
{code}
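For context on the failure above: `scala/Function1$class` is a trait implementation class that exists only in Scala 2.11 bytecode (Scala 2.12 changed the trait encoding), so this `NoClassDefFoundError` on Spark 3.2.1 (Scala 2.12) usually means the utilities bundle on the classpath was built for Scala 2.11. A quick way to inspect the jar is sketched below; the `check_scala_211_classes` helper name and the default jar path are illustrative assumptions, not part of the log:

```shell
# Hypothetical sketch: count Scala 2.11-style "Foo$class.class" entries in a
# jar. Scala 2.12+ bytecode has none, so a non-zero count for the bundle while
# running Spark 3.2.1 (Scala 2.12) would confirm a binary-version mismatch.
check_scala_211_classes() {
  jar="$1"
  if [ ! -f "$jar" ]; then
    echo "bundle not found: $jar"
    return 1
  fi
  # List the archive entries and count the 2.11 trait-encoding marker files.
  unzip -l "$jar" | grep -c '\$class\.class'
}

# Path taken from the spark-submit invocation above (an assumption here).
check_scala_211_classes "${HUDI_UTILITIES_BUNDLE:-hoodie-utilities.jar}" || true
```

If the count is non-zero, rebuilding the bundle with the Scala 2.12 profile (or using the matching `hudi-utilities-bundle_2.12` artifact) should resolve the error.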

> Hive Integrate FileGroupReader with HoodieMergeOnReadSnapshotReader and 
> RealtimeCompactedRecordReader for Hive
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-6787
>                 URL: https://issues.apache.org/jira/browse/HUDI-6787
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Ethan Guo
>            Assignee: Jonathan Vexler
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
