[I] [SUPPORT] Seeking Assistance with Hudi Integration Issue in Spark Thrift Server [hudi]

via GitHub Fri, 08 Dec 2023 10:15:32 -0800


soumilshah1995 opened a new issue, #10287:
URL: https://github.com/apache/hudi/issues/10287


   Hey community,
   I hope you're doing well. I recently launched a Spark Thrift server with the Hudi library on its classpath. The server runs smoothly, and I can connect to it with Beeline and query data successfully.
   
   ```
   spark-submit \
     --master 'local[*]' \
     --conf spark.executor.extraJavaOptions=-Duser.timezone=Etc/UTC \
     --conf spark.eventLog.enabled=false \
     --conf spark.sql.warehouse.dir=file:///Users/soumilshah/Desktop/soumil/sparkwarehouse \
     --packages 'org.apache.hudi:hudi-spark3-bundle_2.12:0.14.0,org.apache.spark:spark-sql_2.12:3.4.0,org.apache.spark:spark-hive_2.12:3.4.0' \
     --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
     --name "Thrift JDBC/ODBC Server" \
     --executor-memory 512m \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
     --conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar
   ```
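   The same launch settings can also be kept in a Spark properties file and passed to `$SPARK_HOME/sbin/start-thriftserver.sh`, which forwards its options to `spark-submit` and already sets the Thrift server class and name. The sketch below assumes this approach. It only writes such a file to a hypothetical temporary path and prints the launch command without running it. `spark.jars.packages` is the configuration equivalent of `--packages`. The commented-out `spark.sql.catalog.spark_catalog` line is not part of the original command. It is the catalog setting listed in Hudi's Spark quick-start guide for Spark 3.2+, and is included only for comparison, not as a confirmed fix.

   ```shell
   #!/usr/bin/env bash
   set -euo pipefail

   # Hypothetical location for the properties file; override PROPS_FILE to choose another path.
   PROPS_FILE="${PROPS_FILE:-$(mktemp -d)/hudi-thriftserver.conf}"

   # One "key value" pair per line, mirroring the spark-submit flags above.
   cat > "$PROPS_FILE" <<'EOF'
   spark.master                     local[*]
   spark.eventLog.enabled           false
   spark.executor.extraJavaOptions  -Duser.timezone=Etc/UTC
   spark.executor.memory            512m
   spark.sql.warehouse.dir          file:///Users/soumilshah/Desktop/soumil/sparkwarehouse
   spark.jars.packages              org.apache.hudi:hudi-spark3-bundle_2.12:0.14.0,org.apache.spark:spark-sql_2.12:3.4.0,org.apache.spark:spark-hive_2.12:3.4.0
   spark.serializer                 org.apache.spark.serializer.KryoSerializer
   spark.kryo.registrator           org.apache.spark.HoodieSparkKryoRegistrar
   spark.sql.extensions             org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   # Not in the original command; listed in Hudi's quick-start guide for Spark 3.2+.
   # spark.sql.catalog.spark_catalog  org.apache.spark.sql.hudi.catalog.HoodieCatalog
   EOF

   echo "Wrote $PROPS_FILE"
   # Print (do not run) the launch command; start-thriftserver.sh passes options through to spark-submit.
   echo "\$SPARK_HOME/sbin/start-thriftserver.sh --properties-file $PROPS_FILE"
   ```

   Keeping the settings in one file makes it easier to confirm that every client connects to a server started with exactly the same configuration.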
   
   Connecting with Beeline:
   ```
   beeline -u jdbc:hive2://localhost:10000/default
   ```
   
   In Beeline, I created a Hudi table:
   
   ```
   CREATE TABLE hudi_table (
       ts BIGINT,
       uuid STRING,
       rider STRING,
       driver STRING,
       fare DOUBLE,
       city STRING
   ) USING HUDI
   PARTITIONED BY (city);
   ```
   
   This works fine.
   
   Then I inserted data:
   ```
   INSERT INTO hudi_table
   VALUES
     (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
     (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
     (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
     (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
     (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
     (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
     (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
     (1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
   ```
   
   
   However, when I connect to the same server with dbt or DBeaver and run a SQL query, I get the following error:
   
   ```
   QL Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.util.concurrent.ExecutionException: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: hudi. Please find packages at `https://spark.apache.org/third-party-projects.html`.
   ```
   
   
   I have successfully created a Hudi table and inserted data into it using Beeline. The problem arises only when I try to work with the same Hudi tables from tools like dbt or DBeaver.
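   Because Beeline works but other clients do not, one rough first check is whether `--packages` actually resolved the Hudi bundle on the machine running the Thrift server. The sketch below assumes the default Ivy location (`~/.ivy2/jars`, used when `spark.jars.ivy` is not set) and Spark's usual `<group>_<artifact>-<version>.jar` naming in that directory. `find_hudi_bundle` is a hypothetical helper, not part of Spark or Hudi. Finding the jar does not by itself explain the error. It only confirms that the dependency was downloaded.

   ```shell
   #!/usr/bin/env bash
   set -euo pipefail

   # Hypothetical helper: print any Hudi Spark 3 bundle jars found in an Ivy "jars" directory.
   # Returns non-zero if none are found.
   find_hudi_bundle() {
     local jars_dir="${1:-$HOME/.ivy2/jars}"   # default Ivy retrieve directory (assumption)
     local found=1
     local jar
     for jar in "$jars_dir"/org.apache.hudi_hudi-spark3-bundle_2.12-*.jar; do
       # With no match the glob stays literal, so check that the file really exists.
       if [ -f "$jar" ]; then
         echo "$jar"
         found=0
       fi
     done
     return "$found"
   }

   if find_hudi_bundle; then
     echo "Hudi bundle present in the Ivy cache"
   else
     echo "Hudi bundle not found; check the dependency-resolution output printed when the server starts"
   fi
   ```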
   
   Any insights or guidance on resolving this issue would be greatly appreciated! If you have experience integrating Hudi with the Spark Thrift Server and have worked through similar problems, your expertise would be invaluable.
   
   Thanks in advance for your help!
   
   Regards
   Soumil
   

