Bingz2 opened a new issue, #2084: URL: https://github.com/apache/incubator-seatunnel/issues/2084
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

Reading a Hive table through the Spark JDBC source does not return the correct data: every row contains the column names rather than the actual values.

### SeaTunnel Version

2.1.2

### SeaTunnel Config

```conf
source {
  # This is an example input plugin, **only for testing and demonstrating the input plugin feature**
  jdbc {
    driver = "org.apache.hive.jdbc.HiveDriver"
    url = "jdbc:hive2://ip:10000/dws"
    table = "(select mac,user_id,province_code from dws.dws_gvp_user_mac_state_stat_dd where day='2022-06-01') tmp"
    result_table_name = "tmp"
    user = "hive"
    password = "hive"
    jdbc.fetchsize = 10000
  }
}

transform {
  # you can also use other filter plugins, such as sql
  # sql {
  #   sql = "select * from accesslog where request_time > 1000"
  # }
}

sink {
  console {
    limit = 10
    serializer = "json"
  }
}
```

### Running Command

```shell
./bin/start-seatunnel-spark.sh -m yarn -e client -c ./config/spark.batch.hive.jdbc.conf
```

### Error Exception

```log
22/06/28 17:39:12 INFO scheduler.DAGScheduler: Missing parents: List()
22/06/28 17:39:12 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at take at Console.scala:47), which has no missing parents
22/06/28 17:39:13 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 18.3 KB, free 366.3 MB)
22/06/28 17:39:13 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 8.4 KB, free 366.3 MB)
22/06/28 17:39:13 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave1.test.gitv.we:16907 (size: 8.4 KB, free: 366.3 MB)
22/06/28 17:39:13 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
22/06/28 17:39:13 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at take at Console.scala:47) (first 15 tasks are for partitions Vector(0))
22/06/28 17:39:13 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
22/06/28 17:39:13 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/06/28 17:39:13 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, slave4.test.gitv.we, executor 1, partition 0, PROCESS_LOCAL, 7701 bytes)
22/06/28 17:39:14 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave4.test.gitv.we:39237 (size: 8.4 KB, free: 912.3 MB)
22/06/28 17:39:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4138 ms on slave4.test.gitv.we (executor 1) (1/1)
22/06/28 17:39:17 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/06/28 17:39:17 INFO scheduler.DAGScheduler: ResultStage 0 (take at Console.scala:47) finished in 4.616 s
22/06/28 17:39:17 INFO scheduler.DAGScheduler: Job 0 finished: take at Console.scala:47, took 4.674037 s
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
{"mac":"mac","user_id":"user_id","province_code":"province_code"}
22/06/28 17:39:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/06/28 17:39:17 INFO server.AbstractConnector: Stopped Spark@4e628b52{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/06/28 17:39:17 INFO ui.SparkUI: Stopped Spark web UI at http://slave1.test.gitv.we:4040
22/06/28 17:39:17 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/06/28 17:39:17 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/06/28 17:39:17 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/06/28 17:39:17 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
22/06/28 17:39:17 INFO cluster.YarnClientSchedulerBackend: Stopped
22/06/28 17:39:17 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/06/28 17:39:17 INFO memory.MemoryStore: MemoryStore cleared
22/06/28 17:39:17 INFO storage.BlockManager: BlockManager stopped
22/06/28 17:39:17 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/06/28 17:39:17 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/06/28 17:39:17 INFO spark.SparkContext: Successfully stopped SparkContext
22/06/28 17:39:17 INFO util.ShutdownHookManager: Shutdown hook called
```

### Flink or Spark Version

Spark version 2.4.0.cloudera2

### Java or Scala Version

Scala version 2.11.12, Java version 1.8.0_112

### Screenshots

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
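The symptom above (every row equal to its column name, `{"mac":"mac",...}`) matches a known quirk of pointing Spark's generic JDBC source at HiveServer2: Spark's default JDBC dialect quotes projected identifiers with double quotes, and HiveQL parses a double-quoted token as a string literal, so the generated `SELECT "mac","user_id",... ` projects three constant strings. A minimal sketch of the two quoting styles (the column names come from the config above; the helper methods are illustrative, not SeaTunnel or Spark APIs):

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class HiveQuoting {
    // Spark's generic JDBC dialect quotes identifiers with double quotes.
    static String sparkDefaultQuote(String col) {
        return "\"" + col + "\"";
    }

    // HiveQL identifiers must be backtick-quoted; a double-quoted token
    // is a string literal, which is why every row echoes the column name.
    static String hiveQuote(String col) {
        return "`" + col + "`";
    }

    // Build the projection the way Spark's JDBC reader does (simplified).
    static String select(List<String> cols, UnaryOperator<String> quote) {
        return cols.stream()
                .map(quote)
                .collect(Collectors.joining(",", "SELECT ", " FROM tmp"));
    }

    public static void main(String[] args) {
        List<String> cols = List.of("mac", "user_id", "province_code");
        // What Spark actually sends to HiveServer2 (three string literals):
        System.out.println(select(cols, HiveQuoting::sparkDefaultQuote));
        // What Hive would need in order to project the real columns:
        System.out.println(select(cols, HiveQuoting::hiveQuote));
    }
}
```

In practice the usual workaround is to register a Hive-aware dialect before the job runs, for example an `org.apache.spark.sql.jdbc.JdbcDialect` whose `quoteIdentifier` emits backticks, installed via `JdbcDialects.registerDialect`, or to read Hive through the dedicated Hive source instead of JDBC.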
