[ 
https://issues.apache.org/jira/browse/KUDU-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244876#comment-16244876
 ] 

Todd Lipcon commented on KUDU-2210:
-----------------------------------

Which version of the kudu-spark package are you using? Earlier versions had 
some client hang bugs, so I'd suggest trying the latest release (it should be 
compatible even if you can't upgrade your cluster). If you can capture a 
jstack of the hung Spark task, that would also help.
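
For reference, a stack dump is normally taken by running jstack against the 
executor's JVM process id on the host where the hung task is running. If 
attaching jstack is not possible, the sketch below shows one way to produce a 
roughly similar dump from inside a JVM using only the standard 
Thread.getAllStackTraces call. The ThreadDump object and its render method 
are hypothetical names used for illustration; they are not part of Kudu or 
Spark, and the output format only approximates what jstack prints.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Kudu or Spark): renders the stack of every
// live thread in the current JVM, similar in spirit to `jstack <pid>`.
object ThreadDump {
  def render(): String = {
    val sb = new StringBuilder
    for ((thread, frames) <- Thread.getAllStackTraces.asScala) {
      // Header line: thread name, id and current state (RUNNABLE, WAITING, ...).
      sb.append("\"" + thread.getName + "\" id=" + thread.getId +
        " state=" + thread.getState + "\n")
      // One line per stack frame, innermost frame first.
      frames.foreach(frame => sb.append("    at " + frame + "\n"))
      sb.append("\n")
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = print(render())
}
```

A dump taken this way only covers the JVM it runs in, so it would have to be 
invoked on the executor that holds the hung task; the driver's own threads 
usually say little about a stuck scan.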

> Apache Spark stucks while reading Kudu table.
> ---------------------------------------------
>
>                 Key: KUDU-2210
>                 URL: https://issues.apache.org/jira/browse/KUDU-2210
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, perf, spark
>            Reporter: Andrew Ya
>
> When I try to read a Kudu table with Apache Spark using the following code
> {code}
> import org.apache.kudu.spark.kudu._
> import sqlContext.implicits._
> val kuduOptions: Map[String, String] = Map(
> "kudu.table"  -> "test_table", 
> "kudu.master" -> "host1:7051,host2:7051,host3:7051")
> val kuduDF = sqlContext.read.options(kuduOptions).kudu
> kuduDF.registerTempTable("t")
> sqlContext.sql(" SELECT * FROM t  where id in (1111,2222) ").show(50, false)
> {code}
> the job hangs for more than three days after completing 95% of its tasks. 
> The table is partitioned by date and the partitions are uneven in size: one 
> partition is 12 GB, about 20 partitions are between 1 GB and 3 GB, and the 
> remaining partitions hold only megabytes or kilobytes of data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
