[ https://issues.apache.org/jira/browse/KUDU-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Ya updated KUDU-2210:
----------------------------
Component/s: perf, client
> Apache Spark gets stuck while reading a Kudu table.
> ---------------------------------------------------
>
> Key: KUDU-2210
> URL: https://issues.apache.org/jira/browse/KUDU-2210
> Project: Kudu
> Issue Type: Bug
> Components: client, perf, spark
> Reporter: Andrew Ya
>
> When I try to read a Kudu table with Apache Spark using the following code
> {code}
> import org.apache.kudu.spark.kudu._
> import sqlContext.implicits._
> val kuduOptions: Map[String, String] = Map(
> "kudu.table" -> "test_table",
> "kudu.master" -> "host1:7051,host2:7051,host3:7051")
> val kuduDF = sqlContext.read.options(kuduOptions).kudu
> kuduDF.registerTempTable("t")
> sqlContext.sql(" SELECT * FROM t where id in (1111,2222) ").show(50, false)
> {code}
> after completing 95% of the tasks, the job gets stuck for more than three days.
> The table is partitioned by date, and the partitions are uneven in size: one
> partition is 12 GB, about 20 partitions are between 1 GB and 3 GB, and some
> partitions hold only megabytes or kilobytes of data.
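>
> A minimal sketch for checking whether the stalled tasks line up with the large
> partitions. It assumes the kuduDF, sqlContext and temp table "t" defined above
> and Spark 1.6 or later (for spark_partition_id); none of it was run as part of
> this report, and the alias "spark_partition" is only an illustrative name.
> {code}
> import org.apache.spark.sql.functions.spark_partition_id
>
> // Number of Spark tasks the Kudu scan is split into.
> println(kuduDF.rdd.partitions.length)
>
> // Print the logical and physical plans; the physical plan should indicate
> // which filters, if any, are handed to the Kudu relation.
> sqlContext.sql("SELECT * FROM t WHERE id IN (1111, 2222)").explain(true)
>
> // Row count per Spark partition. This scans the whole table, so it may
> // itself be slow on the 12 GB partition.
> kuduDF.groupBy(spark_partition_id().as("spark_partition")).count().show(50, false)
> {code}
> If the filter on id does not appear as pushed down in the plan, every task
> would have to scan its whole partition, which might explain why the last few
> tasks (the largest partitions) take so long.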
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)