Andrew Ya created KUDU-2210:
-------------------------------

             Summary: Apache Spark hangs while reading a Kudu table.
                 Key: KUDU-2210
                 URL: https://issues.apache.org/jira/browse/KUDU-2210
             Project: Kudu
          Issue Type: Bug
          Components: spark
            Reporter: Andrew Ya


When I try to read a Kudu table with Apache Spark using the following code
{code}
import org.apache.kudu.spark.kudu._
import sqlContext.implicits._
val kuduOptions: Map[String, String] = Map(
  "kudu.table"  -> "test_table",
  "kudu.master" -> "host1:7051,host2:7051,host3:7051")
val kuduDF = sqlContext.read.options(kuduOptions).kudu
kuduDF.registerTempTable("t")
sqlContext.sql(" SELECT * FROM t  where id in (1111,2222) ").show(50, false)
{code}

after about 95% of the tasks have completed, the job hangs for more than
three days. The table is partitioned by date, and the partitions are uneven
in size: one partition is 12 GB, about 20 partitions are between 1 GB and
3 GB, and some partitions hold only megabytes or kilobytes of data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
