Madhavi Vaddepalli created SPARK-21650:

             Summary: Insert into Hive partitioned table from spark-sql taking hours to complete
                 Key: SPARK-21650
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0
         Environment: Linux machines
Spark version - 1.6.0
Hive Version - 1.1
200- number of executors.
3 - number of executor cores.
10g - executor and driver memory.
dynamic allocation enabled.
            Reporter: Madhavi Vaddepalli

We are trying to execute some logic using Spark SQL:
Input to the program: 7 billion records (60 GB, gzip-compressed, text format).
Output: 7 billion records (260 GB, gzip-compressed, partitioned on a few columns).
The output has 10,000 partitions (10,000 distinct combinations of the partition-column values).

We are trying to insert this output into a Hive table (text format, gzip compressed).
All the spawned tasks finished in 33 minutes and all the executors were decommissioned; only the driver remained active. *It stayed in this state, without showing any active stage or task in the Spark UI, for about 2.5 hrs,* and then completed.
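For reference, the write pattern is roughly the following (a sketch only; the table and column names are placeholders, not taken from the actual job). A dynamic-partition insert with ~10,000 distinct partition values looks like:

```sql
-- Hypothetical reproduction of the insert described above.
-- Dynamic partitioning must be enabled for a multi-partition insert:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 10000;

INSERT OVERWRITE TABLE output_table
PARTITION (part_col1, part_col2)
SELECT col1, col2, part_col1, part_col2
FROM input_table;
```

One plausible explanation for the driver-only phase: after the tasks finish, moving the task output files into their partition directories and registering each of the ~10,000 partitions with the Hive metastore is handled from the driver, which can take hours at this partition count.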

Please let us know what can be done to improve performance here. (Is this fixed in later versions?)

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

Reply via email to