Vincent Ohprecio created SPARK-14031:
----------------------------------------

             Summary: Dataframe to csv IO, system performance enters high CPU 
state and write operation takes 1 hour to complete
                 Key: SPARK-14031
                 URL: https://issues.apache.org/jira/browse/SPARK-14031
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell
    Affects Versions: 2.0.0
         Environment: Mac OS X 10.11.2, MacBook Pro, 16 GB, 2.2 GHz Intel Core i7, 1 TB
* Screenshot http://imgur.com/a0zYgvj
Ubuntu 14.04, Vagrant, 4 cores, 8 GB
* Screenshot http://imgur.com/WCmQkKj
            Reporter: Vincent Ohprecio
            Priority: Critical



Summary
When writing the results of a DataFrame out to CSV from spark-shell, the system 
enters a high-CPU state and the write operation takes about 1 hour to complete. 
To recreate the high CPU usage and the CSV write averaging 3488272270000 ns 
(about 58 minutes):

1. Data File is "2008.csv"
2. Data file download http://stat-computing.org/dataexpo/2009/the-data.html
3. Code https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb
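For readers who cannot open the gist, the kind of timed write being measured can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: it assumes a spark-shell session (which predefines `sqlContext`) started with the spark-csv package on the classpath, that "2008.csv" is in the working directory, and that the file has a header row. The helper names `time` and `nanosToMinutes` and the output path "2008-out.csv" are hypothetical.

```scala
// Run inside spark-shell, which provides `sqlContext`.
// `time`, `nanosToMinutes` and the output path are hypothetical names for this sketch.

// Evaluate a block and return its result with the elapsed time in nanoseconds.
def time[A](block: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = block
  (result, System.nanoTime() - start)
}

// Convert nanoseconds to minutes, e.g. to read the reported 3488272270000 ns figure.
def nanosToMinutes(ns: Long): Double = ns / 1e9 / 60.0

// Load the input through the spark-csv data source (header row assumed).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("2008.csv")

// Time only the write, since that is the step reported as slow.
val (_, elapsedNs) = time {
  df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("2008-out.csv")
}

println(f"CSV write took ${nanosToMinutes(elapsedNs)}%.1f minutes")
```

With this conversion, the reported 3488272270000 ns is roughly 58 minutes, which matches the average completion time quoted for the Mac OS X runs.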

High CPU and 58-minute average completion time on Mac OS X 10.11.2
(MacBook Pro, 16 GB, 2.2 GHz Intel Core i7, 1 TB)
* Screenshot http://imgur.com/a0zYgvj
1.  spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.11-1.3 
https://gist.github.com/bigsnarfdude/403e18600d42fc24cf58
2.  spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.11-1.2
https://gist.github.com/bigsnarfdude/5935fcbb80233cb83cc6
3.  spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb


High CPU; waited over an hour for the CSV write but did not wait for it to 
complete, on Ubuntu 14.04
* Screenshot http://imgur.com/WCmQkKj
1.  spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/930f5832c231c3d39651
2.  spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.11-1.3  
https://gist.github.com/bigsnarfdude/6d3a0b6733cc57dd22ac  


Tested working, 5-6 seconds, on Mac OS X 10.11.2
1.  Spark 1.5.2 Scala 2.10 Spark-csv 1.4.0 Java 1.8 (Marco Mistroni)
2.  spark-assembly-1.4.0-hadoop2.4.0.jar spark-csv_2.10-1.4.0 java1.7
https://gist.github.com/bigsnarfdude/c540129813f3a0d7af2f
3.  spark-assembly-1.6.2-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.10-1.4.0 java1.7 
https://gist.github.com/bigsnarfdude/0851fcecede9403b78fe
4.  Spark version 1.5.2 spark-csv_2.11:1.3.0 (Mich Talebzadeh)



Tested working, 20-22 seconds, on Ubuntu 14.04
1.  spark-assembly-1.6.2-SNAPSHOT-hadoop2.4.0.jar spark-csv_2.10-1.4.0
https://gist.github.com/bigsnarfdude/08b08f68aef4a4309bc0




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
