[jira] [Updated] (SPARK-14031) Dataframe to csv IO, system performance enters high CPU state and write operation takes 1 hour to complete

Vincent Ohprecio (JIRA) Sun, 20 Mar 2016 09:40:23 -0700

     [ https://issues.apache.org/jira/browse/SPARK-14031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vincent Ohprecio updated SPARK-14031:
-------------------------------------
    Description: 
Summary
When writing the results of a DataFrame to CSV from spark-shell, the system enters a high-CPU state and the write operation takes about 1 hour to complete.
Affected stage:
 [Stage 5:>                                                         (0 + 2) / 21]

Reproduce the high CPU usage and the CSV write averaging 3488272270000 ns (about 1 hour) as follows.

1. Data file: "2008.csv"
2. Data file download: http://stat-computing.org/dataexpo/2009/the-data.html
3. Code: https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb
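For reference, the reported average of 3488272270000 ns converts to minutes as shown below. This assumes 10^9 ns per second and that the figure is the wall-clock time of the whole write; the result agrees with the 58-minute macOS average reported for this issue.

```latex
\frac{3\,488\,272\,270\,000\ \text{ns}}{10^{9}\ \text{ns/s}} = 3488.27227\ \text{s},
\qquad
\frac{3488.27227\ \text{s}}{60\ \text{s/min}} \approx 58.14\ \text{min}
```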

High CPU and a 58-minute average completion time on Mac OS X 10.11.2
MacBook Pro, 16 GB, 2.2 GHz Intel Core i7, 1 TB
1. spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar, spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb

High CPU; waited over an hour for the CSV write but did not wait for it to complete
Ubuntu 14.04
1. spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar, spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/930f5832c231c3d39651


  was:
Summary
When writing the results of a DataFrame to CSV from spark-shell, the system enters a high-CPU state and the write operation takes about 1 hour to complete.
Reproduce the high CPU usage and the CSV write averaging 3488272270000 ns (about 1 hour) as follows.

1. Data file: "2008.csv"
2. Data file download: http://stat-computing.org/dataexpo/2009/the-data.html
3. Code: https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb

High CPU and a 58-minute average completion time on Mac OS X 10.11.2
MacBook Pro, 16 GB, 2.2 GHz Intel Core i7, 1 TB
1. spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar, spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/581b780ce85d7aaecbcb

High CPU; waited over an hour for the CSV write but did not wait for it to complete
Ubuntu 14.04
1. spark-assembly-2.0.0-SNAPSHOT-hadoop2.4.0.jar, spark-csv_2.11-1.4
https://gist.github.com/bigsnarfdude/930f5832c231c3d39651



> Dataframe to csv IO, system performance enters high CPU state and write operation takes 1 hour to complete
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14031
>                 URL: https://issues.apache.org/jira/browse/SPARK-14031
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.0.0
>         Environment: Mac OS X 10.11.2, MacBook Pro 16 GB, 2.2 GHz Intel Core i7, 1 TB; and Ubuntu 14.04 (Vagrant), 4 cores, 8 GB
>            Reporter: Vincent Ohprecio
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
