[ 
https://issues.apache.org/jira/browse/SPARK-53075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-53075:
----------------------------------
    Description: 

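The JDK's `java.nio.file.Files` methods are faster than the Apache Commons IO `FileUtils` equivalents. In the measurements below, on a file of 100 million one-character lines, writing is about 4x faster (5013 ms vs. 1191 ms), while reading is roughly on par (2377 ms vs. 2279 ms).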

**SAMPLE DATA**

{code}
scala> val array = new java.util.ArrayList[String]()
val array: java.util.ArrayList[String] = []

scala> (1 to 100_000_000).foreach { _ => array.add("a") }
{code}

**BEFORE (WRITE)**

{code}
scala> spark.time(org.apache.commons.io.FileUtils.writeLines(new java.io.File("/tmp/text"), array))
Time taken: 5013 ms
{code}

**AFTER (WRITE)**

{code}
scala> spark.time(java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/text"), array))
Time taken: 1191 ms
{code}

**BEFORE (READ)**

{code}
scala> spark.time(org.apache.commons.io.FileUtils.readLines(new java.io.File("/tmp/text")))
Time taken: 2377 ms
{code}

**AFTER (READ)**

{code}
scala> spark.time(java.nio.file.Files.readAllLines(java.nio.file.Paths.get("/tmp/text")))
Time taken: 2279 ms
{code}
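
For readers without a Spark shell at hand, the replacement calls can be tried on their own. The following is a minimal sketch, not the code from the actual change: the object name `FilesRoundTrip`, the helper `roundTrip`, the temporary file, and the much smaller line count are all illustrative assumptions. The charset is passed explicitly here for clarity; the two-argument overloads used in the timings above default to UTF-8.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.{ArrayList => JArrayList, List => JList}

// Hypothetical helper object, for illustration only.
object FilesRoundTrip {
  /** Writes `lines` to `path`, then reads them back, using only java.nio.file.Files. */
  def roundTrip(path: Path, lines: JList[String]): JList[String] = {
    // Each element is written as one line, terminated by the platform line separator.
    Files.write(path, lines, StandardCharsets.UTF_8)
    // Reads the whole file into memory as a list of lines.
    Files.readAllLines(path, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val lines = new JArrayList[String]()
    // Far fewer lines than the 100 million used in the timings above.
    (1 to 1000).foreach { _ => lines.add("a") }

    val tmp = Files.createTempFile("text", ".txt")
    try {
      val readBack = roundTrip(tmp, lines)
      println(s"wrote ${lines.size} lines, read back ${readBack.size}")
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

Both calls operate on a `java.nio.file.Path` rather than a `java.io.File`, so existing `File` values would need `toPath` at the call site.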

> Use Java `Files.readAllLines/write` instead of `FileUtils.(read|write)Lines`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-53075
>                 URL: https://issues.apache.org/jira/browse/SPARK-53075
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 4.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0


