[ https://issues.apache.org/jira/browse/SPARK-53082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-53082:
----------------------------------
    Description:
`Files.readString` is roughly 2.4 times faster in the measurement below (347 ms vs. 144 ms).

**BEFORE**
{code}
scala> spark.time(org.apache.commons.io.FileUtils.readFileToString(new java.io.File("/tmp/500000000_byte")).length)
Time taken: 347 ms
{code}

**AFTER**
{code}
scala> spark.time(java.nio.file.Files.readString(java.nio.file.Path.of("/tmp/500000000_byte")).length)
Time taken: 144 ms
{code}

> Use Java `Files.readString` instead of `FileUtils.readFileToString`
> -------------------------------------------------------------------
>
>                 Key: SPARK-53082
>                 URL: https://issues.apache.org/jira/browse/SPARK-53082
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL, Tests
>    Affects Versions: 4.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
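
For readers who want to try the replacement outside a Spark shell, here is a minimal sketch using only the JDK. The object `ReadStringSketch`, the helper `readUtf8`, and the temporary file are hypothetical names introduced for illustration; they are not part of Spark or of the patch for this issue. The sketch assumes Java 11 or later, where `Files.readString` and `Files.writeString` were introduced, and it passes the charset explicitly, since the one-argument `Files.readString(Path)` always decodes as UTF-8 while the one-argument `FileUtils.readFileToString(File)` uses the platform default charset.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper object for illustration only; not part of Spark.
object ReadStringSketch {

  // Reads the whole file into a String, decoding it as UTF-8.
  // Files.readString(Path, Charset) is available since Java 11.
  def readUtf8(path: Path): String =
    Files.readString(path, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // Write a small temporary file so the example is self-contained.
    val tmp = Files.createTempFile("read-string-sketch", ".txt")
    try {
      Files.writeString(tmp, "hello, spark\n", StandardCharsets.UTF_8)
      val content = readUtf8(tmp)
      println(s"Read ${content.length} characters")
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

One behavior worth checking when migrating call sites: `Files.readString` throws an `IOException` if it encounters a malformed or unmappable byte sequence, so files that are not valid in the chosen charset may need separate handling.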