This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/branch-2.0 by this push:
new dfd41c4b2 ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
dfd41c4b2 is described below
commit dfd41c4b270694492be37c5339db8197b137ffd6
Author: sychen <[email protected]>
AuthorDate: Wed Apr 24 21:11:32 2024 -0700
ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
### What changes were proposed in this pull request?
This PR migrates `SparkBenchmark` to the Scala 2.13 build of Apache Spark 3.5.1.
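Spark publishes its Maven artifacts with the Scala binary version as a suffix, which is why the change amounts to swapping `_2.12` for `_${scala.binary.version}`. The following is a minimal sketch of that naming convention, not code from this patch: the class `ScalaArtifactNames` and its two helpers are hypothetical, and the `major.minor` rule is assumed to hold only for Scala 2.x versions.

```java
import java.util.List;

public class ScalaArtifactNames {
  // Hypothetical helper: derive the binary version ("2.13") from a full
  // Scala 2.x version ("2.13.8") by keeping only major.minor.
  static String binaryVersion(String scalaVersion) {
    String[] parts = scalaVersion.split("\\.");
    if (parts.length < 2) {
      throw new IllegalArgumentException("Expected major.minor[.patch]: " + scalaVersion);
    }
    return parts[0] + "." + parts[1];
  }

  // Hypothetical helper mirroring the pom.xml pattern
  // <artifactId>spark-core_${scala.binary.version}</artifactId>.
  static String artifactId(String module, String scalaBinaryVersion) {
    return module + "_" + scalaBinaryVersion;
  }

  public static void main(String[] args) {
    String binary = binaryVersion("2.13.8");
    // The four Spark modules whose artifact IDs are changed in this commit.
    for (String module : List.of("spark-catalyst", "spark-core", "spark-sql", "spark-avro")) {
      System.out.println(artifactId(module, binary));
    }
  }
}
```

Keeping the suffix in a single property means every Spark dependency and the `scala-library` version move together, so a mixed 2.12/2.13 classpath is less likely.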
### Why are the changes needed?
https://github.com/apache/orc/pull/1909#pullrequestreview-2020282867
### How was this patch tested?
Tested locally:
```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
```
```
Benchmark                                  (compression)  (dataset)  (format)  Mode  Cnt          Score       Error  Units
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd       taxi   parquet  avgt    5          0.002                  #
SparkBenchmark.partialRead:ops                      zstd       taxi   parquet  avgt    5         10.000                  #
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
SparkBenchmark.partialRead:records                  zstd       taxi   parquet  avgt    5  113791180.000                  #
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #1912 from cxzl25/ORC-1704.
Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit dc634cbb0332999cfc4cac61efb5dbcaa1b5a13d)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
java/bench/pom.xml | 12 +++++++-----
java/bench/spark/pom.xml | 8 ++++----
.../src/java/org/apache/orc/bench/spark/SparkBenchmark.java | 2 +-
3 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/java/bench/pom.xml b/java/bench/pom.xml
index a59127a6b..f575f9780 100644
--- a/java/bench/pom.xml
+++ b/java/bench/pom.xml
@@ -39,6 +39,8 @@
<junit.version>5.10.1</junit.version>
<orc.version>${project.version}</orc.version>
<parquet.version>1.13.1</parquet.version>
+ <scala.binary.version>2.13</scala.binary.version>
+ <scala.version>2.13.8</scala.version>
<spark.version>3.5.1</spark.version>
</properties>
@@ -284,12 +286,12 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-catalyst_2.12</artifactId>
+ <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.12</artifactId>
+ <artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
@@ -316,7 +318,7 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-sql_2.12</artifactId>
+ <artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
@@ -335,7 +337,7 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-avro_2.12</artifactId>
+ <artifactId>spark-avro_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
@@ -357,7 +359,7 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
- <version>2.12.18</version>
+ <version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
diff --git a/java/bench/spark/pom.xml b/java/bench/spark/pom.xml
index 46a6486a2..38a1193f1 100644
--- a/java/bench/spark/pom.xml
+++ b/java/bench/spark/pom.xml
@@ -71,15 +71,15 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-catalyst_2.12</artifactId>
+ <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.12</artifactId>
+ <artifactId>spark-core_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-sql_2.12</artifactId>
+ <artifactId>spark-sql_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
@@ -88,7 +88,7 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-avro_2.12</artifactId>
+ <artifactId>spark-avro_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.jodd</groupId>
diff --git a/java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java b/java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java
index 90eceee98..b390843a2 100644
--- a/java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java
+++ b/java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java
@@ -61,9 +61,9 @@ import scala.Function1;
import scala.Tuple2;
import scala.collection.Iterator;
import scala.collection.JavaConverters;
-import scala.collection.Seq;
import scala.collection.immutable.Map;
import scala.collection.immutable.Map$;
+import scala.collection.immutable.Seq;
import java.io.IOException;
import java.sql.Timestamp;