This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-java.git
The following commit(s) were added to refs/heads/master by this push:
new 3ac860e11 GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
3ac860e11 is described below
commit 3ac860e1145c0fba1cf3b902c943f1703dd9db52
Author: sschepens <[email protected]>
AuthorDate: Wed Aug 28 11:58:54 2024 -0300
GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
`Binary.fromCharSequence` is an order of magnitude slower than
`Binary.fromString` when input is a `String`:
```
Benchmarks.fromCharSequence thrpt 25 5885347.328 ± 186669.738 ops/s
Benchmarks.fromString thrpt 25 71335979.492 ± 8800704.044 ops/s
```
Here is the code for the benchmarks:
```java
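// Commons Lang 3 is assumed here; the original snippet did not show its imports.
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.parquet.io.api.Binary;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

public class Benchmarks {
  // A single 100-character input shared by both benchmarks.
  private static final String string = RandomStringUtils.randomAlphanumeric(100);

  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  public void fromCharSequence(Blackhole blackhole) {
    blackhole.consume(Binary.fromCharSequence(string));
  }

  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  public void fromString(Blackhole blackhole) {
    blackhole.consume(Binary.fromString(string));
  }
}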
```
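For illustration only, here is a dependency-free sketch of the same dispatch idea: check for `String` first so it can take the JDK's built-in UTF-8 path, and leave the generic encoder for other `CharSequence` types. The class `Utf8EncodingPaths` and its methods are hypothetical stand-ins, not parquet-java code, and the assumption that `Binary.fromString` and `Binary.fromCharSequence` behave like these two paths is not confirmed by this commit.
```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; none of these names exist in parquet-java.
public class Utf8EncodingPaths {

  // Stand-in for the String fast path: the JDK's built-in UTF-8 encoding.
  static byte[] encodeString(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  // Stand-in for the generic path: a CharsetEncoder over a CharBuffer view.
  static byte[] encodeCharSequence(CharSequence value) {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    try {
      ByteBuffer buffer = encoder.encode(CharBuffer.wrap(value));
      byte[] bytes = new byte[buffer.remaining()];
      buffer.get(bytes);
      return bytes;
    } catch (CharacterCodingException e) {
      // Thrown for malformed input such as an unpaired surrogate.
      throw new UncheckedIOException(e);
    }
  }

  // Same ordering as the patch: String is tested before CharSequence.
  static byte[] encode(Object value) {
    if (value instanceof String) {
      return encodeString((String) value);
    } else if (value instanceof CharSequence) {
      return encodeCharSequence((CharSequence) value);
    }
    return encodeString(value.toString());
  }

  public static void main(String[] args) {
    String text = "h\u00e9llo, parquet";
    byte[] fast = encode(text);
    byte[] generic = encode(new StringBuilder(text));
    // Both paths should yield identical bytes for well-formed input.
    System.out.println(Arrays.equals(fast, generic));
  }
}
```
Because `String` implements `CharSequence`, the order of the `instanceof` checks matters: with the `CharSequence` branch first, a `String` would never reach the faster path.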
---
.../src/main/java/org/apache/parquet/avro/AvroWriteSupport.java | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
index 846fb8bab..53fc3d59c 100644
--- a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
+++ b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
@@ -403,10 +403,12 @@ public class AvroWriteSupport<T> extends WriteSupport<T> {
     if (value instanceof Utf8) {
       Utf8 utf8 = (Utf8) value;
       return Binary.fromReusedByteArray(utf8.getBytes(), 0, utf8.getByteLength());
+    } else if (value instanceof String) {
+      return Binary.fromString((String) value);
     } else if (value instanceof CharSequence) {
       return Binary.fromCharSequence((CharSequence) value);
     }
-    return Binary.fromCharSequence(value.toString());
+    return Binary.fromString(value.toString());
   }

   private static GenericData getDataModel(ParquetConfiguration conf, Schema schema) {