josiahyan commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-696450195


   Here are the results of my testing. I'm not really that familiar with Arrow, 
and some of the code is sloppy, so please check that what I'm doing matches up 
with your expectations! I didn't check whether the benchmarks were really 
measuring something realistic either (although I have no specific reason to 
doubt that).
   
   All benchmarks were compiled and run on the repo-provided 
java-14-openjdk-amd64 on Ubuntu 20.04. I've disabled HT, CPU freq scaling, but 
did not pin CPU cores.
   
   Representative CLI invocation:
   ```
   java -Darrow.enable_unsafe_memory_access=true -jar 
../arrow/java/performance/target/benchmarks.jar 
'.*IntBenchmarks.setIntDirectly' -w 2 -r 2 -f 2`
   
   # JMH version: 1.21
   # VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7-Ubuntu-1ubuntu1
   # VM invoker: /usr/lib/jvm/java-14-openjdk-amd64/bin/java
   # VM options: -Darrow.enable_unsafe_memory_access=true
   # Warmup: 5 iterations, 2 s each
   # Measurement: 5 iterations, 2 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.arrow.vector.IntBenchmarks.setIntDirectly
   ```
   
   * Option 1: the original IntBenchmarks
   * Option 2: save the value/validity buffer capacities in a wrapper around 
IntVector, and then manually resize, calling .set
   * Option 3: save the overall value capacity in 
BaseFixedWidthVector.getValueCapacity, and force recomputation based on 
valueBuffer.capacity() and validityBuffer.capacity() (what I understand of 
@liyafan82 's suggestion that avoids invalidation issues around the underlying 
buffers)
   * Option 4: Call the despecialized putLong directly (The code that was 
linked in Iceberg, with more code to handle validity buffer)
   
   master w/o this patch: `IntBenchmarks.setIntDirectly  avgt   10  11.605 ± 
0.433  us/op`
   Option 1: `IntBenchmarks.setIntDirectly   avgt   10  4.670 ± 0.095  us/op`
   Option 2: `IntBenchmarks.setIntDirectlyWithSpecializedWriterCachedCapacities 
 avgt   10  2.228 ± 0.073  us/op`
   Option 3: `IntBenchmarks.setIntDirectly  avgt   10  3.446 ± 0.022  us/op`
   Option 4: `IntBenchmarks.setIntDirectlyDirectlyWithPutInt  avgt   10  1.887 
± 0.009  us/op`
   
   Code for Option 1:
   This patch, with the master benchmark code, as extracted:
   ```
   +@State(Scope.Benchmark)
   +public class IntBenchmarks {
   +
   +  private static final int VECTOR_LENGTH = 1024;
   +
   +  private static final int ALLOCATOR_CAPACITY = 1024 * 1024;
   +
   +  private BufferAllocator allocator;
   +
   +  private IntVector vector;
   +
   +  @Setup
   +  public void prepare() {
   +    allocator = new RootAllocator(ALLOCATOR_CAPACITY);
   +    vector = new IntVector("vector", allocator);
   +    vector.allocateNew(VECTOR_LENGTH);
   +    vector.setValueCount(VECTOR_LENGTH);
   +  }
   ...
   +  @Benchmark
   +  @BenchmarkMode(Mode.AverageTime)
   +  @OutputTimeUnit(TimeUnit.MICROSECONDS)
   +  public void setIntDirectly() {
   +    for (int i = 0; i < VECTOR_LENGTH; i++) {
   +      vector.setSafe(i, i % 3 == 0 ? 0 : 1, i);
   +    }
   +  }
   ```
   
   Code for Option 2:
   ```
   diff --git 
a/java/performance/src/main/java/org/apache/arrow/vector/CapacityCachingIntVector.java
 
b/java/performance/src/main/java/org/apache/arrow/vector/CapacityCachingIntVector.java
   new file mode 100644
   index 000000000..c4201dea8
   --- /dev/null
   +++ 
b/java/performance/src/main/java/org/apache/arrow/vector/CapacityCachingIntVector.java
   @@ -0,0 +1,27 @@
   +package org.apache.arrow.vector;
   +
   +public class CapacityCachingIntVector {
   +    private final IntVector backingVector;
   +    private long backingVectorCachedCapacity;
   +
   +    public CapacityCachingIntVector(IntVector backing) {
   +        this.backingVector = backing;
   +        synchronizeCapacity();
   +    }
   +
   +    public void setSafe(int index, int isSet, int value) {
   +        if (index >= backingVectorCachedCapacity) {
   +            // unlikely
   +            this.backingVector.setSafe(index, isSet, value);  // force a 
resize
   +            synchronizeCapacity();
   +            return;
   +        }
   +        // likely
   +        this.backingVector.set(index, isSet, value);
   +        return;
   +    }
   +
   +    private void synchronizeCapacity() {
   +        backingVectorCachedCapacity = backingVector.getValueCapacity();
   +    }
   +}
   ```
   
   Code for Option 3:
   ```
   @@ -182,11 +187,19 @@ public abstract class BaseFixedWidthVector extends 
BaseValueVector
       */
      @Override
      public int getValueCapacity() {
   -    return Math.min(getValueBufferValueCapacity(), 
getValidityBufferValueCapacity());
   +    final long valueBufferCapacity = valueBuffer.capacity();
   +    final long validityBufferCapacity = validityBuffer.capacity();
   +    if (underlyingValueBufferCapacityCached != valueBufferCapacity || 
underlyingValidityBufferCapacityCached != validityBufferCapacity) {
   +      // trap modification
   +      underlyingValueBufferCapacityCached = valueBufferCapacity;
   +      underlyingValidityBufferCapacityCached = validityBufferCapacity;
   +      valueCapacityCached = Math.min(getValueBufferValueCapacity(), 
getValidityBufferValueCapacity());
   +    }
   +    return valueCapacityCached;
      }
    
      private int getValueBufferValueCapacity() {
   -    return capAtMaxInt(valueBuffer.capacity() / typeWidth);
   +      return capAtMaxInt(valueBuffer.capacity() / typeWidth);
      }
   ```
   
   Code for Option 4:
   ```
   +  @Benchmark
   +  @BenchmarkMode(Mode.AverageTime)
   +  @OutputTimeUnit(TimeUnit.MICROSECONDS)
   +  public void setIntDirectlyDirectlyWithPutInt() {
   +    ArrowBuf dataBuffer = vector.getDataBuffer();
   +    ArrowBuf validityBuffer = vector.getValidityBuffer();
   +    long validityBatch = 0;
   +    int i;
   +    for (i = 0; i < VECTOR_LENGTH; i++) {
   +      if (i % 64 == 0 && i != 0) {
   +        // time to flush to validity buffer
   +        // this implementation does not consider endianness, it only 
emulates flushing an incrementally built bit vector
   +        validityBuffer.setLong((i - 1)>> 3, validityBatch);
   +      }
   +      dataBuffer.setInt(IntVector.TYPE_WIDTH * i, i);
   +      validityBatch = (validityBatch << 1) | (i % 3 == 0 ? 0 : 1);
   +    }
   +    if (i % 64 > 0) {
   +      // time to flush to validity buffer
   +      // this implementation does not consider endianness or OOB access, it 
only emulates flushing an incrementally built bit vector
   +      validityBuffer.setLong(i >> 3, validityBatch);
   +    }
   +  }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to