Hi Jacques,

Thank you so much for your kind reminder.
To come up with some performance data, I have set up an environment and run some micro-benchmarks. The server runs Linux, has 64 cores, and has 256 GB of memory. The benchmarks are simple iterations over some double vectors (the source file is attached):

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testSafe() {
    safeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      safeVector.set(i, i + 10.0);
      safeSum += safeVector.get(i);
    }
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void testUnSafe() {
    unSafeSum = 0;
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      unsafeVector.set(i, i + 10.0);
      unSafeSum += unsafeVector.get(i);
    }
  }

The safe vector in the testSafe benchmark is from the original Arrow implementation, whereas the unsafe vector in the testUnSafe benchmark is based on our initial implementation in PR <https://github.com/apache/arrow/pull/4212> (this is not the final version, but we believe much of the overhead has already been removed).

The evaluation is based on the JMH framework (thanks to Jacques Nadeau for the suggestion). The framework runs the benchmarks enough times, including warm-up iterations, that JIT effects are properly accounted for.

In the first experiment, we use the default configuration (bounds checking enabled), and the original Arrow vector is about 4 times slower:

  Benchmark                       Mode  Cnt   Score   Error  Units
  VectorAPIBenchmarks.testSafe    avgt    5  11.546 ± 0.012  us/op
  VectorAPIBenchmarks.testUnSafe  avgt    5   2.822 ± 0.006  us/op

In the second experiment, we disable bounds checking with JVM options:

  -Ddrill.enable_unsafe_memory_access=true
  -Darrow.enable_unsafe_memory_access=true

This time, the original Arrow vector is about 30% slower:

  Benchmark                       Mode  Cnt  Score   Error  Units
  VectorAPIBenchmarks.testSafe    avgt    5  4.069 ± 0.004  us/op
  VectorAPIBenchmarks.testUnSafe  avgt    5  2.819 ± 0.005  us/op

This is a significant improvement: disabling the checks makes the safe vector about 2.84x faster (11.546 us/op down to 4.069 us/op) compared to when bounds checking is enabled. However, in our scenario we would still choose to bypass the Arrow APIs without hesitation, because these memory accesses are such frequent operations that even a 30% performance degradation would easily cost us our edge.

The results can be attributed to the following factors:

1. Although the checks have been disabled, we still need to read the flag and branch on it repeatedly in the Arrow APIs, which accumulates into a large performance overhead.

2. There is much more code on the call stack compared with the unsafe API. This leads to less efficient use of the i-cache, even if the JIT avoids the cost of the stack frames by inlining most of the method code.

Another, possibly separate, problem is that the flag BoundsChecking#BOUNDS_CHECKING_ENABLED is final and initialized in a static block. That means the only reliable way to override it is to set the system properties on the JVM command line. However, in some scenarios we do not have access to the command line (e.g. running Flink on YARN). I think this deserves a separate issue.
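To illustrate why the flag is hard to override from application code, here is a minimal, self-contained sketch of the pattern (the property name follows Arrow's, but the class body is a simplified stand-in, not the actual Arrow source):

  // Simplified stand-in for BoundsChecking: a final flag assigned in a
  // static initializer, so its value is fixed at class-load time.
  public class BoundsCheckingSketch {

    public static final boolean BOUNDS_CHECKING_ENABLED;

    static {
      String prop = System.getProperty("arrow.enable_unsafe_memory_access");
      // Checking is disabled only if the property is "true" at the
      // moment this class is initialized.
      BOUNDS_CHECKING_ENABLED = !"true".equals(prop);
    }

    public static void main(String[] args) {
      // By the time main() runs, the static block above has already
      // executed, so setting the property here has no effect:
      System.setProperty("arrow.enable_unsafe_memory_access", "true");
      System.out.println(BOUNDS_CHECKING_ENABLED); // prints true: checks still on
    }
  }

Setting the property programmatically only works if it happens before the first Arrow call loads the class, which is hard to guarantee when running inside a framework that controls JVM startup.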
Best,
Liya Fan

On Mon, May 6, 2019 at 1:23 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> > Maybe I need to take a closer look at how the other SQL engines are
> > using Arrow, to see if they are also bypassing Arrow APIs.
> > I agree that a random user should be able to protect themselves, and
> > this is the utmost priority.
> >
> > According to my experience in Flink, JIT cannot optimize away the
> > checks, and removing the checks addresses the issue.
> > I want to illustrate this from two points:
> >
> > 1. Theoretical viewpoint: JIT makes optimizations without changing the
> > semantics of the code, so it can never remove the checks without
> > changing the code's semantics. To put it simply, if the JIT has
> > witnessed the engine successfully process 1,000,000 records, how can
> > it be sure that the 1,000,001st record will also be successful?
> >
> > 2. Practical viewpoint: we have evaluated our SQL engine on the TPC-H
> > 1TB data set. This is a really large number of records, so the JIT
> > must have done all it could to improve the code. According to the
> > performance results, however, it could not eliminate the impact caused
> > by the checks.
> >
>
> I don't think you're following my point. There are two different points it
> seems like you want to discuss. Let's evaluate each separately:
>
> 1) Bounds checking for safety
> 2) Supposed inefficiency of the call hierarchy
>
> For #1, we provide a system-level property that can disable these. The JVM
> should successfully optimize away this operation if that flag is set.
> Please look at the JIT output to confirm whether this is true.
>
> For #2: we designed things to collapse, so the call hierarchy shouldn't be
> a problem. Please look at the JIT output to confirm.
>
> Please come with data around #1 and #2 to make an argument for a set of
> changes.
>
> thanks