Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060

Before I reply to the provided comments, I first want to thank both Parth and Paul for taking the time to review this pull request.

@parthchandra Regarding the High Level Comments

- FS Access Manager

The attached document listed several possible improvements in terms of CPU, partitioning (changes to the query plan), and the IO layer. I didn't want to bundle all the changes into a single pull request; instead, I started with the pull request with the highest priority (in terms of performance improvement), which is the CPU-related enhancements.

- Drill Memory Package

I understand where you're coming from; I analyzed this task and came up with the solution I felt was the cleanest. Let me take you through my thinking process and the alternative solutions I considered:

a) Why did I use these low-level APIs?

The bulk Parquet reader uses the Drill buffer APIs to copy data chunks to the CPU cache, which means memory sanity checks are always performed. During code profiling I noticed the hotspot was executing too many instructions (byte-based access was used); I researched ways to improve this and found Hadoop and compression Java code that uses 64-bit access instructions when manipulating byte arrays (e.g., [Snappy](https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/snappy/SnappyRawCompressor.java)). This makes a huge difference.

b) Safety

As indicated previously, native data is always accessed through the Drill buffer APIs (that is, accessed data is sanitized). During bulk processing, the new utility performs additional checks which are triggered when assertions are enabled. This means the test suite and QA tests (which run with assertions on) will be able to catch any faulty memory access; note also that the checks are simpler, since they only pertain to byte-array boundary checks.
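To make points a) and b) concrete, here is a minimal sketch (not Drill's actual implementation; class and method names are illustrative) of copying a byte-array chunk eight bytes at a time via `sun.misc.Unsafe`, with bounds checks that compile away unless assertions are enabled (`java -ea`):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/**
 * Illustrative sketch of a 64-bit bulk copy over byte arrays.
 * Hypothetical code, not the actual Drill utility.
 */
public class BulkCopySketch {
  private static final Unsafe UNSAFE = loadUnsafe();
  private static final long BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);

  private static Unsafe loadUnsafe() {
    try {
      // sun.misc.Unsafe has no public constructor; fetch the singleton reflectively.
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  /** Copies {@code len} bytes from src to dst using 64-bit loads/stores where possible. */
  public static void copy(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
    // Boundary checks only run when assertions are enabled (test suite / QA runs).
    assert srcOff >= 0 && srcOff + len <= src.length : "source overflow";
    assert dstOff >= 0 && dstOff + len <= dst.length : "destination overflow";

    int i = 0;
    for (; i + 8 <= len; i += 8) {
      // One 64-bit move replaces eight byte-at-a-time moves.
      long word = UNSAFE.getLong(src, BYTE_ARRAY_OFFSET + srcOff + i);
      UNSAFE.putLong(dst, BYTE_ARRAY_OFFSET + dstOff + i, word);
    }
    for (; i < len; i++) {
      // Copy the 0-7 byte remainder one byte at a time.
      dst[dstOff + i] = src[srcOff + i];
    }
  }
}
```

The interesting part is the main loop: HotSpot lowers `getLong`/`putLong` to single 64-bit memory instructions, which is where the CPU win over byte-wise copies comes from.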
c) Abstraction

I agree that the Drill code base should follow a common code path when accessing native memory (to minimize the chances of introducing bugs), though I felt the Drill Memory package was the main abstraction, since it is centralized and thus easier to test.

d) Alternatives

I considered using Netty's io.netty.util.internal.PlatformDependent0 to access the Java intrinsics APIs, but this class has package scope, and the public wrapper class io.netty.util.internal.PlatformDependent doesn't expose this low-level functionality. The workaround would have been to create a MemoryUtils class under the io.netty.util.internal package. I didn't think this was a cleaner approach than using Java's intrinsic APIs directly (note also that Java 9 will expose such intrinsics under a public JDK package instead of sun.misc). Please let me know if you would prefer the io.netty.util.internal route instead.

- Changes to the Vector Code

I had a few discussions with Paul about the need to reconcile DRILL-5657; the agreement was to proceed with the current implementation and then apply refactoring (in future releases), as my changes are backward compatible (new public APIs added).

- Arrow Integration

I will create a task to analyze the impact of the Arrow integration on the Parquet reader.

- UserBitShared.proto

This class was updated to deal with the implicit columns optimization. I added two fields to the NullableVarChar class to capture the logical number of rows and the fact that this feature was active. This allowed me to use different Accessor and Mutator implementations.
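For illustration, the kind of proto change described above might look roughly like the following; the message name, field names, and tag numbers here are hypothetical and do not reflect the actual UserBitShared.proto edit:

```protobuf
// Hypothetical sketch only; the real field names and tag numbers differ.
message NullableVarChar {
  // ... existing fields ...

  // Logical number of rows represented by this vector.
  optional int32 logical_row_count = 100;

  // True when the implicit-columns optimization was active, letting the
  // reader pick the corresponding Accessor/Mutator implementation.
  optional bool implicit_column_optimization = 101;
}
```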
You and I had a discussion about this, and the agreement was:

a) The new optimization is configurable (the current default is off), though we might change the default at some point.

b) Detect C++ based clients and turn off this optimization (that is, the implicit columns enhancements); I haven't implemented this task yet, as I forgot the name of the property that you indicated (this should be quite fast to implement though).

- Testing

Yes, QA helped me run the test suite with the two new options enabled. The new changes didn't modify any of the encoding/compression logic, but I'll be happy to add any test(s) you deem necessary.

- Code Style

Thank you for the pointers; I'll stick to your feedback in future changes to ease the review process.
---