[ https://issues.apache.org/jira/browse/ARROW-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291373#comment-16291373 ]
ASF GitHub Bot commented on ARROW-1922:
---------------------------------------
icexelloss commented on a change in pull request #1419: ARROW-1922: Blog post on JAVA vector changes
URL: https://github.com/apache/arrow/pull/1419#discussion_r157032717
##########
File path: site/_posts/2017-12-13-java-vector-improvements.md
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title: "Improved JAVA Vector APIs"
+excerpt: "This post describes the recent improvements in JAVA Vector code"
+date: 2017-12-13 12:50:00
+author: Siddharth Teotia
+categories: [application]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+This post gives insight into the major improvements in the JAVA implementation
+of vectors.
+
+## Design Goals
+
+1. Improved Maintainability and Extensibility.
+2. Improved heap usage.
+3. No performance overhead on hot code paths.
+
+## Background
+
+**Improved Maintainability and Extensibility**
+
+We use templates in several places for compile time JAVA code generation for
+different vector classes, readers, writers etc. Templates are helpful as the
+developers don't have to write a lot of duplicate code.
+
+However, we realized that over a period of time some specific JAVA
+templates became extremely complex with giant if-else blocks, poor code indentation
+and documentation. All this impacted the ability to easily extend these templates
+for adding new functionality or improving the existing infrastructure.
+
+So we evaluated the usage of templates for compile time code generation and
+decided not to use complex templates in some places by writing small amount of
+duplicate code which is elegant, well documented and extensible.
+
+**Improved Heap Usage**
+
+We did extensive memory analysis downstream in Dremio where Arrow is used
+heavily for in-memory query execution on columnar data. The general conclusion
+was that Arrow JAVA Vectors have non-negligible heap overhead and volume of
+objects was too high. There were places in code where we were creating objects
+unnecessarily and using structures that could be substituted with better
+alternatives.
+
+**No performance overhead on hot code paths**
+
+JAVA Vectors used delegation and abstraction heavily throughout the object
+hierarchy. The performance critical get/set methods of vectors went through
+a chain of function calls back and forth between different objects before
+doing meaningful work. We also evaluated the usage of branches in vector
+APIs and reimplemented some of them by avoiding branches completely.
+
+We took inspiration from how the JAVA memory code in ArrowBuf works. For
+all the performance critical methods, ArrowBuf bypasses all the netty object
+hierarchy, grabs the target virtual address and directly interacts with
+the memory.
+
+There were cases where branches could be avoided all together.
+
+In case of Nullable vectors, we were doing multiple checks to confirm if
+the value at a given position in the vector is NULL or not.
Review comment:
NULL -> null?
> Blog post on recent improvements/changes in JAVA Vectors
> --------------------------------------------------------
>
> Key: ARROW-1922
> URL: https://issues.apache.org/jira/browse/ARROW-1922
> Project: Apache Arrow
> Issue Type: Task
> Components: Java - Vectors
> Reporter: Siddharth Teotia
> Assignee: Siddharth Teotia
> Labels: pull-request-available
>