[ https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687300#comment-15687300 ]
Jacques Nadeau commented on DRILL-4455:
---------------------------------------
As someone who worked a lot on the memory layer and the accounting code, I'm not
sure how one would split the memory code from the vector code without introducing
a level of indirection that would hurt performance. The problem is transferring
the accounting information that lives within the memory buffers while
maintaining a single canonical memory representation and still enforcing
allocation limits. For reference, please review the information at [1] to
understand how the pieces work together.
I see two challenges at this point:
- This was originally proposed in November 2015. Note the slides attached in
[2], specifically the last one, where all three approaches had the vectors and
memory management moving together into the project (because of how tightly they
are coupled). Hearing no disagreement, going through the massive amount of work
this patch took to build, and then hitting a -1 six months later takes a lot of
wind out of one's sails.
- The larger problem is that I'm not sure who will have the interest to attempt
this patch again. We're now ~6 months later, with two trees that have each
moved in their own direction. Rebasing is probably very difficult (or
impossible). My sense is that Arrow will continue to create value and that at
some point the Drill community will reach a consensus that this work is worth
doing. In the meantime, I'm not sure anyone's heart is in it.
So while it may ultimately make sense to come up with a better approach to
modularity in the Arrow library to address the first challenge, I'd like to see
some demand for that from community members who want to use Arrow (possibly in
the form of proposed patches or approaches).
PS: An interesting question would be: how much development has happened in the
"disputed module" in Drill since this patch (or since my major rework of it
~12 months ago)?
[1] https://github.com/apache/arrow/tree/master/java/memory/src/main/java/org/apache/arrow/memory
[2] http://markmail.org/thread/74ns3peuwbaolcod
> Depend on Apache Arrow for Vector and Memory
> --------------------------------------------
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Steven Phillips
> Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split out and contributed to the
> Apache Arrow repository. To help this project advance, Drill should depend on
> the Arrow project instead of its internal value vector code.
> This change will require recompiling any external code, such as UDFs and
> StoragePlugins. The changes will mainly just involve renaming the classes to
> the org.apache.arrow namespace.