[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687300#comment-15687300
 ] 

Jacques Nadeau commented on DRILL-4455:
---------------------------------------

As someone who worked a lot on the memory layer and its accounting, I'm not 
sure how one would split it without introducing a level of indirection that 
would hurt performance. The problem is that the accounting lives inside the 
memory buffers themselves, and that accounting has to be transferable while 
still maintaining a single canonical memory representation and enforcing 
limits. For reference, please review the code at [1] to understand how the 
pieces work together.
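
To make the coupling concrete, here is a minimal sketch of the idea. It is not 
the Arrow or Drill code: the names AccountingSketch, Accountant, Buffer and 
transferTo are hypothetical, and refusing a transfer that would exceed the 
target's limit is an assumption of the sketch. The point is only that the 
buffer holds a direct reference to the accountant that owns it, so moving the 
accounting needs no copy and no lookup through an intermediate layer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: hypothetical names, not the Arrow or Drill API.
public class AccountingSketch {

    /** Tracks bytes charged to one owner and enforces a limit. */
    static final class Accountant {
        private final long limit;
        private final AtomicLong used = new AtomicLong();

        Accountant(long limit) {
            this.limit = limit;
        }

        /** Charges bytes to this accountant; returns false if the limit would be exceeded. */
        boolean reserve(long bytes) {
            while (true) {
                long current = used.get();
                long next = current + bytes;
                if (next > limit) {
                    return false;
                }
                if (used.compareAndSet(current, next)) {
                    return true;
                }
            }
        }

        void release(long bytes) {
            used.addAndGet(-bytes);
        }

        long used() {
            return used.get();
        }
    }

    /** One backing array (the single memory representation) plus its current owner. */
    static final class Buffer {
        private final byte[] memory;
        private Accountant owner; // direct reference: no indirection on the accounting path

        private Buffer(byte[] memory, Accountant owner) {
            this.memory = memory;
            this.owner = owner;
        }

        static Buffer allocate(Accountant owner, int size) {
            if (!owner.reserve(size)) {
                throw new IllegalStateException("allocation exceeds limit");
            }
            return new Buffer(new byte[size], owner);
        }

        /**
         * Moves the accounting for this buffer to target without copying the data.
         * Assumption of this sketch: the transfer is refused if target is over its limit.
         */
        synchronized boolean transferTo(Accountant target) {
            if (!target.reserve(memory.length)) {
                return false;
            }
            owner.release(memory.length);
            owner = target;
            return true;
        }
    }

    public static void main(String[] args) {
        Accountant a = new Accountant(1024);
        Accountant b = new Accountant(512);
        Buffer buf = Buffer.allocate(a, 256);
        boolean moved = buf.transferTo(b);
        System.out.println(moved + " a=" + a.used() + " b=" + b.used());
    }
}
```

If the accountant lived in a separate module that the buffer could only reach 
through an interface or a lookup, every allocate, release and transfer would 
pay for that extra hop, which is the indirection I am worried about.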

I see two challenges at this point. 

- This was originally proposed in November 2015. Note the attached slides in 
[2], specifically the last one, where all three approaches had the vectors 
and memory management moving together in the project (due to the nature of the 
coupling). Not hearing any disagreement, then going through the massive 
amount of work that this patch took to build, and then hitting a -1 six months 
later takes a lot of wind out of one's sails. 

- The larger problem is I'm not sure who is going to have the interest to try 
to do this patch again. We're now ~6 months later with two trees that have 
moved in their own directions. Rebase is probably very difficult (or 
impossible). My sense is that Arrow will continue to create value and at some 
point, the Drill community will achieve a consensus that it is valuable to do 
this work. In the meantime, I'm not sure anyone's heart is in it right now. 

So while it may make sense to ultimately try to come up with a better approach 
to modularity in the Arrow library around the first point, I'd like to see some 
demand from the community that wants to use Arrow to do that (possibly in the 
form of patches or approaches proposed).

PS: An interesting question would be how much development has happened in the 
"disputed module" in Drill since this patch (or since my major reworking of it 
~12 months ago). 

[1] 
https://github.com/apache/arrow/tree/master/java/memory/src/main/java/org/apache/arrow/memory
[2] http://markmail.org/thread/74ns3peuwbaolcod



> Depend on Apache Arrow for Vector and Memory
> --------------------------------------------
>
>                 Key: DRILL-4455
>                 URL: https://issues.apache.org/jira/browse/DRILL-4455
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>             Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
