[ https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807667#comment-16807667 ]
Arina Ielchiieva commented on DRILL-7087:
-----------------------------------------
[~weijie] please start a conversation about this on the mailing list; let's see
what the community thinks about having an Arrow fork. Personally, I am against
having an Arrow fork.
> Integrate Arrow's Gandiva into Drill
> ------------------------------------
>
> Key: DRILL-7087
> URL: https://issues.apache.org/jira/browse/DRILL-7087
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Codegen, Execution - Relational Operators
> Reporter: weijie.tong
> Assignee: weijie.tong
> Priority: Major
>
> This is preliminary work to integrate Arrow into Drill by invoking its Gandiva
> feature. Comparing Arrow's and Drill's in-memory column representations,
> the internal null representation currently differs: Drill uses 1 byte while
> Arrow uses 1 bit to indicate whether a row is null. Also, all Arrow columns
> are nullable now. Apart from those basic differences, they have the same
> memory representation for the different data types.
> The integration strategy is to invoke the native methods of Arrow's
> JniWrapper directly, passing the ValueVector's memory address.
> I have done an implementation in our own Drill version by integrating Gandiva
> into Drill's project operator. The results show a performance gain of nearly
> 1x in expression computation.
> So if there are no objections, I will submit a related PR to contribute this
> feature. This issue also waits on Arrow's related issue [ARROW-4819].
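
The null-representation difference described above (1 byte per row in Drill,
1 bit per row in Arrow) can be illustrated with a minimal sketch. It is not
taken from the Drill or Arrow code base: the class and method names are
hypothetical, plain byte arrays stand in for the real buffers, and it assumes
that a non-zero byte means "value present" and that the bitmap is packed least
significant bit first with a set bit meaning "valid".

```java
// Hypothetical helper, for illustration only; not part of Drill or Arrow.
final class ValidityBitmap {

    private ValidityBitmap() {
    }

    /**
     * Packs byte-per-value flags (assumed: non-zero = value present) into a
     * bit-per-value bitmap, least significant bit first (assumed: 1 = valid).
     */
    public static byte[] packBytesToBits(byte[] isSet) {
        // One bit per value, rounded up to a whole number of bytes.
        byte[] bitmap = new byte[(isSet.length + 7) / 8];
        for (int i = 0; i < isSet.length; i++) {
            if (isSet[i] != 0) {
                // Byte index is i / 8, bit position inside that byte is i % 8.
                bitmap[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return bitmap;
    }

    /** Returns true if the value at the given index is marked valid (non-null). */
    public static boolean isValid(byte[] bitmap, int index) {
        return (bitmap[index >> 3] & (1 << (index & 7))) != 0;
    }
}
```

For example, calling ValidityBitmap.packBytesToBits on ten flags yields a
two-byte bitmap, and ValidityBitmap.isValid(bitmap, i) then answers the same
question the byte-per-value form did. A conversion of this kind costs one pass
over the null flags, which is the extra step implied when buffers with the two
layouts have to be exchanged.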
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)