zanmato1984 commented on issue #41094: URL: https://github.com/apache/arrow/issues/41094#issuecomment-2144240210
> In the implementation, maybe we can first implement gather-scatter? Since Arrow doesn't guarantee a data type supports un-ordered write currently (which is different from Velox)

1) According to @felipecrv, we don't have an existing "scatter" function, so yes, we'll have to implement that first.
2) Once this "scatter" function is available, we can build a "naive" evaluation of special forms on top of it (together with "gather"). This is a centralized step: gather the input by the selection vector, pass the selected rows to the dumb kernel, and scatter the kernel's output back to the actual output. It requires NO kernel to be aware of the selection vector, which makes the incremental approach in 3) possible.
3) Gradually make kernels support a selection vector as an optional parameter, i.e., make them selection-vector-aware. Once all kernels are done, we no longer need 2).

I think both 2) and 3) could potentially benefit from leveraging special attributes of specific data types such as list/string-view, REE and dictionary, though I'm not exactly sure how. I'm now working on an overall framework; maybe things will become clearer when I get there. I could use some help/comments from you then :)
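To make the difference between 2) and 3) concrete, here is a minimal sketch under simplifying assumptions: columns are plain `std::vector<int64_t>` with no validity bitmap, and a selection vector is a list of ascending row indices. `Gather`, `Scatter`, `DumbKernel`, `EvalNaive` and `AwareKernel` are hypothetical names for illustration only, not Arrow compute APIs. The naive path wraps an unmodified kernel in gather/scatter; the aware kernel takes the selection vector as an optional parameter and writes selected rows in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical types for illustration only; these are not Arrow APIs.
// A selection vector lists the (ascending) row indices that must be evaluated.
using SelectionVector = std::vector<int32_t>;
using Column = std::vector<int64_t>;  // fixed-width values, no validity bitmap

// Gather: compact the selected rows of `input` into a dense column.
Column Gather(const Column& input, const SelectionVector& sel) {
  Column out;
  out.reserve(sel.size());
  for (int32_t idx : sel) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < input.size());
    out.push_back(input[idx]);
  }
  return out;
}

// Scatter: write dense `values` back to the positions named by `sel`.
// Rows of `output` that are not selected are left untouched.
void Scatter(const Column& values, const SelectionVector& sel, Column* output) {
  assert(values.size() == sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i) {
    assert(static_cast<std::size_t>(sel[i]) < output->size());
    (*output)[sel[i]] = values[i];
  }
}

// A "dumb" kernel: it knows nothing about selection vectors and
// simply processes every row it is given (here: x * 2 + 1).
Column DumbKernel(const Column& input) {
  Column out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) out[i] = input[i] * 2 + 1;
  return out;
}

// Step 2: centralized gather -> dumb kernel -> scatter.
// No kernel needs to know about the selection vector.
void EvalNaive(const std::function<Column(const Column&)>& kernel,
               const Column& input, const SelectionVector& sel,
               Column* output) {
  Scatter(kernel(Gather(input, sel)), sel, output);
}

// Step 3: a selection-vector-aware kernel. The selection vector is optional:
// nullptr means "all rows". Selected rows are computed and written in place,
// without the intermediate gathered/scattered buffers of step 2.
void AwareKernel(const Column& input, const SelectionVector* sel,
                 Column* output) {
  if (sel == nullptr) {
    for (std::size_t i = 0; i < input.size(); ++i) {
      (*output)[i] = input[i] * 2 + 1;
    }
    return;
  }
  for (int32_t idx : *sel) (*output)[idx] = input[idx] * 2 + 1;
}
```

For example, with input `{10, 20, 30, 40, 50}`, selection `{1, 3, 4}` and a zero-initialized output, both paths produce `{0, 41, 0, 81, 101}`. The naive path pays for two extra copies per kernel call but leaves kernels untouched, which is what allows migrating kernels to the aware form one at a time. Note that the sketch deliberately sticks to a fixed-width type, where writing rows out of order is trivial; the quoted concern about unordered writes bites for layouts such as variable-width strings, whose offsets are normally built in row order.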
