[ https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Kornfield updated ARROW-263:
----------------------------------
Assignee: (was: Micah Kornfield)
> Design an initial IPC mechanism for Arrow Vectors
> -------------------------------------------------
>
> Key: ARROW-263
> URL: https://issues.apache.org/jira/browse/ARROW-263
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Micah Kornfield
>
> Prior discussion on this topic [1].
> Use-cases:
> 1. User-defined function (UDF) execution: one process wants to execute a
> user-defined function written in another language (e.g. Java executing a
> function defined in Python; this involves creating Arrow Arrays in Java,
> sending them to Python, and receiving a new set of Arrow Arrays produced in
> Python back in the Java process).
> 2. If a storage system and a query engine are running on the same host, we
> might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu).
> Assumptions:
> 1. The IPC mechanism should be usable from the core set of supported
> languages (Java, Python, C) on POSIX and, ideally, Windows systems. Ideally,
> we would not need to add dependencies on additional libraries in any of
> these languages beyond those mentioned in this document.
> We want to leverage shared memory for Arrays to avoid doubling RAM
> requirements by duplicating the same Array in different memory locations.
> 2. Under some circumstances, shared memory might be more efficient than FIFOs
> or sockets (in other scenarios it won't be; see the thread below).
> 3. Security is not a concern for V1; we assume all running processes are
> “trusted”.
> Requirements:
> 1. Resource management:
> a. Both processes need a way of allocating memory for Arrow Arrays so
> that data can be passed from one process to another.
> b. There must be a mechanism to clean up unused Arrow Arrays to limit
> resource usage while avoiding race conditions when processing arrays.
> 2. Schema negotiation: before sending data, both processes need to agree on
> the schema each one will produce.
> Out-of-scope requirements:
> 1. IPC channel metadata discovery is out of scope of this document.
> Discovery can be provided by passing appropriate command line arguments,
> configuration files or other mechanisms like RPC (in which case RPC channel
> discovery is still an issue).
> [1]
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)