Re: [DISCUSS] raw pointers and FFI (C-level in-process array protocol)

Sutou Kouhei Sat, 05 Oct 2019 05:44:05 -0700

Hi,

I think that describing this as an FFI use case is
misleading. Normally, language bindings for this API are not
useful for processing Apache Arrow data, because such
bindings can only import/export Apache Arrow data. The
target language may not have a useful/fast API for
processing the imported data. For example, Julia may be
able to process imported Apache Arrow data with its built-in
features, but other scripting languages may not, even
LuaJIT.


In-process use requires multiple languages running in one
process. There are some approaches for this situation, and
some of them are actually used, but I think they are not
common.


I think that interacting with Apache Arrow ready libraries
is a useful use case for this API.

For example, it would be useful if SQLite used this API to
return a result set in Apache Arrow format. SQLite wouldn't
need an additional dependency to support exporting in Apache
Arrow format. SQLite would return the schema through its
existing API such as sqlite3_column_type() and return the
data with this API. SQLite bindings could then add an Apache
Arrow data export API easily because it's just a raw C API.
(FFI may be used to bind the Apache Arrow data export API.)

SQLite doesn't need to process Apache Arrow data. It just
exports Apache Arrow data. So this API is enough.
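
To make the "export only" idea concrete, here is a rough
sketch in C. Everything in it is an assumption made for
illustration: the struct ExampleInt64Array and the functions
example_export_int64() and example_sum() are hypothetical
names. They are not the struct layout from the proposal and
not an SQLite API. The only Arrow-specific detail relied on
is the validity bitmap convention (one bit per value, least
significant bit first, 1 means valid).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical export struct for one int64 column.
 * NOT the layout from the proposal under discussion. */
struct ExampleInt64Array {
  int64_t length;           /* number of values */
  int64_t null_count;       /* number of null values */
  const uint8_t *validity;  /* 1 bit per value, LSB first, 1 = valid */
  const int64_t *values;    /* `length` values; null slots hold 0 */
  void (*release)(struct ExampleInt64Array *); /* set by the producer */
};

/* Producer-side cleanup: frees the buffers and marks the struct released. */
static void example_release(struct ExampleInt64Array *array) {
  free((void *)array->validity);
  free((void *)array->values);
  array->validity = NULL;
  array->values = NULL;
  array->release = NULL;
}

/* Producer: copies `n` values into Arrow-style buffers.
 * is_null[i] != 0 marks value i as null. Returns 0 on success. */
int example_export_int64(const int64_t *src, const unsigned char *is_null,
                         int64_t n, struct ExampleInt64Array *out) {
  assert(src != NULL && is_null != NULL && out != NULL && n >= 0);
  size_t bitmap_bytes = (size_t)((n + 7) / 8);
  uint8_t *validity = calloc(bitmap_bytes ? bitmap_bytes : 1, 1);
  int64_t *values = malloc((n ? (size_t)n : 1) * sizeof *values);
  if (validity == NULL || values == NULL) {
    free(validity);
    free(values);
    return -1;
  }
  int64_t nulls = 0;
  for (int64_t i = 0; i < n; i++) {
    if (is_null[i]) {
      values[i] = 0;
      nulls++;
    } else {
      values[i] = src[i];
      validity[i / 8] |= (uint8_t)(1u << (i % 8));
    }
  }
  out->length = n;
  out->null_count = nulls;
  out->validity = validity;
  out->values = values;
  out->release = example_release;
  return 0;
}

/* Consumer: reads the buffers through the raw pointers, no copy. */
int64_t example_sum(const struct ExampleInt64Array *array) {
  int64_t sum = 0;
  for (int64_t i = 0; i < array->length; i++) {
    if ((array->validity[i / 8] >> (i % 8)) & 1) {
      sum += array->values[i];
    }
  }
  return sum;
}
```

The producer side needs only the C standard library, which
is the point: a library like SQLite could fill such a struct
without linking against an Apache Arrow implementation. The
consumer calls array->release(array) when it is done with
the data.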


This API will be useful for libraries that want to support
just Apache Arrow data import/export.


Thanks,
--
kou

In <cajpuwmd0cvxm2z3c61ab9jk-odzd1toeytlqwuzjrg3r+2d...@mail.gmail.com>
  "Re: [DISCUSS] raw pointers and FFI (C-level in-process array protocol)" on 
Thu, 3 Oct 2019 11:17:29 -0500,
  Wes McKinney <[email protected]> wrote:

> Related: Gandiva invented its own particular way of passing memory
> addresses through the JNI boundary rather than using Flatbuffers
> messages
> 
> https://github.com/apache/arrow/blob/master/cpp/src/gandiva/jni/jni_common.cc#L505
> 
> I'm all for language-agnostic in-memory data passing, but there is a
> use case for a C API to pass pointers at call sites while avoiding
> flattening (disassembly) and unflattening (reassembly) steps.
> 
> On Thu, Oct 3, 2019 at 4:34 AM Antoine Pitrou <[email protected]> wrote:
>>
>>
>> Hi Jacques,
>>
>> Le 03/10/2019 à 02:46, Jacques Nadeau a écrit :
>> >
>> > I think it is reasonable to argue that keeping any ABI (or header/struct
>> > pattern) as narrow as possible would allow us to minimize overlap with the
>> > existing in-memory specification. In Arrow's case, this could be as simple
>> > as a single memory pointer for schema (backed by flatbuffers) and a single
>> > memory location for data (that references the record batch header, which in
>> > turn provides pointers into the actual arrow data). [...]
>> >
>> > [...] (For example, in a JVM
>> > view of the world, working with a plain struct in java rather than a set of
>> > memory pointers against our existing IPC formats would be quite painful and
>> > we'd definitely need to create some glue code for users. I worry the same
>> > pattern would occur in many other languages.)
>>
>> I'm trying to understand the point you're making.  Here you say that it
>> was difficult for the JVM to deal with raw pointers.  But above you seem
>> to argue for a flatbuffers-based serialization containing raw pointers.
>>
>> Here's another way to frame the question: how do you propose to do
>> zero-copy between different languages if not by passing raw pointers to
>> the Arrow data?  And if passing raw pointers is acceptable, what is
>> wrong with the spec as proposed?
>>
>>
>> As for creating glue code: yes, of course, that would be needed in most
>> languages that want to provide this interface (including C++).  You do
>> need a C FFI for that.  I'm quite sure it would be possible to implement
>> this proposal in pure Python with ctypes / cffi, for example (as a toy
>> example, since PyArrow exists :-)).  When writing the spec, I also took
>> a look at the Go and Rust FFIs, and they seem good enough to interact
>> with it.  I tried to take a look at JNI, but of course I got lost in the
>> documentation :-)
>>
>> If you are worried that people start thinking that this proposal is part
>> of the Arrow specification, perhaps we can make it clear that exposing
>> this interface is optional for implementations.
>>
>> Regards
>>
>> Antoine.
