Hi,

Thanks for your proposal!

In <camexywdtj_k9_xgycdcprbpufdj5xbsmb4fnmykobeg+6oa...@mail.gmail.com>
  "[DISCUSS][Erlang] Erlang Apache Arrow Implementation" on Fri, 8 Aug 2025 
21:41:54 +0530,
  Benjamin Philip <benjamin.philip...@gmail.com> wrote:

> I am working on an Erlang implementation for Apache Arrow, and I am
> interested in submitting it to the Apache Foundation as an official
> implementation for Erlang and Elixir, once it is ready.

If we develop it outside the Apache Software Foundation
(https://github.com/apache/), we need to donate it to the
Apache Software Foundation. See also
https://incubator.apache.org/ip-clearance/ and David's
reply.

In the donation process, all copyright holders must
sign a contributor license agreement. See also:
https://www.apache.org/licenses/contributor-agreements.html

If we create a new repository such as
https://github.com/apache/arrow-erlang and develop in it
from scratch, we don't need the donation process.

> Initial work[4] was started 2 years ago for compliance with some new
> OpenTelemetry specifications. However, my focus so far has only been
> (de)serialization and not operating on/manipulating Arrow Arrays since that
> was the only requirement in OpenTelemetry.
> 
> The trouble with Erlang, is that natively producing and decoding binaries
> in pure Erlang is more effective than through a C FFI. This has also been
> the case with plaintext formats like JSON and XML, and with parsing markup
> like HTML and Markdown. This has meant that we've had to write an Erlang
> Arrow implementation from the ground up. The lack of an Erlang flatbuffer
> implementation (for IPC), SIMD support in the Erlang Virtual Machine (for
> efficient operations) and mutability (for zero-copy access; all values in
> Erlang are immutable) make a complete Arrow implementation in Erlang
> especially challenging.

Could you upstream the FlatBuffers part to
https://github.com/google/flatbuffers instead of having us
maintain it? See also David's reply.

> An alternative could be to handle serializations in Erlang and operations
> with the C bindings. We could also start with a minimal implementation with
> bindings to nanoarrow and deprecate that in favour of the Erlang one later.

It seems that there is Rust code in your repository:
https://github.com/Benjamin-Philip/serde_arrow/tree/main/native/arrow_format_nif

How about starting a new implementation as arrow-rs
bindings? nanoarrow provides only
serialization/deserialization features, but arrow-rs provides
more features, such as computation.

> Upstreaming a fully compliant Erlang implementation could potentially be a
> multi-year project. This might also include writing an Erlang flatbuffers
> implementation. This will also be an additional implementation for the
> Arrow team to maintain, though I would be happy to aid in developing and
> maintaining it. What are the steps to get this going?

See my comments above.

> How are implementations out of the mono repo tested? Is there any guide for
> setting up integration testing and benchmarking in third-party
> implementations? So far I've had to roll my own minimal tooling for what
> archery supports, and I would prefer if I could integrate with
> archery instead.

We need to add a tester to Archery. For example,
https://github.com/apache/arrow/blob/main/dev/archery/archery/integration/tester_go.py
is a tester for Go.

FYI: There is a PR that adds a generic tester for
implementations not in apache/arrow:
https://github.com/apache/arrow/pull/46530

> Additionally, the initial work for this project was sponsored by the Erlang
> Ecosystem Foundation[5]. Would this be an issue when transferring
> stewardship to the ASF?

If the Erlang Ecosystem Foundation is also a copyright
holder and we choose the donation process, the Erlang
Ecosystem Foundation also needs to sign a contributor license
agreement.


Thanks,
-- 
kou
