Can we name it miniarrow or nanoarrow? We don't want to convey the message that there is a parallel C API for Arrow.


On 15/06/2022 at 05:18, Dewey Dunnington wrote:
Hi all,

I opened a second PR [1] proposing a design for storing parsed information
obtained from a struct ArrowSchema (i.e., parsing the format string into
usable C structures). There are some unsolved problems that could use a
fresh perspective...all comments welcome!

[1] https://github.com/paleolimbot/arrow-c/pull/5
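
For readers unfamiliar with the idea of parsing the format string into usable C structures, here is a minimal sketch of what that could look like. It is not the design from the PR: the names SketchType, SketchSchemaView, sketch_parse_int32() and sketch_schema_view_init() are hypothetical, and only a handful of format strings from the Arrow C Data Interface ("i", "l", "g", "u", "+l", "+s", "w:N", "d:P,S") are handled.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical and only illustrate the idea. */
enum SketchType {
  SKETCH_TYPE_UNINITIALIZED = 0,
  SKETCH_TYPE_INT32,
  SKETCH_TYPE_INT64,
  SKETCH_TYPE_DOUBLE,
  SKETCH_TYPE_STRING,
  SKETCH_TYPE_FIXED_SIZE_BINARY,
  SKETCH_TYPE_DECIMAL128,
  SKETCH_TYPE_LIST,
  SKETCH_TYPE_STRUCT
};

struct SketchSchemaView {
  enum SketchType type;
  int32_t fixed_size;        /* byte width N in "w:N" */
  int32_t decimal_precision; /* P in "d:P,S" */
  int32_t decimal_scale;     /* S in "d:P,S" */
};

/* Parse a base-10 integer at *cursor and advance the cursor past it. */
static int sketch_parse_int32(const char** cursor, int32_t* out) {
  const char* start = *cursor;
  char* end = NULL;
  long value;
  /* Reject the leading whitespace and '+' that strtol() would accept. */
  if (*start != '-' && (*start < '0' || *start > '9')) return EINVAL;
  errno = 0;
  value = strtol(start, &end, 10);
  if (end == start || errno != 0) return EINVAL;
  if (value < INT32_MIN || value > INT32_MAX) return EINVAL;
  *out = (int32_t)value;
  *cursor = end;
  return 0;
}

/* Fill `view` from a format string; returns 0 on success, EINVAL otherwise. */
int sketch_schema_view_init(struct SketchSchemaView* view, const char* format) {
  const char* cursor;
  if (view == NULL || format == NULL) return EINVAL;
  memset(view, 0, sizeof(*view));

  /* Formats that carry no parameters. */
  if (strcmp(format, "i") == 0) { view->type = SKETCH_TYPE_INT32; return 0; }
  if (strcmp(format, "l") == 0) { view->type = SKETCH_TYPE_INT64; return 0; }
  if (strcmp(format, "g") == 0) { view->type = SKETCH_TYPE_DOUBLE; return 0; }
  if (strcmp(format, "u") == 0) { view->type = SKETCH_TYPE_STRING; return 0; }
  if (strcmp(format, "+l") == 0) { view->type = SKETCH_TYPE_LIST; return 0; }
  if (strcmp(format, "+s") == 0) { view->type = SKETCH_TYPE_STRUCT; return 0; }

  /* Fixed-size binary: "w:N". */
  if (strncmp(format, "w:", 2) == 0) {
    cursor = format + 2;
    if (sketch_parse_int32(&cursor, &view->fixed_size) != 0) return EINVAL;
    if (view->fixed_size < 0 || *cursor != '\0') return EINVAL;
    view->type = SKETCH_TYPE_FIXED_SIZE_BINARY;
    return 0;
  }

  /* Decimal128: "d:P,S" (the "d:P,S,bitwidth" form is not handled here). */
  if (strncmp(format, "d:", 2) == 0) {
    cursor = format + 2;
    if (sketch_parse_int32(&cursor, &view->decimal_precision) != 0) return EINVAL;
    if (*cursor++ != ',') return EINVAL;
    if (sketch_parse_int32(&cursor, &view->decimal_scale) != 0) return EINVAL;
    if (*cursor != '\0') return EINVAL;
    view->type = SKETCH_TYPE_DECIMAL128;
    return 0;
  }

  return EINVAL; /* every other format string is out of scope for this sketch */
}
```

A caller would pass schema->format from a struct ArrowSchema and then switch on view.type instead of re-parsing the string at every use.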

On Fri, Jun 10, 2022 at 12:27 PM Dewey Dunnington <de...@voltrondata.com>
wrote:

Hi all,

As promised, I converted the design document [1] into an initial PR [2].
Rather than draft the whole header, I started with README + implementations
+ testing for error handling and schema allocation (depending on feedback,
next week I will draft another reviewable chunk).

Also feel free to suggest another place to put this if one exists (the
choice to put it in its own repo was based on informal feedback that
perhaps that might be the best way to go).

[1]
https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
[2] https://github.com/paleolimbot/arrow-c/pull/1/files

On Fri, Jun 3, 2022 at 12:41 PM Dewey Dunnington <de...@voltrondata.com>
wrote:

Hi all,

Based on the points raised above and a few adventures implementing some
of this in related projects, I put together a brief design document
proposing a scope and structure to perhaps solidify a few of these
discussions:
https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing

Any and all should feel free to add, rewrite, or propose a new
structure...I wrote many of the pieces for argument's sake or because
that's how I'd implemented them before.

Next week I will phrase it as a skeleton header (like the one in the
excellent ADBC design discussions) depending on feedback to keep the
discussion going!

Cheers,

-dewey

On Fri, Jun 3, 2022 at 9:57 AM Hannes Mühleisen <han...@duckdblabs.com>
wrote:

Hello List,

we at DuckDB are happy users of the Arrow C Data Interface and use it to
feed SQL queries and also use it to provide query results in Arrow format
again. It is particularly appealing to us that the interface is merely a
(C) header file that we just ship with our source code [1]. Internally, our
implementation then constructs DuckDB internal vectors from the Arrow
format [2] or vice-versa [3].

As you can see from [2, 3] there is some complexity in getting the
conversion right, especially for more complex data types like nested types
(list, strings). A lightweight, dependency-free library to help construct
those would certainly be appreciated. What would also help a lot is
validation code: Arrow structures are very delicate and one wrong pointer
can lead to disaster (which is then blamed on us), so a way to verify the
structures in said lightweight library would be very helpful.

Best from Amsterdam, and Quack

Hannes

[1]
https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow.hpp
[2]
https://github.com/duckdb/duckdb/blob/master/src/function/table/arrow.cpp
[3]
https://github.com/duckdb/duckdb/blob/master/src/common/types/data_chunk.cpp
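
To make the validation request above concrete, here is a rough sketch of the kind of cheap structural check such a library might offer for an int32 array. The struct ArrowArray definition is copied from the Arrow C Data Interface; sketch_validate_int32_array() and sketch_noop_release() are hypothetical names, and the checks shown are an assumption about what "minimal" validation could mean, not an existing API. Nothing here inspects buffer contents or buffer sizes, which the C Data Interface does not expose.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Definition from the Arrow C Data Interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical release callback for arrays that only borrow static memory:
   there is nothing to free, so it just marks the array as released. */
void sketch_noop_release(struct ArrowArray* array) {
  array->release = NULL;
}

/* Hypothetical helper: cheap structural checks for an int32 array.
   Returns 0 if the structure looks sane, EINVAL otherwise. */
int sketch_validate_int32_array(const struct ArrowArray* array) {
  if (array == NULL) return EINVAL;

  /* A NULL release callback means the array was already released. */
  if (array->release == NULL) return EINVAL;

  if (array->length < 0 || array->offset < 0) return EINVAL;

  /* null_count may be -1 (not yet computed) but never exceeds length. */
  if (array->null_count < -1 || array->null_count > array->length) return EINVAL;

  /* Primitive layout: a validity bitmap and a data buffer, no children. */
  if (array->n_buffers != 2 || array->buffers == NULL) return EINVAL;
  if (array->n_children != 0) return EINVAL;

  /* A non-empty array needs a data buffer to read from. */
  if (array->length > 0 && array->buffers[1] == NULL) return EINVAL;

  /* Nulls can only be represented when a validity bitmap is present. */
  if (array->null_count > 0 && array->buffers[0] == NULL) return EINVAL;

  return 0;
}
```

A consumer could run a check like this right after receiving an array and before dereferencing any pointer, so that a malformed structure produces an error code rather than a crash further down the line.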


On Fri, Jun 03, 2022 at 15:34:42, Jonathan Keane <jke...@gmail.com>
wrote:

cc Hannes Mühleisen from DuckDB Labs

-Jon


On Tue, May 31, 2022 at 5:03 PM Wes McKinney <wesmck...@gmail.com>
wrote:

I'm also supportive of having a small vendorable C/C++ "Arrow
middleware" that provides:

* Schemas and types
* Columnar data structures and minimal APIs to build them and iterate
  over them
* C data interface
* Minimal validation (at the level of Validate but not ValidateFull)

I don't think it's going to be practical to try to refactor parts of
the existing Arrow C++ core to be vendorable since there are many
features / requirements (e.g. an extensible buffer and device API)
that these C++ classes include that aren't needed in this
limited-feature middleware library.

This also relates to the "Improving Arrow's database support" project
that David Li raised some time ago [1]. If we want to encourage
database driver libraries to add new APIs that emit the Arrow C
interface, we need to make it easier to generate the C interface
without requiring a new library dependency.

[1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w

On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com>
wrote:

Thanks for working on this. I've heard people asking about something
like this from a number of different fronts on top of the obvious use
case in geoarrow | other geospatial libraries. I think a minimal piece
of Arrow that other packages could depend on without needing to bring
in all of arrow would be super valuable in building the bridges we
want across other systems.

Do you have any (design) documentation that describes the scope of
what you're thinking? I know there have been others floating around
[1] [2] that were in a similar spirit.

A few more questions I hope will spark more conversation: How do the
header files you linked in [3] overlap with these other efforts? Are
those headers something we could|should "just" PR into apache/arrow
and write up how to use them? If not, what work would it take to get
them to that point? (The answer could of course be to design something
else entirely and PR that!)

[1] https://github.com/paleolimbot/narrow
[2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
[3]
https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp

-Jon


On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com>
wrote:

I'm writing to gauge interest in a set of helpers in C and/or C++ for
reading/exporting Arrow C Data interface structures. My use-case is
building Arrow geospatial support in R [1], and while the set of helpers
I've been using [2] has served the purpose of me writing about the
opportunities for Arrow + geospatial [3], I would like to rewrite the
prototype based on something developed by/with the Arrow community.

Does a set of C/C++ helpers for Arrow C Data interface structures already
exist? *Should* it exist?

If it doesn't, what should the name/scope of that library be? The names
'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all surfaced in my
limited discussion of this so far. For the purpose of starting the
discussion, I'll posit that the library should include helpers to
allocate/destroy C Data interface structures, a schema metadata
encoder/decoder, validation of a schema/array pair, and something like the
ArrayBuilder C++ class.

[1]
https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
[2]
https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
[3]
https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
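
As an illustration of the "allocate/destroy" helpers posited above, the following is a minimal sketch for a childless schema, assuming the helper owns copies of the format and name strings. The struct ArrowSchema definition and the value of ARROW_FLAG_NULLABLE come from the Arrow C Data Interface; sketch_strdup(), sketch_schema_release() and sketch_schema_init() are hypothetical names, not functions from any existing library.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARROW_FLAG_NULLABLE 2

/* Definition from the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Portable stand-in for strdup(); returns NULL if allocation fails. */
static char* sketch_strdup(const char* value) {
  size_t size = strlen(value) + 1;
  char* out = malloc(size);
  if (out != NULL) memcpy(out, value, size);
  return out;
}

/* Release callback: frees the strings owned by the schema and marks it
   released by setting release to NULL, as the specification requires. */
static void sketch_schema_release(struct ArrowSchema* schema) {
  if (schema == NULL || schema->release == NULL) return;
  free((void*)schema->format);
  free((void*)schema->name);
  schema->release = NULL;
}

/* Hypothetical helper: initialize a childless, nullable schema.
   `name` may be NULL. Returns 0 on success, EINVAL or ENOMEM on failure. */
int sketch_schema_init(struct ArrowSchema* schema, const char* format,
                       const char* name) {
  if (schema == NULL || format == NULL) return EINVAL;
  memset(schema, 0, sizeof(*schema));

  schema->format = sketch_strdup(format);
  schema->name = (name != NULL) ? sketch_strdup(name) : NULL;
  if (schema->format == NULL || (name != NULL && schema->name == NULL)) {
    /* Undo any partial allocation and leave the schema in a released state. */
    free((void*)schema->format);
    free((void*)schema->name);
    memset(schema, 0, sizeof(*schema));
    return ENOMEM;
  }

  schema->flags = ARROW_FLAG_NULLABLE;
  schema->release = &sketch_schema_release;
  return 0;
}
```

The consumer, not the producer, calls schema->release(schema) when it is done, which is why the callback has to know how to free everything the helper allocated.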




