> Can we name it miniarrow or nanoarrow?

I'm happy to call it something else! Probably nanoarrow if I get to pick because of the parallel with nanopb/nanodbc.

On Thu, Jun 16, 2022 at 6:26 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Can we name it miniarrow or nanoarrow? We don't want to convey the
> message that there is a parallel C API for Arrow.
>
>
> On 15/06/2022 at 05:18, Dewey Dunnington wrote:
> > Hi all,
> >
> > I drafted a second PR [1] drafting a design for storing parsed information
> > obtained from a struct ArrowSchema (i.e., parsing the format string into
> > usable C structures). There are some unsolved problems that could use a
> > fresh perspective...all comments welcome!
> >
> > [1] https://github.com/paleolimbot/arrow-c/pull/5
> >
> > On Fri, Jun 10, 2022 at 12:27 PM Dewey Dunnington <de...@voltrondata.com> wrote:
> >
> >> Hi all,
> >>
> >> As promised, I converted the design document [1] into an initial PR [2].
> >> Rather than draft the whole header, I started with README + implementations
> >> + testing for error handling and schema allocation (depending on feedback,
> >> next week I will draft another reviewable chunk).
> >>
> >> Also feel free to suggest another place to put this if one exists (the
> >> choice to put it in its own repo was based on informal feedback that
> >> perhaps that might be the best way to go).
> >>
> >> [1] https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
> >> [2] https://github.com/paleolimbot/arrow-c/pull/1/files
> >>
> >> On Fri, Jun 3, 2022 at 12:41 PM Dewey Dunnington <de...@voltrondata.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Based on the points raised above and a few adventures implementing some
> >>> of this in related projects, I put together a brief design document
> >>> proposing a scope and structure to perhaps solidify a few of these
> >>> discussions:
> >>> https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
> >>> .
> >>>
> >>> Any and all should feel free to add, rewrite, or propose a new
> >>> structure...I wrote many of the pieces for argument's sake or because
> >>> that's how I'd implemented them before.
> >>>
> >>> Next week I will phrase it as a skeleton header (like the one in the
> >>> excellent ADBC design discussions) depending on feedback to keep the
> >>> discussion going!
> >>>
> >>> Cheers,
> >>>
> >>> -dewey
> >>>
> >>> On Fri, Jun 3, 2022 at 9:57 AM Hannes Mühleisen <han...@duckdblabs.com> wrote:
> >>>
> >>>> Hello List,
> >>>>
> >>>> we at DuckDB are happy users of the Arrow C Data Interface and use it to
> >>>> feed SQL queries and also use it to provide query results in Arrow format
> >>>> again. It is particularly appealing to us that the interface is merely a
> >>>> (C) header file that we just ship with our source code [1]. Internally, our
> >>>> implementation then constructs DuckDB internal vectors from the Arrow
> >>>> format [2] or vice-versa [3].
> >>>>
> >>>> As you can see from [2, 3] there is some complexity in getting the
> >>>> conversion right, especially for more complex data types like nested types
> >>>> (list, strings). A lightweight, dependency-free library to help
> >>>> constructing those would certainly be appreciated. What would also help a
> >>>> lot is validation code, Arrow structures are very delicate and one wrong
> >>>> pointer can lead to disaster (which is then blamed on us), so a way to
> >>>> verify the structures in said lightweight library would be very helpful.
> >>>>
> >>>> Best from Amsterdam, and Quack
> >>>>
> >>>> Hannes
> >>>>
> >>>> [1] https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow.hpp
> >>>> [2] https://github.com/duckdb/duckdb/blob/master/src/function/table/arrow.cpp
> >>>> [3] https://github.com/duckdb/duckdb/blob/master/src/common/types/data_chunk.cpp
> >>>>
> >>>>
> >>>> On Fri, Jun 03, 2022 at 15:34:42, Jonathan Keane <jke...@gmail.com> wrote:
> >>>>
> >>>>> cc Hannes Mühleisen from DuckDB Labs
> >>>>>
> >>>>> -Jon
> >>>>>
> >>>>>
> >>>>> On Tue, May 31, 2022 at 5:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >>>>>
> >>>>> I'm also supportive of having a small vendorable C/C++ "Arrow
> >>>>> middleware" that provides:
> >>>>>
> >>>>> * Schemas and types
> >>>>> * Columnar data structures and minimal APIs to build them and iterate over them
> >>>>> * C data interface
> >>>>> * Minimal validation (at the level of Validate but not ValidateFull)
> >>>>>
> >>>>> I don't think it's going to be practical to try to refactor parts of
> >>>>> the existing Arrow C++ core to be vendorable since there are many
> >>>>> features / requirements (e.g. an extensible buffer and device API)
> >>>>> that these C++ classes include that aren't needed in this
> >>>>> limited-feature middleware library.
> >>>>>
> >>>>> This also relates to the "Improving Arrow's database support" project
> >>>>> that David Li raised some time ago [1]. If we want to encourage
> >>>>> database driver libraries to add new APIs that emit the Arrow C
> >>>>> interface, we need to make it easier to generate the C interface
> >>>>> without requiring a new library dependency.
> >>>>>
> >>>>> [1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
> >>>>>
> >>>>> On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com> wrote:
> >>>>>>
> >>>>>> Thanks for working on this.
> >>>>>> I've heard people asking about something
> >>>>>> like this from a number of different fronts on top of the obvious use
> >>>>>> case in geoarrow | other geospatial libraries. I think a minimal piece
> >>>>>> of Arrow that other packages could depend on without needing to bring
> >>>>>> in all of arrow would be super valuable in building the bridges we
> >>>>>> want across other systems.
> >>>>>>
> >>>>>> Do you have any (design) documentation that describes the scope of
> >>>>>> what you're thinking? I know there have been others floating around
> >>>>>> [1] [2] that were in a similar spirit.
> >>>>>>
> >>>>>> A few more questions I hope will spark more conversation: How do the
> >>>>>> header files you linked in [3] overlap with these other efforts? Are
> >>>>>> those headers something we could|should "just" PR into apache/arrow
> >>>>>> and write up how to use them? If not what is the work to make them so
> >>>>>> that they could be (the answer of course could be design something
> >>>>>> else entirely and PR that!)?
> >>>>>>
> >>>>>> [1] https://github.com/paleolimbot/narrow
> >>>>>> [2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
> >>>>>> [3] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
> >>>>>>
> >>>>>> -Jon
> >>>>>>
> >>>>>>
> >>>>>> On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com> wrote:
> >>>>>>>
> >>>>>>> I'm writing to gauge interest in a set of helpers in C and/or C++ for
> >>>>>>> reading/exporting Arrow C Data interface structures. My use-case is
> >>>>>>> building Arrow geospatial support in R [1], and while the set of helpers
> >>>>>>> I've been using [2] has served the purpose of me writing about the
> >>>>>>> opportunities for Arrow + geospatial [3], I would like to rewrite the
> >>>>>>> prototype based on something developed by/with the Arrow community.
> >>>>>>>
> >>>>>>> Does a set of C/C++ helpers for Arrow C Data interface structures already
> >>>>>>> exist? *Should* it exist?
> >>>>>>>
> >>>>>>> If it doesn't, what should the name/scope of that library be? The names
> >>>>>>> 'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all surfaced in my
> >>>>>>> limited discussion of this so far. For the purpose of starting the
> >>>>>>> discussion, I'll posit that the library should include helpers to
> >>>>>>> allocate/destroy C Data interface structures, a schema metadata
> >>>>>>> encoder/decoder, validation of a schema/array pair, and something like the
> >>>>>>> ArrayBuilder C++ class.
> >>>>>>>
> >>>>>>> [1] https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
> >>>>>>> [2] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
> >>>>>>> [3] https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
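
For readers following the thread, two of the helpers it discusses (allocating/releasing a struct ArrowSchema, and parsing its format string into a usable C structure) might look roughly like the sketch below. The struct ArrowSchema definition is the one from the Arrow C Data Interface specification; everything prefixed with Sketch or SKETCH_ is a hypothetical name invented for this illustration and is not the API of arrow-c or any other library mentioned above. Only a handful of format strings are handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Definition as given in the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Everything below is hypothetical, for illustration only. */
#define SKETCH_OK 0
#define SKETCH_EINVAL 1
#define SKETCH_ENOMEM 2
#define SKETCH_ENOTSUP 3

enum SketchType {
  SKETCH_UNKNOWN, SKETCH_INT32, SKETCH_INT64, SKETCH_DOUBLE,
  SKETCH_STRING, SKETCH_FIXED_SIZE_BINARY, SKETCH_LIST, SKETCH_STRUCT
};

/* Parsed view of a schema's format string. */
struct SketchSchemaView {
  enum SketchType type;
  int32_t fixed_size; /* byte width; only set for "w:N" */
};

/* Release callback: free what we own, then mark the struct released. */
static void SketchSchemaRelease(struct ArrowSchema* schema) {
  free((void*)schema->format);
  schema->format = NULL;
  schema->release = NULL;
}

/* Initialize a childless schema that owns a copy of the format string. */
int SketchSchemaInit(struct ArrowSchema* schema, const char* format) {
  size_t len = strlen(format) + 1;
  char* copy = (char*)malloc(len);
  if (copy == NULL) return SKETCH_ENOMEM;
  memcpy(copy, format, len);
  memset(schema, 0, sizeof(*schema));
  schema->format = copy;
  schema->release = &SketchSchemaRelease;
  return SKETCH_OK;
}

/* Parse schema->format into a SketchSchemaView (small subset only). */
int SketchSchemaViewInit(struct SketchSchemaView* view,
                         const struct ArrowSchema* schema) {
  const char* f;
  assert(view != NULL);
  view->type = SKETCH_UNKNOWN;
  view->fixed_size = 0;
  /* A NULL release callback means the schema was already released. */
  if (schema == NULL || schema->release == NULL || schema->format == NULL) {
    return SKETCH_EINVAL;
  }
  f = schema->format;
  if (strcmp(f, "i") == 0) view->type = SKETCH_INT32;
  else if (strcmp(f, "l") == 0) view->type = SKETCH_INT64;
  else if (strcmp(f, "g") == 0) view->type = SKETCH_DOUBLE;
  else if (strcmp(f, "u") == 0) view->type = SKETCH_STRING;
  else if (strcmp(f, "+l") == 0) view->type = SKETCH_LIST;
  else if (strcmp(f, "+s") == 0) view->type = SKETCH_STRUCT;
  else if (strncmp(f, "w:", 2) == 0) {
    char* end;
    long n;
    if (f[2] < '0' || f[2] > '9') return SKETCH_EINVAL;
    errno = 0;
    n = strtol(f + 2, &end, 10);
    if (errno == ERANGE || *end != '\0' || n > INT32_MAX) return SKETCH_EINVAL;
    view->type = SKETCH_FIXED_SIZE_BINARY;
    view->fixed_size = (int32_t)n;
  } else {
    return SKETCH_ENOTSUP;
  }
  return SKETCH_OK;
}
```

A caller would initialize a schema with SketchSchemaInit(&schema, "w:16"), inspect it through SketchSchemaViewInit, and finish with schema.release(&schema). Setting release to NULL in the callback follows the specification's convention for marking a structure as released, which is also the first thing the parser checks before touching the format pointer.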