pitrou commented on pull request #10995: URL: https://github.com/apache/arrow/pull/10995#issuecomment-908325984
Some high-level comments:

1) Since C has no namespacing, all names should be prefixed with "Arrow" (for regular names) or "ARROW_" (for preprocessor macros).
2) The proposed API takes the C data interface as an inspiration. However, unless the intent is to allow different producers to provide the dataset API, this could be a more classical (idiomatic) C API.
3) The dataset can be represented using the (experimental) C stream interface: https://arrow.apache.org/docs/format/CStreamInterface.html

In the end, the C API might look like this:

```c
#include "arrow/c/abi.h"

/* Opaque handle: only a forward declaration is exposed to callers. */
struct ArrowDatasetFactory;

enum ArrowDatasetFormat {
  ARROW_DATASET_PARQUET,
  ARROW_DATASET_CSV,
  ARROW_DATASET_IPC
};

int ArrowDatasetFactoryFromUri(const char* uri, enum ArrowDatasetFormat format,
                               struct ArrowDatasetFactory** out);
int ArrowDatasetFactoryInspect(struct ArrowDatasetFactory* factory,
                               int num_fragments_to_inspect,
                               struct ArrowSchema* out);
int ArrowDatasetFactoryCreateDataset(struct ArrowDatasetFactory* factory,
                                     struct ArrowSchema* optional_schema,
                                     struct ArrowArrayStream* out);
const char* ArrowDatasetFactoryGetLastError(void);
void ArrowDatasetFactoryDestroy(struct ArrowDatasetFactory*);
```

cc @bkietz for advice
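
To illustrate point 3, here is a rough sketch of how a consumer might drain the `struct ArrowArrayStream` that `ArrowDatasetFactoryCreateDataset` would fill in. The dataset functions above are only a proposal and do not exist yet, so a mock producer stands in for them here. `CountStreamRows`, `MakeMockStream`, `DemoCountRows` and the `Mock*` callbacks are hypothetical names made up for this example, and the mock batches carry no buffers, so they are only good for counting rows. The struct layouts are copied from the C data and C stream interface specifications (normally they come from `arrow/c/abi.h`).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layouts from the Arrow C data / C stream interface specifications. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* Consumer side: pull batches until the stream signals its end by
   returning 0 with an already-released array (release == NULL). */
static int CountStreamRows(struct ArrowArrayStream* stream, int64_t* out_rows) {
  assert(stream != NULL && stream->get_next != NULL);
  *out_rows = 0;
  for (;;) {
    struct ArrowArray batch;
    int rc = stream->get_next(stream, &batch);
    if (rc != 0) return rc;               /* errno-compatible error code */
    if (batch.release == NULL) return 0;  /* end of stream */
    *out_rows += batch.length;
    batch.release(&batch);                /* consumer owns each batch */
  }
}

/* Hypothetical mock producer: yields a fixed number of 3-row batches. */
static void MockArrayRelease(struct ArrowArray* a) { a->release = NULL; }

static int MockGetSchema(struct ArrowArrayStream* s, struct ArrowSchema* out) {
  (void)s; (void)out;
  return ENOSYS;  /* schema export is not needed for this sketch */
}

static int MockGetNext(struct ArrowArrayStream* s, struct ArrowArray* out) {
  int* remaining = (int*)s->private_data;
  memset(out, 0, sizeof(*out));  /* release == NULL means "no more batches" */
  if (*remaining > 0) {
    --*remaining;
    out->length = 3;
    out->release = MockArrayRelease;
  }
  return 0;
}

static const char* MockGetLastError(struct ArrowArrayStream* s) {
  (void)s;
  return NULL;
}

static void MockStreamRelease(struct ArrowArrayStream* s) {
  free(s->private_data);
  s->release = NULL;
}

static int MakeMockStream(int num_batches, struct ArrowArrayStream* out) {
  int* remaining = (int*)malloc(sizeof(int));
  if (remaining == NULL) return ENOMEM;
  *remaining = num_batches;
  out->get_schema = MockGetSchema;
  out->get_next = MockGetNext;
  out->get_last_error = MockGetLastError;
  out->release = MockStreamRelease;
  out->private_data = remaining;
  return 0;
}

/* Returns the total row count, or -1 on error. */
int64_t DemoCountRows(int num_batches) {
  struct ArrowArrayStream stream;
  int64_t rows = -1;
  if (MakeMockStream(num_batches, &stream) != 0) return -1;
  if (CountStreamRows(&stream, &rows) != 0) rows = -1;
  stream.release(&stream);
  return rows;
}
```

With a real implementation, `MakeMockStream` would presumably be replaced by `ArrowDatasetFactoryFromUri` followed by `ArrowDatasetFactoryCreateDataset`, and the consumer loop would stay the same. That is the main appeal of reusing the stream interface rather than inventing a dataset-specific iteration API.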
