pitrou commented on pull request #10995:
URL: https://github.com/apache/arrow/pull/10995#issuecomment-908325984


   Some high-level comments:
   1) since C has no namespacing, all names should be prefixed with "Arrow" (for 
regular names) or "ARROW_" (for preprocessor macros)
   2) the proposed API takes the C data interface as inspiration. 
However, unless the intent is to allow different producers to provide the 
dataset API, this could be a more classical (idiomatic) C API.
   3) the dataset can be represented using the (experimental) C stream 
interface: https://arrow.apache.org/docs/format/CStreamInterface.html
   
   In the end, the C API might look like this:
   ```c
   
   #include "arrow/c/abi.h"
   
   struct ArrowDatasetFactory;
   
   enum ArrowDatasetFormat {
       ARROW_DATASET_PARQUET, ARROW_DATASET_CSV, ARROW_DATASET_IPC
   };
   
   int ArrowDatasetFactoryFromUri(const char* uri,
                                  enum ArrowDatasetFormat format,
                                  struct ArrowDatasetFactory** out);
   
   int ArrowDatasetFactoryInspect(struct ArrowDatasetFactory* factory,
                                  int num_fragments_to_inspect,
                                  struct ArrowSchema* out);
   
   int ArrowDatasetFactoryCreateDataset(struct ArrowDatasetFactory* factory,
                                        struct ArrowSchema* optional_schema,
                                        struct ArrowArrayStream* out);
   
   const char* ArrowDatasetFactoryGetLastError(void);
   
   void ArrowDatasetFactoryDestroy(struct ArrowDatasetFactory*);
   
   ```
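   
   To make points 1) and 2) concrete, here is a minimal sketch of the 
opaque-handle style with an integer status code and a last-error string. It is 
illustrative only: the struct fields, the `SetError` helper, the errno-style 
return codes and the single static error buffer are assumptions made for this 
example, not part of the proposal or of Arrow itself.
   ```c
   #include <errno.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   
   enum ArrowDatasetFormat {
       ARROW_DATASET_PARQUET, ARROW_DATASET_CSV, ARROW_DATASET_IPC
   };
   
   /* Callers would only see a forward declaration of this struct.
      The fields are placeholders for the sketch (assumption). */
   struct ArrowDatasetFactory {
       char* uri;
       enum ArrowDatasetFormat format;
   };
   
   /* Assumption: one process-wide message buffer. A real implementation
      would probably want thread-local storage instead. */
   static char last_error[256];
   
   /* Hypothetical helper: record a message and return the status code. */
   static int SetError(int code, const char* msg) {
       snprintf(last_error, sizeof(last_error), "%s", msg);
       return code;
   }
   
   /* Returns 0 on success, an errno-style code on failure (assumption). */
   int ArrowDatasetFactoryFromUri(const char* uri,
                                  enum ArrowDatasetFormat format,
                                  struct ArrowDatasetFactory** out) {
       if (uri == NULL || out == NULL) {
           return SetError(EINVAL, "uri and out must be non-NULL");
       }
       struct ArrowDatasetFactory* factory = malloc(sizeof(*factory));
       if (factory == NULL) {
           return SetError(ENOMEM, "out of memory");
       }
       size_t len = strlen(uri) + 1;
       factory->uri = malloc(len);
       if (factory->uri == NULL) {
           free(factory);
           return SetError(ENOMEM, "out of memory");
       }
       memcpy(factory->uri, uri, len);
       factory->format = format;
       *out = factory;
       last_error[0] = '\0';  /* clear any stale message on success */
       return 0;
   }
   
   const char* ArrowDatasetFactoryGetLastError(void) {
       return last_error;
   }
   
   /* Safe to call with NULL, like free(). */
   void ArrowDatasetFactoryDestroy(struct ArrowDatasetFactory* factory) {
       if (factory == NULL) return;
       free(factory->uri);
       free(factory);
   }
   ```
   
   A caller would check the returned status, read 
`ArrowDatasetFactoryGetLastError()` only when it is non-zero, and pair every 
successful `ArrowDatasetFactoryFromUri` with one `ArrowDatasetFactoryDestroy`.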
   
   cc @bkietz for advice

