Moving this conversation to dev@, which is a more appropriate place to discuss it.

On Tuesday, November 1, 2022, Chang She <ch...@eto.ai> wrote:

> Hi there,
>
> The pyarrow dataset API is marked experimental so I'm curious if y'all
> have made any decisions on it for upcoming releases. Specifically, any
> thoughts on making the Scanner and things like FileSystemDataset part of
> the "public API" (i.e., putting declarations in the _dataset.pxd)? It would
> make it a lot easier for new data formats to be built on top of the Arrow
> platform. For example, Lance supports efficient partial reads from S3 for
> limit/offset (via additional ScanOptions), but currently it's difficult to
> expose the scanner to the rest of Arrow. Instead we subclass Dataset and
> return a custom scanner we created. And our Dataset subclass *should* be a
> FileSystemDataset subclass, but FileSystemDataset is not "public API" etc.
> Happy to discuss additional details. For reference:
> github.com/eto-ai/lance
>
> Thanks!
>
> Chang
>
