iodone commented on issue #5226:
URL: https://github.com/apache/gravitino/issues/5226#issuecomment-2443131812
> It is difficult to define the relationship between fileset, dataset, and
model. If we introduce a new dataset catalog and model catalog, it may be a bit
confusing for users. In addition, datasets or models may only be applicable to
Python APIs in machine learning scenarios.
>
> I am more inclined to provide a dataset and model API based on the
fileset. We can record more metadata information in fileset, such as schema.
And when we use the dataset API, we can use more metadata information.
>
> ```python
> # using gvfs api to read the fileset
> gvfs.open('gvfs://xxx')
> ```
>
> ```python
> # using dataset api to read the fileset
> dataset = datasets.IterableDataset("catalog.schema.fileset", version='1')
> ```
Referring to the Python API of the `Lance` dataset, should we consider
implementing a Lance catalog based on fileset? Fileset is an open definition
without any schema constraints, so a dataset API on top of it has no schema to
rely on unless we define a fileset data format ourselves to support
unstructured data (for example, images or TFRecord).
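
To make the schema concern concrete, here is a minimal sketch of a dataset-style reader that only works when the fileset records a schema. Everything in it is hypothetical: the `Fileset` class, the `schema` property key, and `iter_records` are illustration names, not Gravitino or Lance APIs, and the in-memory `rows` list stands in for the files a real reader would load.

```python
# Hypothetical sketch: none of these names exist in Gravitino or Lance.
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Fileset:
    """A fileset: a named storage location plus free-form properties."""
    name: str
    location: str
    properties: Dict[str, str] = field(default_factory=dict)


def declared_schema(fileset: Fileset) -> Optional[List[str]]:
    """Return the column names recorded in the properties, if any."""
    raw = fileset.properties.get("schema")  # assumed property key
    return raw.split(",") if raw else None


def iter_records(fileset: Fileset, rows: List[tuple]) -> Iterator[dict]:
    """Yield rows as dicts when a schema is recorded; otherwise refuse."""
    columns = declared_schema(fileset)
    if columns is None:
        # An open fileset carries no schema, so typed access is impossible.
        raise ValueError(f"fileset {fileset.name!r} has no recorded schema")
    for row in rows:
        yield dict(zip(columns, row))
```

With a `schema` property such as `"id,label"`, the reader can return named fields. Without it, the same fileset can still be opened as raw files, but the dataset-style call fails, which is the gap a dedicated data format would have to close.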