coolderli commented on issue #5226:
URL: https://github.com/apache/gravitino/issues/5226#issuecomment-2441473770
It is difficult to define the relationship between fileset, dataset, and
model. If we introduce a new dataset catalog and model catalog, it may be a bit
confusing for users. In addition, datasets or models may only be applicable to
Python APIs in machine learning scenarios.
I am more inclined to provide a dataset and model API built on top of the fileset.
We can record more metadata in the fileset, such as its schema, and the dataset
API can then make use of that extra metadata when it reads the fileset.
```python
# using gvfs api to read the fileset
gvfs.open('gvfs://xxx')
```
```python
# using dataset api to read the fileset
# (proposed API shape only; this call does not exist yet)
dataset = datasets.IterableDataset("catalog.schema.fileset", version='1')
```
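
To make the idea a bit more concrete, here is a rough sketch of how a dataset API could sit on top of fileset metadata. Everything in it is an assumption for illustration: `FilesetMeta`, `parse_schema`, `FilesetIterableDataset`, the `schema` property key, and the `col:type` format are hypothetical names, not part of the Gravitino client. It reads local CSV files instead of going through gvfs, and versioning is left out.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator


# Hypothetical stand-in for fileset metadata; NOT the Gravitino client API.
@dataclass
class FilesetMeta:
    name: str                      # e.g. "catalog.schema.fileset"
    location: str                  # storage location of the files
    properties: Dict[str, str] = field(default_factory=dict)


# Assumed mapping from type names in the schema string to Python casts.
_CASTS = {"int": int, "float": float, "str": str}


def parse_schema(spec: str) -> Dict[str, type]:
    """Parse a 'col:type,col:type' string kept in the fileset properties."""
    schema = {}
    for item in spec.split(","):
        column, type_name = item.strip().split(":")
        schema[column] = _CASTS[type_name]
    return schema


class FilesetIterableDataset:
    """Iterates typed rows from the CSV files under a fileset location."""

    def __init__(self, meta: FilesetMeta):
        self.meta = meta
        # The schema comes from the fileset metadata, not from the files.
        self.schema = parse_schema(meta.properties["schema"])

    def __iter__(self) -> Iterator[dict]:
        for path in sorted(Path(self.meta.location).glob("*.csv")):
            with path.open(newline="") as f:
                for row in csv.DictReader(f):
                    yield {col: cast(row[col]) for col, cast in self.schema.items()}


def demo() -> list:
    """Build a throwaway fileset directory and read it back as typed rows."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "part-0.csv").write_text("id,score\n1,0.5\n2,0.75\n")
        meta = FilesetMeta(
            "catalog.schema.fileset", tmp, {"schema": "id:int,score:float"}
        )
        return list(FilesetIterableDataset(meta))
```

The point of the sketch is that the fileset stays the single source of truth: the gvfs path still gives raw file access, while the dataset layer only adds interpretation driven by the recorded schema.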