iodone commented on issue #5226:
URL: https://github.com/apache/gravitino/issues/5226#issuecomment-2443131812

   > It is difficult to define the relationship between fileset, dataset, and model. If we introduce a new dataset catalog and model catalog, it may be a bit confusing for users. In addition, datasets or models may only be applicable to Python APIs in machine learning scenarios.
   > 
   > I am more inclined to provide a dataset and model API based on the fileset. We can record more metadata in the fileset, such as its schema, and the dataset API can then make use of that additional metadata.
   > 
   > ```python
   > # using gvfs api to read the fileset
   > gvfs.open('gvfs://xxx')
   > ```
   > 
   > ```python
   > # proposed dataset API for reading the fileset
   > dataset = datasets.IterableDataset("catalog.schema.fileset", version='1')
   > ```
   
   Looking at the Python API of the `Lance` dataset, should we consider implementing a Lance catalog based on fileset? A fileset is an open definition without any schema constraints. That would only change if we built our own fileset data format that supports unstructured data (e.g., images or TFRecord).
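To make the trade-off concrete, here is a minimal sketch in plain Python of a dataset API layered on a fileset that records its schema as metadata. Everything in it is hypothetical: `FilesetMeta`, `FilesetDataset`, the `dataset.schema` property key, and the JSON-lines file layout are assumptions for illustration, not Gravitino's or Lance's API. The storage location is a local directory rather than a `gvfs://` path. It shows the point above: without a recorded schema the fileset is just a collection of files, and the dataset layer has nothing to validate against.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List

# Hypothetical property key under which a fileset could record its schema.
SCHEMA_KEY = "dataset.schema"
# Column types supported in this sketch (JSON-compatible Python types).
TYPES = {"int": int, "float": float, "str": str}


@dataclass
class FilesetMeta:
    """Hypothetical stand-in for fileset metadata: name, location, free-form properties."""
    name: str
    storage_location: str  # a local directory here, not a real gvfs:// path
    properties: Dict[str, str] = field(default_factory=dict)


class FilesetDataset:
    """Hypothetical dataset API: iterates JSON-lines records stored under a fileset
    and checks each record against the schema recorded in the fileset's properties."""

    def __init__(self, meta: FilesetMeta) -> None:
        if SCHEMA_KEY not in meta.properties:
            # An open fileset with no recorded schema is just a set of files.
            raise ValueError(f"fileset {meta.name!r} has no recorded schema")
        self.schema: Dict[str, str] = json.loads(meta.properties[SCHEMA_KEY])
        self.root = meta.storage_location

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        for fname in sorted(os.listdir(self.root)):
            if not fname.endswith(".jsonl"):
                continue  # ignore non-tabular files (e.g. images) in this sketch
            with open(os.path.join(self.root, fname), encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue
                    record = json.loads(line)
                    # Validate every schema column against its declared type.
                    for col, type_name in self.schema.items():
                        if not isinstance(record.get(col), TYPES[type_name]):
                            raise TypeError(f"{fname}: column {col!r} is not {type_name}")
                    yield record


def demo() -> List[Dict[str, Any]]:
    """Writes a tiny fileset to a temp directory and reads it back through the dataset API."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "part-0.jsonl"), "w", encoding="utf-8") as f:
            f.write('{"id": 1, "label": "cat"}\n{"id": 2, "label": "dog"}\n')
        schema = json.dumps({"id": "int", "label": "str"})
        meta = FilesetMeta("catalog.schema.fileset", root, {SCHEMA_KEY: schema})
        return list(FilesetDataset(meta))


if __name__ == "__main__":
    print(demo())
```

In this sketch the schema lives in the fileset's metadata rather than in the files themselves, so plain GVFS-style file access would be unaffected while the dataset layer gains validation. Supporting images or TFRecord would still need a format decision, which is the open question raised here.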


-- 
This is an automated message from the Apache Git Service.