xloya commented on issue #5226:
URL: https://github.com/apache/gravitino/issues/5226#issuecomment-2434155379

> @xloya @FANNG1 What about `G-sequence`, `G-tfrecord`, `G-parquet`? Using
>
> ```
> select * from `G-sequence`.`gvfs://`
> ```
>
> to read the data.
>
> There is something I want to make clear: why do we need `G-parquet` rather than `parquet`? `G-parquet` binds to a table schema, whereas `parquet` infers the schema. Users write data with this table schema, so we can ensure data compatibility, because we assume that the table schema evolves correctly. Another question is: if we have a table schema, why not write to a table? We need a file format, not a table. For example, in the context of machine learning, we always use a file format, not a table format.

Yes, I think it is necessary to distinguish this from the default file format implementations supported by Spark. Users can of course continue to use Spark's default implementations, but our data source can provide enhanced capabilities. As for the name `G-parquet`, I think it is a little unclear: currently we only provide it for fileset, and I am not sure whether it will be used for other resources in the future. If not, we had better bind the name to fileset.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
