danny0405 commented on PR #5712: URL: https://github.com/apache/hudi/pull/5712#issuecomment-1153596059
> > Can you help me understand why we need this. For any new storage scheme, typically we ask the user to test out hudi end to end once and then add it explicitly to supported storage schemes. But w/ this, if we keep it loose, we never know if a particular storage scheme works or not.
>
> Thanks for your review. There are two reasons to keep schemes loose:
>
> 1. Name conflict. `hdfs://`, `file://` and `s3://` are well-known schemes, and no other filesystem will reuse their names. For a less well-known filesystem, however, the scheme name can conflict. For example, vendor A uses `xfs://` for a new filesystem that is append-able; vendor B uses `xfs://` for a new filesystem that is not append-able.
> 2. Let the user specify filesystem properties. For example, append-able is `true` for hdfs, which doesn't support erasure coding; but for some vendor-specific hdfs that supports erasure coding, append-able is `false`. For now there is only one filesystem property (append-able), but there may be a few more (e.g. truncate-able) in the future.
>
> In my opinion, letting users decide which filesystems to use with hudi is a good idea. Hudi should provide end-to-end tests, and it is the user's job to run them when integrating a new filesystem with hudi.

I share @nsivabalan's opinion. Even in the corner case where two different filesystems use the same scheme, you can change the `append-able` property in your local env. If you want to switch these properties more flexibly, we may need to find a way to make them configurable; that is the one approach I can think of.

Another thing I don't like about user-defined filesystems is that they force the user to invent an alias to resolve scheme conflicts, which is not a standard practice. It is weird that the scheme is `xfs://` and you give it an alias such as `yfs://`; the `yfs://` alias is hacky because it does not exist in the real world.
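To make the "configurable properties" idea concrete, here is a minimal sketch under stated assumptions. The class `SchemeRegistry`, its method names, and the `hdfs=false` override syntax are all hypothetical and are not Hudi's actual API; the defaults are illustrative only. The sketch keeps an explicit list of supported schemes and lets a config string flip the append-able flag per scheme, so no alias is needed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, not Hudi's API: explicit scheme list with overridable properties. */
public class SchemeRegistry {
    // scheme name -> whether the filesystem supports append (illustrative defaults)
    private final Map<String, Boolean> appendSupport = new HashMap<>();

    public SchemeRegistry() {
        appendSupport.put("hdfs", true);
        appendSupport.put("s3", false);
    }

    /**
     * Applies overrides written as "scheme=true|false", comma separated,
     * e.g. "hdfs=false". Only schemes that are already registered can be
     * overridden; this is a design choice of the sketch.
     */
    public void applyOverrides(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return;
        }
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Bad override: " + entry);
            }
            String scheme = normalize(kv[0]);
            String value = kv[1].trim().toLowerCase(Locale.ROOT);
            if (!value.equals("true") && !value.equals("false")) {
                throw new IllegalArgumentException("Bad boolean in override: " + entry);
            }
            if (!appendSupport.containsKey(scheme)) {
                throw new IllegalArgumentException("Unknown scheme: " + scheme);
            }
            appendSupport.put(scheme, Boolean.parseBoolean(value));
        }
    }

    public boolean isSchemeSupported(String scheme) {
        return appendSupport.containsKey(normalize(scheme));
    }

    public boolean isAppendSupported(String scheme) {
        Boolean supported = appendSupport.get(normalize(scheme));
        if (supported == null) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return supported;
    }

    // Accepts "hdfs", "HDFS" or "hdfs://" and returns the bare lowercase name.
    private static String normalize(String scheme) {
        String s = scheme.trim().toLowerCase(Locale.ROOT);
        return s.endsWith("://") ? s.substring(0, s.length() - 3) : s;
    }
}
```

With this shape, a deployment running an erasure-coded hdfs variant would pass something like `hdfs=false` through its own configuration, while unknown schemes are still rejected until they are tested and added explicitly.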
