Thank you. Makes sense to me. Yes, as part of this effort we are going to need contract tests.

On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:

> 1. I think a new interface would be good, as FileContext could do the
>    same thing.
> 2. Using PathCapabilities probes should still be mandatory, as for
>    FileContext it would depend on the back end.
> 3. Whoever does this gets to specify what the API does and write the
>    contract tests. Saying "just do what HDFS does" isn't enough, as it's
>    not always clear the HDFS team knows how much of that behaviour is
>    intentional (rename, anyone?).
>
> For any new API (a better rename, a better delete, ...) I would normally
> insist on making it cloud friendly, with an extensible builder API and an
> emphasis on asynchronous IO. However, this is existing code and does target
> HDFS and Ozone - pulling the existing APIs up into a new interface seems the
> right thing to do here.
>
> I have a WiP project to do a shim library to offer new FS APIs to older
> Hadoop releases by way of reflection, so that we can get new APIs taken up
> across projects where we cannot choreograph version updates across the
> entire stack (hello parquet, spark, ...). My goal is to actually make this
> a Hadoop managed project, with its own release schedule. You could add an
> equivalent of the new interface in here, which would then use reflection
> behind the scenes to invoke the underlying HDFS methods when the FS client
> has them.
>
> https://github.com/steveloughran/fs-api-shim
>
> I've just added the vector IO API there; the next step is to copy over a
> lot of the contract tests from hadoop common and apply them through the
> shim - to hadoop 3.2, 3.3.0-3.3.5. That testing against many backends is
> actually as tricky as the reflection itself. However, without this library
> it is going to take a long, long time for the open source applications to
> pick up the higher performance/cloud-ready APIs. Yes, those of us who can
> build the entire stack can do it, but that gradually adds more divergence
> from the open source libraries, reduces the test coverage overall and only
> increases maintenance costs over time.
>
> steve
>
> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of
> > HDFS-specific APIs, including recoverLease() and isInSafeMode(). The
> > HBase community [1] strongly voiced that they don't want the project to
> > have a direct dependency on additional FS implementations, due to
> > dependency and vulnerability management concerns.
> >
> > To make this project successful, we're exploring options to push these
> > APIs up to the FileSystem abstraction. Eventually, it would make HBase
> > FS implementation agnostic, and perhaps enable HBase to support other
> > storage systems in the future.
> >
> > We'd use the PathCapabilities API to probe if the underlying FS
> > implementation supports these APIs, and would then invoke the
> > corresponding FileSystem APIs. This is straightforward, but the
> > FileSystem would become bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface, and
> > have both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone)
> > implement it. This way the impact to the Hadoop project and the
> > FileSystem abstraction is even smaller.
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
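
For reference, here is a rough, self-contained sketch of the "new interface plus capability probe" pattern discussed in the thread. It is only an illustration under stated assumptions: none of these types are the real Hadoop classes. PathCapabilityProbe is a stand-in for Hadoop's PathCapabilities, the capability string is made up, the method shapes on RecoverableFileSystem are guesses, and checked exceptions are omitted for brevity. DelegatingStore is a hypothetical wrapper meant to show why the probe would still be needed when the answer depends on the back end.

```java
import java.util.Optional;

/** Illustrative only: stand-in types, not the real Hadoop API. */
final class RecoverableFsSketch {

    /** Hypothetical capability name; a real one would be defined with the API. */
    static final String CAP_LEASE_RECOVERY = "sketch.capability.lease.recovery";

    /** Stand-in for Hadoop's PathCapabilities probe. */
    interface PathCapabilityProbe {
        boolean hasPathCapability(String path, String capability);
    }

    /** Interface named as in the thread; the method signatures are assumed. */
    interface RecoverableFileSystem {
        boolean recoverLease(String path);
        boolean isInSafeMode();
    }

    /** Toy store that supports lease recovery and says so when probed. */
    static final class LeaseAwareStore implements PathCapabilityProbe, RecoverableFileSystem {
        @Override public boolean hasPathCapability(String path, String capability) {
            return CAP_LEASE_RECOVERY.equals(capability);
        }
        @Override public boolean recoverLease(String path) { return true; }
        @Override public boolean isInSafeMode() { return false; }
    }

    /**
     * Wrapper that always implements the interface, but whose real support
     * depends on the wrapped back end. A type check alone cannot tell.
     */
    static final class DelegatingStore implements PathCapabilityProbe, RecoverableFileSystem {
        private final Object backend;

        DelegatingStore(Object backend) { this.backend = backend; }

        @Override public boolean hasPathCapability(String path, String capability) {
            return backend instanceof PathCapabilityProbe
                && ((PathCapabilityProbe) backend).hasPathCapability(path, capability);
        }
        @Override public boolean recoverLease(String path) {
            if (!(backend instanceof RecoverableFileSystem)) {
                throw new UnsupportedOperationException("backend cannot recover leases");
            }
            return ((RecoverableFileSystem) backend).recoverLease(path);
        }
        @Override public boolean isInSafeMode() {
            return backend instanceof RecoverableFileSystem
                && ((RecoverableFileSystem) backend).isInSafeMode();
        }
    }

    /**
     * Caller side: require both the interface and a positive probe before
     * invoking. An empty Optional means "not supported on this path".
     */
    static Optional<Boolean> tryRecoverLease(Object fs, String path) {
        boolean supported = fs instanceof RecoverableFileSystem
            && fs instanceof PathCapabilityProbe
            && ((PathCapabilityProbe) fs).hasPathCapability(path, CAP_LEASE_RECOVERY);
        return supported
            ? Optional.of(((RecoverableFileSystem) fs).recoverLease(path))
            : Optional.empty();
    }

    public static void main(String[] args) {
        String wal = "/hbase/WALs/wal-1";
        System.out.println(tryRecoverLease(new LeaseAwareStore(), wal));
        System.out.println(tryRecoverLease(new DelegatingStore(new Object()), wal));
        System.out.println(tryRecoverLease(new DelegatingStore(new LeaseAwareStore()), wal));
    }
}
```

In this sketch the wrapper over a plain back end yields an empty Optional rather than an exception, because the probe is consulted before the call. Contract tests for a real API would presumably need to pin down exactly that kind of behaviour.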