rdblue commented on pull request #1531: URL: https://github.com/apache/iceberg/pull/1531#issuecomment-702936438
Thanks for working on this, @mickjermsurawong-stripe! I like dynamically loading a location provider, and I like that the changes required are pretty small. That looks good to me. What I'd like to change is how this is tested. There are two main issues with it. First, the tests are in an unrelated suite for Spark that is mainly testing that Spark works with a variety of different schemas. It doesn't seem appropriate to put the tests for injecting a location provider in a different module and into this suite. Second, the `TestLocationProvider` returns real paths and the tests work by writing data files and validating that files are underneath those paths. That does test that it works, but I think this makes validation much harder. This approach requires a lot of test code and runs a lot of unrelated cases through to validate the location provider. I would prefer a more targeted test in a new suited in core, `TestLocationProvider`. That would have a case that injects a `TestLocationProvider` that keeps track of calls in static variables, so you'd be able to inject the provider, get it from `table.operations().locationProvider()`, call some methods, and validate that `TestLocationProvider` received the calls. We also need negative tests for when the provider can't be found. I'd like to have a good error message like "Cannot find location provider class: %s". ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
