JonasJ-ap commented on PR #6449: URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1374232612
> I read the demo one more time, thanks @JonasJ-ap, this is super helpful in understanding the whole experience. I think one way we can go with this PR is that, because we are using Delta standalone, instead of following the same pattern of having a base class action and then a Spark extension, we can directly make Delta to Iceberg conversion work just within the `iceberg-delta-lake` module. Unit tests can be done end to end to test the conversion capability.
>
> The code in the Spark module can be completely removed, as users can simply get the Hadoop configuration from the Spark session and invoke the method in the `iceberg-delta-lake` module.
>
> By doing this, it also completely removes the concern of adding a Delta dependency to the Iceberg Spark distribution.
>
> Trino can be integrated in the same way to offer the conversion logic. @findepi
>
> There could be improvements made in the future to parallelize the retrieval of the Delta log in different engines like Spark and Trino, but those could be added later as extension points in the `iceberg-delta-lake` module.
>
> What do you think? @JonasJ-ap

> I was thinking about this last night. It feels like the current blocker is unit testing with Spark, but if we just use `iceberg-spark` as a test dependency, we can move all the tests currently in the Spark module to `iceberg-delta-lake` and satisfy the goal of unit-testing the implementation without needing another way to write Delta Lake tables outside Spark.

Thank you very much for your suggestions. I agree that it is unnecessary to create a Spark action just to provide the Hadoop configuration. We may run into trouble if we include `iceberg-spark` for the unit tests, since the Java CI does not include `iceberg-spark` in the build. Instead, we can make the Spark-related tests integration tests (`integrationTest`) to avoid this issue.
