Kimahriman opened a new pull request, #735: URL: https://github.com/apache/incubator-sedona/pull/735
## Did you read the Contributor Guide?

- Yes, I have read [Contributor Rules](https://sedona.apache.org/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/community/develop/)

## Is this PR related to a JIRA ticket?

- Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-212. The PR name follows the format `[SEDONA-XXX] my subject`.

## What changes were proposed in this PR?

This uses the shading problem as an opportunity to rethink how dependencies are managed. It is definitely intended to start a discussion and is still a WIP; I finally got the tests to pass.

The way things are set up now is "everything is a provided dependency, except for python-adapter, where everything is shaded". This presents a few problems mentioned in the ticket:

- It's very difficult to deal with dependency conflicts if one of the conflicts is in a shaded jar. You can't simply exclude the transitive dependency.
- It's very difficult to actually use the package. I assume most people default to the python-adapter module so they get everything (assuming they don't have problems with the shading). This is very awkward, especially if you aren't using Python. For example, we only use the SQL functions in the `sql` module, but without knowing which sub-dependencies I need, I just have to default to the python-adapter module so I have the dependencies I need.
- The python-adapter module has all of its dependencies shaded, _plus_ they are still compile-scope dependencies in the resulting pom, so all the dependencies end up included twice when using Java-based build tools or the Spark `--packages` option. This is a bug caused by the `resolved-pom-maven-plugin` and the `maven-shade-plugin` not playing nicely together, so the `dependency-reduced-pom` doesn't end up as the actual pom for the module.

Because of all that, I'm proposing a few changes in this PR:

- Anything that should be a compile dependency is scoped as such, except for geotools, for licensing reasons.
- Dependencies are defined in a `dependencyManagement` section in the parent pom instead of directly declaring dependencies on all modules when they might only be relevant to one. Each module includes just the dependencies it needs (a sketch of this layout is appended at the end of this description).
- Scala plugins are all opt-in, to make it clear that the common module does not pick up any stray Scala dependencies, even for testing.
- python-adapter still includes shading, but in a separate target used for the Python tests (see the second sketch at the end). I did have to work through various exclusions to get the tests to work. I was able to remove defining the jackson version for all testing purposes.

Questions to answer:

- Is there any reason to actually publish a shaded module? Anything Spark-related can either:
  - Java/Scala: use their Java build tool of choice to include the right dependencies in their subsequent jar
  - Python: use the `--packages` feature to automatically pull in all required transitive dependencies
  - I don't know anything about Flink or whether there is a use case there that needs a shaded module
- What to do with the new edu.ucar module? It looks like it lives in its own private repo, so for those behind a proxy who don't have access to every repository on the internet, this could cause dependency resolution to fail.

## How was this patch tested?

Existing tests.

## Did this PR include necessary documentation updates?

- There may have to be documentation updates if this route is accepted; I would need to look at what the current examples do.
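
---

To make the `dependencyManagement` proposal above concrete, here is a minimal sketch (not the actual diff; the artifact IDs, version properties, and scopes shown are illustrative) of how the parent pom could pin versions centrally while a child module such as `sql` declares only the dependencies it actually uses:

```xml
<!-- Parent pom (illustrative): versions are pinned centrally, but no
     dependency is forced onto every module. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.locationtech.jts</groupId>
      <artifactId>jts-core</artifactId>
      <version>${jts.version}</version>
    </dependency>
    <dependency>
      <groupId>org.geotools</groupId>
      <artifactId>gt-main</artifactId>
      <version>${geotools.version}</version>
      <!-- kept out of compile scope for licensing reasons -->
      <scope>provided</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

```xml
<!-- Child module pom (illustrative, e.g. sql): declares only what it needs;
     the version and scope are inherited from the parent's dependencyManagement. -->
<dependencies>
  <dependency>
    <groupId>org.locationtech.jts</groupId>
    <artifactId>jts-core</artifactId>
  </dependency>
</dependencies>
```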
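And a hedged sketch of keeping shading around only for the Python test workflow: the `maven-shade-plugin` could move into an opt-in profile in the python-adapter pom rather than the default build, so the published artifact and its pom stay unshaded. The profile name and filter rules below are illustrative, not what this PR actually wires up:

```xml
<!-- python-adapter pom (illustrative): shading only runs when explicitly
     requested, e.g. `mvn package -P shaded` before running the Python tests. -->
<profiles>
  <profile>
    <id>shaded</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <filters>
                  <filter>
                    <!-- strip signature files that would break the shaded jar -->
                    <artifact>*:*</artifact>
                    <excludes>
                      <exclude>META-INF/*.SF</exclude>
                      <exclude>META-INF/*.DSA</exclude>
                      <exclude>META-INF/*.RSA</exclude>
                    </excludes>
                  </filter>
                </filters>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```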
