Howdy,

https://github.com/apache/maven-resolver/pull/176

So here is a "demo" implementation (that could be made into an extension
point), as explained in the draft PR description. BUT, as also written in
the PR, I am getting a feeling that doing this is "dangerous", and that a
simple callback with the whole collected graph would be better... WDYT?

Tamas

On Mon, May 2, 2022 at 4:18 PM Tamás Cservenák <ta...@cservenak.net> wrote:

> Howdy,
>
> just a few short answers:
> - 1st: Personally, from a Resolver perspective, I'd just provide an API
> (basically what the author extending resolver should implement) and make
> it simple to "click in" (Sisu component discovery).
> - 2nd: resolver IMHO should not provide any out of the box component
> implementation at all
>
> So 1st would provide a "stable" extension point for users who would like
> to "integrate" with resolver at this point (like you did), but it could
> become possible using simply this new API, instead of the hoops and loops
> your code was forced to do (as resolver is quite "closed" in this respect).
>
> As for 2nd point, while I do like your idea of "decorating" local
> repository, I'd try a bit different route: I'd integrate this
> https://github.com/lambdazen/bitsy that makes it possible to use Apache
> Tinkerpop's Gremlin queries to ask about the built graph for example...
>
> And one big remark: the collector is the "hottest point" in resolver (heap
> and cpu wise), so ANY "new API" implementation should be aware that each
> "lost" millisecond directly affects resolver collection speed, but I think
> for "research kind" of stuff, or just "recording the process result", it
> should fit in just fine. I don't see this as a "standard" feature of Maven,
> but who knows? :)
>
> Just my 5 cents...
>
> HTH
> Tamas
>
> On Mon, May 2, 2022 at 4:09 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
> wrote:
>
>> Thank you Tamás for checking my experiment
>>
>> I'm just finishing my work before tomorrow's national holiday, but will
>> read your information more carefully soon.
>>
>> Whether it's DFS or BFS, as long as there's tracking from initial to
>> ultimate dependency, it's enough. DFS sounds more "natural" here though. I
>> didn't check the CollectResult class yet - is it created per dependency or
>> for the entire project?
>>
>> And yes - I didn't check multithreading, as in a normal scenario (just
>> `mvn clean install`) I didn't observe concurrency issues accessing the
>> stack. Mind that I know a bit about maven "components", but there are
>> definitely a few things missing in my understanding.
>>
>> Checking your output, I see there are two aspects of this potential
>> enhancement to the resolver:
>> - 1st - how to effectively collect the "reverse dependency tree" in
>> context of DFS/BFS/multithreading
>> - 2nd - how to write the information
>>
>> 2nd aspect could include:
>> - whether there should be ".tracking" for each GAV directory in local repo
>> (tracking for the purpose of entire local repository)
>> - maybe there should be configurable output location for single report of
>> a build? (tracking for the purpose of single project)
>> - which format to use (human consumable or machine readable?)
>>
>> For now I've used resolver 1.6.3 from Maven 3.8.5, but I'll look at `main`
>> branch too.
>>
>> kind regards
>> Grzegorz Grzybek
>>
>>
>> On Mon, May 2, 2022 at 15:57 Tamás Cservenák <ta...@cservenak.net> wrote:
>>
>> > What I missed to mention: in my case the trees in the gist are about
>> > "resolving maven-core 3.5.8", but I guess you figured it out from the
>> > tree....
>> >
>> > T
>> >
>> > On Mon, May 2, 2022 at 3:55 PM Tamás Cservenák <ta...@cservenak.net>
>> > wrote:
>> >
>> > > Howdy,
>> > >
>> > > I did an experiment that (partially re-using your code to dump the
>> > > rev tree) produces this output:
>> > > https://gist.github.com/cstamas/598a3266f943984442c00df30520294f
>> > >
>> > > (note: 1.8.0 resolver has two collector implementations: original
>> > > Depth-First and new Breadth-First called DF and BF respectively)
>> > >
>> > > The code is not pushed yet anywhere, but I plan to make an API for
>> > > this, and as you can see, it works for both implementations of
>> > > collectors. Also, I hook ONLY into collector, as that's the place where
>> > > the graph is being built, but this is logically equivalent to your
>> > > "More interesting ... 2nd case".
>> > >
>> > > Will ping once again when I have the changes....
>> > >
>> > > Thanks
>> > > Tamas
>> > >
>> > > On Thu, Apr 28, 2022 at 9:01 PM Tamás Cservenák <ta...@cservenak.net>
>> > > wrote:
>> > >
>> > >> Howdy,
>> > >>
>> > >> This is very cool, I was actually tinkering on very similar issues in
>> > >> resolver coming from totally different angles.
>> > >>
>> > >> And yes, the resolver collector is not quite "extension" friendly, but
>> > >> we will make it right.
>> > >> Just FYI, in the latest resolver (1.8.0) there are actually two
>> > >> implementations: depth-first (original) and breadth-first.
>> > >>
>> > >> By looking at your code: collection is most critical regarding
>> > >> performance and memory in the resolver, so "hooking" into it (like
>> > >> sending events per each step) might not be the best, but still, what
>> > >> kind of extension points would you envision in the collector?
>> > >>
>> > >> For example, to achieve what you want, it would be completely enough to
>> > >> receive the final CollectResult (the full graph), no?
>> > >> As -- from a resolver perspective -- that would be simplest,
>> > >> especially now that we have two collector implementations...
>> > >>
>> > >> Also, in case of multi threading, your shared stack would not cut it,
>> > >> would it?
>> > >>
>> > >> I personally was also looking into these, especially after some of the
>> > >> latest additions to resolver in 1.8.0 and current master....
>> > >>
>> > >>
>> > >> Thanks
>> > >> T
>> > >>
>> > >>
>> > >> On Thu, Apr 28, 2022 at 12:45 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
>> > >> wrote:
>> > >>
>> > >>> Hello
>> > >>>
>> > >>> TL;DR: https://github.com/grgrzybek/tracking-maven-extension
>> > >>>
>> > >>> I'd like to share some proof of concept I made. It all started with a
>> > >>> question: "why am I getting log4j:log4j:1.2.12 in my local Maven
>> > >>> repository when building a trivial project with a fresh local repo?"
>> > >>>
>> > >>> I knew it's possible to `grep -r --include=*.pom 1.2.12` the poms that
>> > >>> declare old log4j, but I needed something better.
>> > >>>
>> > >>> In short - I managed to persist the information available in the
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
>> > >>> stack.
>> > >>> I wrote a Maven extension that can be put into $MAVEN_HOME/lib/ext or
>> > >>> used with "-Dmaven.ext.class.path" which does two things:
>> > >>>
>> > >>>    1. adds org.eclipse.aether.RepositoryListener component that writes
>> > >>>    some information when a dependency is FIRST downloaded from remote
>> > >>>    repository
>> > >>>    2. adds org.eclipse.aether.impl.DependencyCollector component
>> > >>>    (extension of
>> > >>>    org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector)
>> > >>>    that writes some information when a dependency is resolved against
>> > >>>    local repository when it's already there (where no download is needed)
>> > >>>
>> > >>> In the first case, I write something like this:
>> > >>>
>> > >>> ~~~
>> > >>> Downloaded artifact log4j:log4j:pom::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>  -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
>> > >>>    -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
>> > >>>      -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
>> > >>>          -> org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 () (context: plugin)
>> > >>> Reading descriptor for artifact log4j:log4j:jar::1.2.12 (context: plugin) (scope: ?) (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>  Transitive dependencies collection for org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 ()
>> > >>>   Resolution of plugin org.apache.maven.plugins:maven-site-plugin:3.11.0 (org.apache:apache:25)
>> > >>> ~~~
>> > >>> Downloaded artifact log4j:log4j:jar::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>  Resolution of plugin com.mycila:license-maven-plugin:3.0 (org.apache.camel:camel-buildtools:3.17.0-SNAPSHOT)
>> > >>>
>> > >>> I simply write some information from available
>> > >>> org.eclipse.aether.RepositoryEvent and event's
>> > >>> org.eclipse.aether.RequestTrace.
>> > >>>
>> > >>> More interesting information is written in 2nd case. Because I wanted
>> > >>> to track ALL attempts to resolve log4j:log4j:1.2.12 (and any other
>> > >>> dependency), I needed some structure. And I decided this:
>> > >>>
>> > >>>    - every dependency directory (where e.g., _remote.repositories is
>> > >>>    written along with the jar/pom/sha1/md5/...) gets ".tracking"
>> > >>>    directory
>> > >>>    - in ".tracking" directory I write files with names of this pattern:
>> > >>>    "groupId_artifactId_type_classifier_version.dep", e.g.,
>> > >>>    org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
>> > >>>    - each such file contains a _reverse dependency tree_ that shows me
>> > >>>    why given dependency was resolved.
>> > >>>
>> > >>> For example, take
>> > >>> ~/.m2/repository/log4j/log4j/1.2.12/.tracking/org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
>> > >>> (the path itself already contains information that
>> > >>> org.apache.maven.plugins:maven-dependency-plugin:3.1.2 depends (directly
>> > >>> or indirectly) on log4j:log4j:1.2.12).
>> > >>> The content of this file is:
>> > >>>
>> > >>> log4j:log4j:pom:1.2.12
>> > >>>  -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
>> > >>>    -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
>> > >>>      -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.doxia:doxia-site-renderer:jar:1.7.4 (compile) (context: plugin)
>> > >>>          -> org.apache.maven.reporting:maven-reporting-impl:jar:3.0.0 (compile) (context: plugin)
>> > >>>            -> org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2 () (context: plugin)
>> > >>>
>> > >>> It's kind of obvious - dependency-plugin through maven-reporting-impl,
>> > >>> through doxia, velocity, commons-digester and commons-logging "depends"
>> > >>> on malicious log4j:1.2.12 library every security scanner screams about.
>> > >>>
>> > >>> Since I wrote this extension, I keep it in my $MAVEN_HOME/lib/ext and
>> > >>> build everything in my work. Now I know why my
>> > >>> ~/.m2/repository/org/codehaus/plexus/plexus-utils/ directory contains 57
>> > >>> different versions of plexus-utils, for example. Why 1.0.4 from 2005?
>> > >>>
>> > >>> org.codehaus.plexus:plexus-utils:pom:1.0.4
>> > >>>  -> org.codehaus.plexus:plexus-container-default:jar:1.0-alpha-9-stable-1 (compile) (context: plugin)
>> > >>>    -> org.codehaus.plexus:plexus-velocity:jar:1.2 (compile) (context: plugin)
>> > >>>      -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.plugins:maven-javadoc-plugin:jar:3.3.2 () (context: plugin)
>> > >>>
>> > >>> Why Guava 10.0.1?
>> > >>>
>> > >>> com.google.guava:guava:pom:10.0.1
>> > >>>  -> org.eclipse.sisu:org.eclipse.sisu.plexus:jar:0.0.0.M5 (compile) (context: plugin)
>> > >>>    -> org.apache.maven:maven-plugin-api:jar:3.1.1 (compile) (context: plugin)
>> > >>>      -> org.apache.maven:maven-core:jar:3.1.1 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.shared:maven-common-artifact-filters:jar:3.2.0 (runtime) (context: plugin)
>> > >>>          -> org.springframework.boot:spring-boot-maven-plugin:jar:2.5.12 () (context: plugin)
>> > >>>
>> > >>> yes - Spring Boot 2.5.12...
>> > >>>
>> > >>> Why Log4j 2.10.0?
>> > >>>
>> > >>> org.apache.logging.log4j:log4j-api:pom:2.10.0
>> > >>>  -> org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0 (compile) (context: project)
>> > >>>    -> org.springframework.boot:spring-boot-starter-logging:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>      -> org.springframework.boot:spring-boot-starter:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>        -> org.springframework.boot:spring-boot-starter-web:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>          -> org.keycloak:keycloak-spring-boot-2-adapter:jar:17.0.1 (context: project)
>> > >>>
>> > >>> (see - this time the context is "project", not "plugin").
>> > >>>
>> > >>> And so on and so on.
>> > >>>
>> > >>> What is my motivation with this email? I don't know yet - ideally I'd
>> > >>> like to have this ".tracking" information created together with
>> > >>> "_remote.repositories" and "*.lastUpdated" metadata by Maven Resolver.
>> > >>> It could be optional of course (the overhead is really minimal - 1 more
>> > >>> minute when building Camel 3 - 1 hour instead of 59 minutes).
>> > >>>
>> > >>> The only problem I had is that I had to fork/shade
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector
>> > >>> class because I had to manipulate
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
>> > >>> stack around the call to
>> > >>> org.jboss.fuse.mvnplugins.tracker.TrackingDependencyCollector#processDependency().
>> > >>> Besides this, normal plexus/sisu components are used.
>> > >>>
>> > >>> The repository is
>> > >>> https://github.com/grgrzybek/tracking-maven-extension and I'd be happy
>> > >>> to see some comments about this ;)
>> > >>>
>> > >>> kind regards
>> > >>> Grzegorz Grzybek
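
For what it's worth, the "simple callback with the whole collected graph" idea from this thread can be sketched in plain Java. This is only an illustration under assumptions, not resolver API and not the code from the PR: Node and ReverseTree are hypothetical stand-ins (the real graph type would be org.eclipse.aether.graph.DependencyNode), the graph is assumed to be acyclic, and the "(context: ...)" part of the output is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a node of the finished (collected) graph.
final class Node {
    final String coords; // e.g. "log4j:log4j:pom:1.2.12"
    final String scope;  // e.g. "compile"; empty for the root
    final List<Node> children = new ArrayList<>();

    Node(String coords, String scope) {
        this.coords = coords;
        this.scope = scope;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

// Hypothetical helper: derives reverse dependency paths after collection,
// so nothing has to hook into the collector's hot loop.
final class ReverseTree {

    // Returns one path (target first, root last) per occurrence of target.
    static List<List<Node>> reversePaths(Node root, String target) {
        List<List<Node>> result = new ArrayList<>();
        walk(root, target, new ArrayDeque<>(), result);
        return result;
    }

    private static void walk(Node node, String target,
                             Deque<Node> path, List<List<Node>> result) {
        path.push(node); // head of the deque is the current node
        if (node.coords.equals(target)) {
            // Deque iteration goes head to tail: current node up to the root.
            result.add(new ArrayList<>(path));
        }
        for (Node child : node.children) {
            walk(child, target, path, result);
        }
        path.pop();
    }

    // Renders a path in the "->" style used in the mails above.
    static String render(List<Node> reversePath) {
        StringBuilder sb = new StringBuilder(reversePath.get(0).coords).append('\n');
        StringBuilder indent = new StringBuilder(" ");
        for (int i = 1; i < reversePath.size(); i++) {
            Node n = reversePath.get(i);
            sb.append(indent).append("-> ").append(n.coords)
              .append(" (").append(n.scope).append(")\n");
            indent.append("  ");
        }
        return sb.toString();
    }
}
```

Calling ReverseTree.reversePaths(root, "log4j:log4j:pom:1.2.12") on a finished graph and rendering each returned path gives one reverse tree per place the artifact shows up. Because the walk keeps its path in a local deque per call, it does not rely on a shared stack, which was the multithreading concern raised above.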