Hello!

Thanks for your comments and the PR. I had to switch to other tasks, but soon (next week?) I'm going to spend more time on it. I have yet to get a feel for the graph/stack that could be passed around, and to check the DF/BF dependency collectors (I didn't see them in resolver 1.6.3). I'll keep the https://issues.apache.org/jira/browse/MRESOLVER-248 tab open till I check it ;)

kind regards
Grzegorz Grzybek

On Wed, May 11, 2022 at 18:40 Tamás Cservenák <ta...@cservenak.net> wrote:

> Howdy,
>
> https://github.com/apache/maven-resolver/pull/176
>
> So here is some implementation "demo" (that could be made into an extension
> point), as explained in the Draft PR description.
> BUT, as also written in the PR, I am getting a feeling that doing this is
> "dangerous", and a simple callback with the whole collected graph would be
> better....
>
> WDYT?
>
> Tamas
>
> On Mon, May 2, 2022 at 4:18 PM Tamás Cservenák <ta...@cservenak.net>
> wrote:
>
> > Howdy,
> >
> > just a few short answers:
> > - 1st: Personally, from a Resolver perspective, I'd just provide an API
> > (basically, what an author extending resolver should implement) and make
> > it simple to "click in" (Sisu component discovery).
> > - 2nd: resolver IMHO should not provide any out of the box component
> > implementation at all
> >
> > So 1st would provide a "stable" extension point for users who would like
> > to "integrate" with resolver at this point (like you did), but it could
> > become possible using simply this new API, instead of the hoops and loops
> > your code was forced to do (as resolver is quite "closed" in this respect).
> >
> > As for 2nd point, while I do like your idea of "decorating" local
> > repository, I'd try a bit different route: I'd integrate this
> > https://github.com/lambdazen/bitsy that makes it possible to use Apache
> > TinkerPop's Gremlin queries to ask about the built graph for example...
> >
> > And one big remark: the collector is the "hottest point" in resolver (heap
> > and cpu wise), so ANY "new API" implementation should be aware that each
> > "lost" millisecond directly affects resolver collection speed, but I think
> > "research kind" of stuff, or just "recording the process result", should
> > fit in just fine. I don't see this as a "standard" feature of Maven, but
> > who knows? :)
> >
> > Just my 5 cents...
> >
> > HTH
> > Tamas
> >
> > On Mon, May 2, 2022 at 4:09 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
> > wrote:
> >
> >> Thank you Tamás for checking my experiment
> >>
> >> I'm just finishing my work before tomorrow's national holiday, but will
> >> read your information more carefully soon.
> >>
> >> Whether it's DFS or BFS, as long as there's tracking from initial to
> >> ultimate dependency, it's enough. DFS sounds more "natural" here though. I
> >> didn't check the CollectResult class yet - is it created per dependency or
> >> for the entire project?
> >>
> >> And yes - I didn't check multithreading, as in a normal scenario (just
> >> `mvn clean install`) I didn't observe concurrency issues accessing the
> >> stack. Mind that I know a bit about maven "components", but there are
> >> definitely a few missing things in my understanding.
> >>
> >> Checking your output, I see there are two aspects of this potential
> >> enhancement to the resolver:
> >> - 1st - how to effectively collect the "reverse dependency tree" in
> >> context of DFS/BFS/multithreading
> >> - 2nd - how to write the information
> >>
> >> 2nd aspect could include:
> >> - whether there should be ".tracking" for each GAV directory in local
> >> repo (tracking for the purpose of entire local repository)
> >> - maybe there should be configurable output location for single report of
> >> a build? (tracking for the purpose of single project)
> >> - which format to use (human consumable or machine readable?)
> >>
> >> For now I've used resolver 1.6.3 from Maven 3.8.5, but I'll look at `main`
> >> branch too.
> >>
> >> kind regards
> >> Grzegorz Grzybek
> >>
> >>
> >> On Mon, May 2, 2022 at 15:57 Tamás Cservenák <ta...@cservenak.net>
> >> wrote:
> >>
> >> > What I forgot to mention: in my case the trees in the gist are about
> >> > "resolving maven-core 3.5.8", but I guess you figured it out from the
> >> > tree....
> >> >
> >> > T
> >> >
> >> > On Mon, May 2, 2022 at 3:55 PM Tamás Cservenák <ta...@cservenak.net>
> >> > wrote:
> >> >
> >> > > Howdy,
> >> > >
> >> > > I did some experiment that (partially re-using your code to dump the
> >> > > rev tree) produces this output:
> >> > > https://gist.github.com/cstamas/598a3266f943984442c00df30520294f
> >> > >
> >> > > (note: 1.8.0 resolver has two collector implementations: original
> >> > > Depth-First and new Breadth-First called DF and BF respectively)
> >> > >
> >> > > The code is not pushed yet anywhere, but I plan to make an API for
> >> > > this, and as you can see, it works for both implementations of
> >> > > collectors. Also, I hook ONLY into collector, as that's the place
> >> > > where the graph is being built, but this is logically equivalent to
> >> > > your "More interesting ... 2nd case".
> >> > >
> >> > > Will ping once again when I have the changes....
> >> > >
> >> > > Thanks
> >> > > Tamas
> >> > >
> >> > > On Thu, Apr 28, 2022 at 9:01 PM Tamás Cservenák <ta...@cservenak.net>
> >> > > wrote:
> >> > >
> >> > >> Howdy,
> >> > >>
> >> > >> This is very cool, I was actually tinkering on very similar issues in
> >> > >> resolver coming from totally different angles.
> >> > >>
> >> > >> And yes, the resolver collector is not quite "extension" friendly, but
> >> > >> we will make it right.
> >> > >> Just FYI, in the latest resolver (1.8.0) there are actually two
> >> > >> implementations: depth-first (original) and breadth-first.
> >> > >>
> >> > >> By looking at your code: collection is most critical regarding
> >> > >> performance and memory in the resolver, so "hooking" into it (like
> >> > >> sending events per each step) might not be the best, but still, what
> >> > >> kind of extension points would you envision in the collector?
> >> > >>
> >> > >> For example, to achieve what you want, it would be completely enough
> >> > >> to receive the final CollectResult (the full graph), no?
> >> > >> As -- from a resolver perspective -- that would be simplest,
> >> > >> especially that now we have two collector implementations...
> >> > >>
> >> > >> Also, in case of multi threading, your shared stack would not cut it,
> >> > >> would it?
> >> > >>
> >> > >> I personally was also looking into these, especially after some of
> >> > >> the latest additions to resolver in 1.8.0 and current master....
> >> > >>
> >> > >>
> >> > >> Thanks
> >> > >> T
> >> > >>
> >> > >>
> >> > >> On Thu, Apr 28, 2022 at 12:45 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >>> Hello
> >> > >>>
> >> > >>> TL;DR: https://github.com/grgrzybek/tracking-maven-extension
> >> > >>>
> >> > >>> I'd like to share some proof of concept I made. It all started with a
> >> > >>> question: "why am I getting log4j:log4j:1.2.12 in my local Maven
> >> > >>> repository when building a trivial project with a fresh local repo?"
> >> > >>>
> >> > >>> I knew it's possible to `grep -r --include=*.pom 1.2.12` the poms that
> >> > >>> declare old log4j, but I needed something better.
> >> > >>>
> >> > >>> In short - I managed to persist the information available in the
> >> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
> >> > >>> stack.
> >> > >>> I wrote a Maven extension that can be put into $MAVEN_HOME/lib/ext or
> >> > >>> used with "-Dmaven.ext.class.path" which does two things:
> >> > >>>
> >> > >>> 1. adds org.eclipse.aether.RepositoryListener component that writes
> >> > >>>    some information when a dependency is FIRST downloaded from remote
> >> > >>>    repository
> >> > >>> 2.
> >> > >>>    adds org.eclipse.aether.impl.DependencyCollector component (extension
> >> > >>>    of org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector)
> >> > >>>    that writes some information when a dependency is resolved against
> >> > >>>    local repository when it's already there (where no download is needed)
> >> > >>>
> >> > >>> In the first case, I write something like this:
> >> > >>>
> >> > >>> ~~~
> >> > >>> Downloaded artifact log4j:log4j:pom::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >> > >>>   -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
> >> > >>>     -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
> >> > >>>       -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
> >> > >>>         -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
> >> > >>>           -> org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 () (context: plugin)
> >> > >>>   Reading descriptor for artifact log4j:log4j:jar::1.2.12 (context: plugin) (scope: ?)
> >> > >>>     (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >> > >>>   Transitive dependencies collection for org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 ()
> >> > >>>   Resolution of plugin org.apache.maven.plugins:maven-site-plugin:3.11.0 (org.apache:apache:25)
> >> > >>> ~~~
> >> > >>> Downloaded artifact log4j:log4j:jar::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >> > >>>   Resolution of plugin com.mycila:license-maven-plugin:3.0 (org.apache.camel:camel-buildtools:3.17.0-SNAPSHOT)
> >> > >>>
> >> > >>> I simply write some information from the available
> >> > >>> org.eclipse.aether.RepositoryEvent and the event's
> >> > >>> org.eclipse.aether.RequestTrace.
> >> > >>>
> >> > >>> More interesting information is written in the 2nd case. Because I
> >> > >>> wanted to track ALL attempts to resolve log4j:log4j:1.2.12 (and any
> >> > >>> other dependency), I needed some structure. And I decided this:
> >> > >>>
> >> > >>> - every dependency directory (where e.g., _remote.repositories is
> >> > >>>   written along with the jar/pom/sha1/md5/...) gets a ".tracking"
> >> > >>>   directory
> >> > >>> - in the ".tracking" directory I write files with names of this
> >> > >>>   pattern: "groupId_artifactId_type_classifier_version.dep", e.g.,
> >> > >>>   org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
> >> > >>> - each such file contains a _reverse dependency tree_ that shows me
> >> > >>>   why a given dependency was resolved.
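
The naming pattern and the reverse tree layout described in the list above could be sketched roughly as follows in plain Java. This is only an illustration, not the code from the extension: the `ReverseTree` class, the `Entry` record and both method names are made up for this sketch, the two-space indentation per level is a guess, and skipping an empty classifier in the file name is an assumption inferred from the example file name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Maven Resolver or of the extension.
final class ReverseTree {

    /** One element of the path from the requesting plugin/project down to a dependency. */
    record Entry(String groupId, String artifactId, String type, String classifier,
                 String version, String scope, String context) {

        /** groupId:artifactId:type[:classifier]:version */
        String coordinates() {
            String c = classifier.isEmpty() ? "" : ":" + classifier;
            return groupId + ":" + artifactId + ":" + type + c + ":" + version;
        }
    }

    /** Builds "groupId_artifactId_type[_classifier]_version.dep" for the root of a path. */
    static String trackingFileName(Entry root) {
        StringBuilder sb = new StringBuilder()
                .append(root.groupId()).append('_')
                .append(root.artifactId()).append('_')
                .append(root.type());
        if (!root.classifier().isEmpty()) {
            // Assumption: an empty classifier is left out of the name.
            sb.append('_').append(root.classifier());
        }
        return sb.append('_').append(root.version()).append(".dep").toString();
    }

    /** Renders a root-first path in reverse: the resolved dependency first, the root last. */
    static String render(List<Entry> rootFirst) {
        List<String> lines = new ArrayList<>();
        for (int i = rootFirst.size() - 1, depth = 0; i >= 0; i--, depth++) {
            Entry e = rootFirst.get(i);
            if (depth == 0) {
                lines.add(e.coordinates());
            } else {
                lines.add("  ".repeat(depth) + "-> " + e.coordinates()
                        + " (" + e.scope() + ") (context: " + e.context() + ")");
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        Entry plugin = new Entry("org.apache.maven.plugins", "maven-dependency-plugin",
                "jar", "", "3.1.2", "", "plugin");
        Entry logging = new Entry("commons-logging", "commons-logging",
                "jar", "", "1.1", "compile", "plugin");
        Entry log4j = new Entry("log4j", "log4j", "pom", "", "1.2.12", "", "plugin");

        System.out.println(trackingFileName(plugin));
        System.out.println(render(List.of(plugin, logging, log4j)));
    }
}
```

The path is kept root-first because that is the order in which a collector stack grows; reversing happens only at render time.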
> >> > >>>
> >> > >>> For example, in
> >> > >>> ~/.m2/repository/log4j/log4j/1.2.12/.tracking/org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
> >> > >>> (the path itself already contains the information that
> >> > >>> org.apache.maven.plugins:maven-dependency-plugin:3.1.2 depends,
> >> > >>> directly or indirectly, on log4j:log4j:1.2.12), the content of the
> >> > >>> file is:
> >> > >>>
> >> > >>> log4j:log4j:pom:1.2.12
> >> > >>>   -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
> >> > >>>     -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
> >> > >>>       -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
> >> > >>>         -> org.apache.maven.doxia:doxia-site-renderer:jar:1.7.4 (compile) (context: plugin)
> >> > >>>           -> org.apache.maven.reporting:maven-reporting-impl:jar:3.0.0 (compile) (context: plugin)
> >> > >>>             -> org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2 () (context: plugin)
> >> > >>>
> >> > >>> It's kind of obvious - dependency-plugin, through maven-reporting-impl,
> >> > >>> doxia, velocity, commons-digester and commons-logging, "depends" on the
> >> > >>> malicious log4j:1.2.12 library every security scanner screams about.
> >> > >>>
> >> > >>> Since I wrote this extension, I keep it in my $MAVEN_HOME/lib/ext and
> >> > >>> build everything in my work with it. Now I know why my
> >> > >>> ~/.m2/repository/org/codehaus/plexus/plexus-utils/ directory contains
> >> > >>> 57 different versions of plexus-utils. For example, why 1.0.4 from
> >> > >>> 2005?
> >> > >>>
> >> > >>> org.codehaus.plexus:plexus-utils:pom:1.0.4
> >> > >>>   -> org.codehaus.plexus:plexus-container-default:jar:1.0-alpha-9-stable-1 (compile) (context: plugin)
> >> > >>>     -> org.codehaus.plexus:plexus-velocity:jar:1.2 (compile) (context: plugin)
> >> > >>>       -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
> >> > >>>         -> org.apache.maven.plugins:maven-javadoc-plugin:jar:3.3.2 () (context: plugin)
> >> > >>>
> >> > >>> Why Guava 10.0.1?
> >> > >>>
> >> > >>> com.google.guava:guava:pom:10.0.1
> >> > >>>   -> org.eclipse.sisu:org.eclipse.sisu.plexus:jar:0.0.0.M5 (compile) (context: plugin)
> >> > >>>     -> org.apache.maven:maven-plugin-api:jar:3.1.1 (compile) (context: plugin)
> >> > >>>       -> org.apache.maven:maven-core:jar:3.1.1 (compile) (context: plugin)
> >> > >>>         -> org.apache.maven.shared:maven-common-artifact-filters:jar:3.2.0 (runtime) (context: plugin)
> >> > >>>           -> org.springframework.boot:spring-boot-maven-plugin:jar:2.5.12 () (context: plugin)
> >> > >>>
> >> > >>> yes - Spring Boot 2.5.12...
> >> > >>>
> >> > >>> Why Log4j 2.10.0?
> >> > >>>
> >> > >>> org.apache.logging.log4j:log4j-api:pom:2.10.0
> >> > >>>   -> org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0 (compile) (context: project)
> >> > >>>     -> org.springframework.boot:spring-boot-starter-logging:jar:2.0.5.RELEASE (compile) (context: project)
> >> > >>>       -> org.springframework.boot:spring-boot-starter:jar:2.0.5.RELEASE (compile) (context: project)
> >> > >>>         -> org.springframework.boot:spring-boot-starter-web:jar:2.0.5.RELEASE (compile) (context: project)
> >> > >>>           -> org.keycloak:keycloak-spring-boot-2-adapter:jar:17.0.1 (context: project)
> >> > >>>
> >> > >>> (see - this time the context is "project", not "plugin").
> >> > >>>
> >> > >>> And so on and so on.
> >> > >>>
> >> > >>> What is my motivation with this email? I don't know yet - ideally I'd
> >> > >>> like to have this ".tracking" information created together with
> >> > >>> "_remote.repositories" and "*.lastUpdated" metadata by Maven Resolver.
> >> > >>> It could be optional of course (the overhead is really minimal - 1 more
> >> > >>> minute when building Camel 3 - 1 hour instead of 59 minutes).
> >> > >>>
> >> > >>> The only problem I had is that I had to fork/shade the
> >> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector
> >> > >>> class because I had to manipulate the
> >> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
> >> > >>> stack around the call to
> >> > >>> org.jboss.fuse.mvnplugins.tracker.TrackingDependencyCollector#processDependency().
> >> > >>> Besides this, normal plexus/sisu components are used.
> >> > >>>
> >> > >>> The repository is https://github.com/grgrzybek/tracking-maven-extension
> >> > >>> and I'd be happy to see some comments about this ;)
> >> > >>>
> >> > >>> kind regards
> >> > >>> Grzegorz Grzybek
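
The alternative raised in the thread, receiving the complete collected graph once and walking it afterwards instead of hooking into every collector step, might look roughly like the sketch below. `GraphPaths` and its `Node` record are hypothetical stand-ins written for this example; they are not the resolver's CollectResult or dependency node types, and a real implementation would have to use whatever the final API exposes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: walks an already collected graph, nothing here is resolver API.
final class GraphPaths {

    /** Minimal stand-in for a node of the collected dependency graph. */
    record Node(String id, List<Node> children) {}

    /** Returns every root-first path that ends at the node with the given id. */
    static List<List<String>> pathsTo(Node root, String targetId) {
        List<List<String>> result = new ArrayList<>();
        walk(root, targetId, new ArrayDeque<>(), result);
        return result;
    }

    private static void walk(Node node, String targetId,
                             Deque<String> path, List<List<String>> result) {
        if (path.contains(node.id())) {
            return; // already on the current path: skip to avoid looping on a cycle
        }
        path.addLast(node.id());
        if (node.id().equals(targetId)) {
            result.add(new ArrayList<>(path)); // copy, the deque keeps changing
        }
        for (Node child : node.children()) {
            walk(child, targetId, path, result);
        }
        path.removeLast();
    }

    public static void main(String[] args) {
        Node log4j = new Node("log4j:log4j:1.2.12", List.of());
        Node logging = new Node("commons-logging:commons-logging:1.1", List.of(log4j));
        Node plugin = new Node("maven-site-plugin:3.11.0", List.of(logging, log4j));

        for (List<String> path : pathsTo(plugin, "log4j:log4j:1.2.12")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

Because the path stack is local to one walk over a finished graph, this variant does not depend on the order in which a depth-first or breadth-first collector visited the nodes, and nothing is shared between collector threads.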