Thank you, Tamás, for checking my experiment.

I'm just finishing my work before tomorrow's national holiday, but will
read your information more carefully soon.

Whether it's DFS or BFS doesn't matter, as long as the path from the initial
to the ultimate dependency is tracked. DFS sounds more "natural" here, though.
I haven't checked the CollectResult class yet - is it created per dependency
or for the entire project?
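
To illustrate what "tracking from initial to ultimate dependency" means with
DFS, here is a minimal sketch. It is not resolver code: `Node` and
`ReversePathSketch` are hypothetical names, the graph is assumed to be acyclic,
and the chain in `main` is abbreviated from the log4j example quoted below.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a graph node; NOT the resolver's DependencyNode.
final class Node {
    final String gav;
    final List<Node> children = new ArrayList<>();

    Node(String gav) {
        this.gav = gav;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

final class ReversePathSketch {

    // Depth-first walk; 'path' always holds the chain from the current node back to the root.
    static void walk(Node node, Deque<String> path, List<List<String>> out) {
        path.push(node.gav);            // push() adds at the head of the deque
        out.add(new ArrayList<>(path)); // head-first copy: [node, parent, ..., root]
        for (Node child : node.children) {
            walk(child, path, out);
        }
        path.pop();                     // leaving the node restores the parent's path
    }

    // Assumes an acyclic graph (a tree); a real collector would need cycle handling.
    static List<List<String>> reversePaths(Node root) {
        List<List<String>> out = new ArrayList<>();
        walk(root, new ArrayDeque<>(), out);
        return out;
    }

    public static void main(String[] args) {
        Node log4j = new Node("log4j:log4j:pom:1.2.12");
        Node logging = new Node("commons-logging:commons-logging:jar:1.1").add(log4j);
        Node plugin = new Node("org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2").add(logging);
        for (List<String> path : reversePaths(plugin)) {
            System.out.println(String.join(" <- ", path));
        }
    }
}
```

With BFS the same information could be kept by storing a parent reference per
queued node instead of one shared stack.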

And yes - I didn't check multithreading, as in a normal scenario (just `mvn
clean install`) I didn't observe concurrency issues accessing the stack.
Mind that I know a bit about Maven "components", but there are definitely
a few things missing in my understanding.
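
One way to avoid a shared mutable stack altogether would be to hand each step
an immutable link to its parent. The sketch below only illustrates that idea
under my own assumptions: `PathStep` and `ImmutablePathSketch` are hypothetical
names, and nothing here reflects how the resolver actually schedules its work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable path element; each step only points at its parent.
final class PathStep {
    final String gav;
    final PathStep parent; // null for the root

    PathStep(String gav, PathStep parent) {
        this.gav = gav;
        this.parent = parent;
    }

    // Creates a new step instead of mutating any shared state.
    PathStep child(String childGav) {
        return new PathStep(childGav, this);
    }

    // Follows the parent links: [this, parent, ..., root].
    List<String> toReverseList() {
        List<String> out = new ArrayList<>();
        for (PathStep step = this; step != null; step = step.parent) {
            out.add(step.gav);
        }
        return out;
    }
}

final class ImmutablePathSketch {
    public static void main(String[] args) throws InterruptedException {
        PathStep root = new PathStep("root", null);
        // Two threads extend the same root concurrently; neither can disturb the other,
        // because existing steps are never modified.
        Thread t1 = new Thread(() -> System.out.println(root.child("a").child("a1").toReverseList()));
        Thread t2 = new Thread(() -> System.out.println(root.child("b").toReverseList()));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The cost is one small object per visited node, so whether this is acceptable
in the collector's hot path would need measuring.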

Checking your output, I see there are two aspects of this potential
enhancement to the resolver:
 - 1st - how to effectively collect the "reverse dependency tree" in the
context of DFS/BFS/multithreading
 - 2nd - how to write the information

The 2nd aspect could include:
 - whether there should be a ".tracking" directory for each GAV directory in
the local repo (tracking for the purpose of the entire local repository)
 - maybe there should be a configurable output location for a single report
of a build? (tracking for the purpose of a single project)
 - which format to use (human consumable or machine readable?)
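
For reference, the human-readable format my proof of concept writes can be
sketched as below. `TrackingFormatSketch` and its methods are hypothetical
names for illustration only; skipping an empty classifier in the file name is
an assumption derived from the example file name in my original mail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrackingFormatSketch {

    // Pattern "groupId_artifactId_type_classifier_version.dep";
    // empty parts (e.g. no classifier) are skipped, which is an assumption.
    static String fileName(String groupId, String artifactId, String type,
                           String classifier, String version) {
        List<String> parts = new ArrayList<>();
        for (String part : new String[] {groupId, artifactId, type, classifier, version}) {
            if (part != null && !part.isEmpty()) {
                parts.add(part);
            }
        }
        return String.join("_", parts) + ".dep";
    }

    // Renders a leaf-first chain as an indented "->" tree, one entry per line.
    static String render(List<String> reversePath) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < reversePath.size(); i++) {
            if (i > 0) {
                // 1 space on the first level, 2 more on each further level
                for (int s = 0; s < 2 * i - 1; s++) {
                    sb.append(' ');
                }
                sb.append("-> ");
            }
            sb.append(reversePath.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("org.apache.maven.plugins", "maven-dependency-plugin", "jar", "", "3.1.2"));
        System.out.print(render(Arrays.asList(
                "log4j:log4j:pom:1.2.12",
                "commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)",
                "commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)")));
    }
}
```

A machine-readable variant could emit the same chain as a JSON array instead;
the collection side would not need to change.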

For now I've used resolver 1.6.3 from Maven 3.8.5, but I'll look at the
`main` branch too.

kind regards
Grzegorz Grzybek


On Mon, 2 May 2022 at 15:57, Tamás Cservenák <ta...@cservenak.net> wrote:

> What I forgot to mention: in my case the trees in the gist are about
> "resolving maven-core 3.5.8", but I guess you figured it out from the
> tree....
>
> T
>
> On Mon, May 2, 2022 at 3:55 PM Tamás Cservenák <ta...@cservenak.net>
> wrote:
>
> > Howdy,
> >
> > I did an experiment that (partially reusing your code to dump the reverse
> > tree) produces this output:
> > https://gist.github.com/cstamas/598a3266f943984442c00df30520294f
> >
> > (note: 1.8.0 resolver has two collector implementations: original
> > Depth-First and new Breadth-First called DF and BF respectively)
> >
> > The code is not pushed anywhere yet, but I plan to make an API for this,
> > and as you can see, it works for both collector implementations. Also, I
> > hook ONLY into the collector, as that's the place where the graph is being
> > built, but this is logically equivalent to your "More interesting ... 2nd
> > case".
> >
> > Will ping once again when I have the changes....
> >
> > Thanks
> > Tamas
> >
> > On Thu, Apr 28, 2022 at 9:01 PM Tamás Cservenák <ta...@cservenak.net>
> > wrote:
> >
> >> Howdy,
> >>
> >> This is very cool, I was actually tinkering with very similar issues in
> >> the resolver, coming from a totally different angle.
> >>
> >> And yes, the resolver collector is not quite "extension" friendly, but we
> >> will make it right.
> >> Just FYI, in the latest resolver (1.8.0) there are actually two
> >> implementations: depth-first (original) and breadth-first (new).
> >>
> >> Looking at your code: collection is the most performance- and
> >> memory-critical part of the resolver, so "hooking" into it (like sending
> >> events at each step) might not be the best idea. Still, what kind of
> >> extension points would you envision in the collector?
> >>
> >> For example, to achieve what you want, it would be completely enough to
> >> receive the final CollectResult (the full graph), no?
> >> From a resolver perspective that would be the simplest, especially now
> >> that we have two collector implementations...
> >>
> >> Also, in the case of multithreading, your shared stack would not cut it,
> >> would it?
> >>
> >> I personally was also looking into these, especially after some of the
> >> latest additions to resolver in 1.8.0 and current master....
> >>
> >>
> >> Thanks
> >> T
> >>
> >>
> >> On Thu, Apr 28, 2022 at 12:45 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
> >> wrote:
> >>
> >>> Hello
> >>>
> >>> TL;DR: https://github.com/grgrzybek/tracking-maven-extension
> >>>
> >>> I'd like to share a proof of concept I made. It all started with a
> >>> question: "why am I getting log4j:log4j:1.2.12 in my local Maven
> >>> repository when building a trivial project with a fresh local repo?"
> >>>
> >>> I knew it's possible to `grep -r --include=*.pom 1.2.12` the poms that
> >>> declare old log4j, but I needed something better.
> >>>
> >>> In short - I managed to persist the information available in the
> >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
> >>> stack.
> >>> I wrote a Maven extension that can be put into $MAVEN_HOME/lib/ext or used
> >>> with "-Dmaven.ext.class.path" which does two things:
> >>>
> >>>    1. adds an org.eclipse.aether.RepositoryListener component that writes
> >>>    some information when a dependency is FIRST downloaded from a remote
> >>>    repository
> >>>    2. adds an org.eclipse.aether.impl.DependencyCollector component (an
> >>>    extension of
> >>>    org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector)
> >>>    that writes some information when a dependency is resolved against the
> >>>    local repository when it's already there (where no download is needed)
> >>>
> >>> In the first case, I write something like this:
> >>>
> >>> ~~~
> >>> Downloaded artifact log4j:log4j:pom::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >>>    -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
> >>>      -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
> >>>        -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
> >>>          -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
> >>>            -> org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 () (context: plugin)
> >>>   Reading descriptor for artifact log4j:log4j:jar::1.2.12 (context: plugin) (scope: ?) (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >>>     Transitive dependencies collection for org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 ()
> >>>       Resolution of plugin org.apache.maven.plugins:maven-site-plugin:3.11.0 (org.apache:apache:25)
> >>> ~~~
> >>> Downloaded artifact log4j:log4j:jar::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
> >>>   Resolution of plugin com.mycila:license-maven-plugin:3.0 (org.apache.camel:camel-buildtools:3.17.0-SNAPSHOT)
> >>>
> >>> I simply write some information from the available
> >>> org.eclipse.aether.RepositoryEvent and the event's
> >>> org.eclipse.aether.RequestTrace.
> >>>
> >>> More interesting information is written in the 2nd case. Because I wanted
> >>> to track ALL attempts to resolve log4j:log4j:1.2.12 (and any other
> >>> dependency), I needed some structure, and I decided on this:
> >>>
> >>>    - every dependency directory (where e.g., _remote.repositories is
> >>>    written along with the jar/pom/sha1/md5/...) gets a ".tracking"
> >>>    directory
> >>>    - in the ".tracking" directory I write files with names of this pattern:
> >>>    "groupId_artifactId_type_classifier_version.dep", e.g.,
> >>>    org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
> >>>    - each such file contains a _reverse dependency tree_ that shows me why
> >>>    a given dependency was resolved.
> >>>
> >>> For example, take
> >>> ~/.m2/repository/log4j/log4j/1.2.12/.tracking/org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
> >>> (the path itself already contains the information that
> >>> org.apache.maven.plugins:maven-dependency-plugin:3.1.2 depends, directly
> >>> or indirectly, on log4j:log4j:1.2.12).
> >>> The content of this file is:
> >>>
> >>> log4j:log4j:pom:1.2.12
> >>>  -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
> >>>    -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
> >>>      -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
> >>>        -> org.apache.maven.doxia:doxia-site-renderer:jar:1.7.4 (compile) (context: plugin)
> >>>          -> org.apache.maven.reporting:maven-reporting-impl:jar:3.0.0 (compile) (context: plugin)
> >>>            -> org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2 () (context: plugin)
> >>>
> >>> It's kind of obvious - dependency-plugin, through maven-reporting-impl,
> >>> doxia, velocity, commons-digester and commons-logging, "depends" on the
> >>> vulnerable log4j:1.2.12 library every security scanner screams about.
> >>>
> >>> Since I wrote this extension, I've kept it in my $MAVEN_HOME/lib/ext and
> >>> build everything at work with it. Now I know, for example, why my
> >>> ~/.m2/repository/org/codehaus/plexus/plexus-utils/ directory contains 57
> >>> different versions of plexus-utils. Why 1.0.4 from 2005?
> >>>
> >>> org.codehaus.plexus:plexus-utils:pom:1.0.4
> >>>  -> org.codehaus.plexus:plexus-container-default:jar:1.0-alpha-9-stable-1 (compile) (context: plugin)
> >>>    -> org.codehaus.plexus:plexus-velocity:jar:1.2 (compile) (context: plugin)
> >>>      -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
> >>>        -> org.apache.maven.plugins:maven-javadoc-plugin:jar:3.3.2 () (context: plugin)
> >>>
> >>> Why Guava 10.0.1?
> >>>
> >>> com.google.guava:guava:pom:10.0.1
> >>>  -> org.eclipse.sisu:org.eclipse.sisu.plexus:jar:0.0.0.M5 (compile) (context: plugin)
> >>>    -> org.apache.maven:maven-plugin-api:jar:3.1.1 (compile) (context: plugin)
> >>>      -> org.apache.maven:maven-core:jar:3.1.1 (compile) (context: plugin)
> >>>        -> org.apache.maven.shared:maven-common-artifact-filters:jar:3.2.0 (runtime) (context: plugin)
> >>>          -> org.springframework.boot:spring-boot-maven-plugin:jar:2.5.12 () (context: plugin)
> >>>
> >>> yes - Spring Boot 2.5.12...
> >>>
> >>> Why Log4j 2.10.0?
> >>>
> >>> org.apache.logging.log4j:log4j-api:pom:2.10.0
> >>>  -> org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0 (compile) (context: project)
> >>>    -> org.springframework.boot:spring-boot-starter-logging:jar:2.0.5.RELEASE (compile) (context: project)
> >>>      -> org.springframework.boot:spring-boot-starter:jar:2.0.5.RELEASE (compile) (context: project)
> >>>        -> org.springframework.boot:spring-boot-starter-web:jar:2.0.5.RELEASE (compile) (context: project)
> >>>          -> org.keycloak:keycloak-spring-boot-2-adapter:jar:17.0.1 (context: project)
> >>>
> >>> (see - this time the context is "project", not "plugin").
> >>>
> >>> And so on and so on.
> >>>
> >>> What is my motivation with this email? I don't know yet - ideally I'd like
> >>> to have this ".tracking" information created by Maven Resolver together
> >>> with the "_remote.repositories" and "*.lastUpdated" metadata. It could be
> >>> optional of course (the overhead is really minimal - 1 more minute when
> >>> building Camel 3 - 1 hour instead of 59 minutes).
> >>>
> >>> The only problem I had is that I had to fork/shade the
> >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector class
> >>> because I had to manipulate the
> >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
> >>> stack around the call to
> >>> org.jboss.fuse.mvnplugins.tracker.TrackingDependencyCollector#processDependency().
> >>> Besides this, normal plexus/sisu components are used.
> >>>
> >>> The repository is https://github.com/grgrzybek/tracking-maven-extension
> >>> and I'd be happy to see some comments about this ;)
> >>>
> >>> kind regards
> >>> Grzegorz Grzybek
> >>>
> >>
>
