Howdy,

https://github.com/apache/maven-resolver/pull/176

So here is a "demo" implementation (it could be turned into an extension
point), as explained in the draft PR description.
BUT, as also noted in the PR, I'm getting the feeling that doing this is
"dangerous", and that a simple callback with the whole collected graph would be
better....
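
To make the "simple callback" idea concrete, here is a minimal sketch in plain
Java. It assumes the callback is handed the root of the finished graph once
collection is done. GraphNode, ReverseTreeCallback and onGraphCollected are
hypothetical names for illustration (GraphNode stands in for the resolver's
DependencyNode); none of them exist in the PR or in resolver. It also assumes
the graph is a tree without cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a node of the collected dependency graph.
final class GraphNode {
    final String coords;
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String coords) {
        this.coords = coords;
    }

    GraphNode add(GraphNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical callback: runs once, after the whole graph has been collected.
final class ReverseTreeCallback {

    // Returns one reverse tree (target first, root last) per path to the target.
    static List<String> onGraphCollected(GraphNode root, String target) {
        List<String> reports = new ArrayList<>();
        walk(root, target, new ArrayDeque<>(), reports);
        return reports;
    }

    // Assumption: the graph is acyclic, so plain recursion terminates.
    private static void walk(GraphNode node, String target,
                             Deque<GraphNode> path, List<String> reports) {
        path.push(node);
        if (node.coords.equals(target)) {
            reports.add(render(path));
        }
        for (GraphNode child : node.children) {
            walk(child, target, path, reports);
        }
        path.pop();
    }

    // push() adds at the head, so iterating the deque goes from target to root.
    private static String render(Deque<GraphNode> path) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (GraphNode n : path) {
            if (depth > 0) {
                for (int i = 0; i < 2 * depth - 1; i++) {
                    sb.append(' ');
                }
                sb.append("-> ");
            }
            sb.append(n.coords).append('\n');
            depth++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        GraphNode root = new GraphNode("org.example:app:jar:1.0")
                .add(new GraphNode("commons-logging:commons-logging:jar:1.1")
                        .add(new GraphNode("log4j:log4j:jar:1.2.12")));
        for (String report : onGraphCollected(root, "log4j:log4j:jar:1.2.12")) {
            System.out.print(report);
        }
    }
}
```

Because this runs only after collection has finished, it needs no shared stack
and does not depend on the order in which the collector visited the nodes.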


WDYT?

Tamas

On Mon, May 2, 2022 at 4:18 PM Tamás Cservenák <ta...@cservenak.net> wrote:

> Howdy,
>
> just a few short answers:
> - 1st: Personally, from a Resolver perspective, I'd just provide an API
> (basically the author extending resolver should implement) and make it
> simple to "click in" (Sisu component discovery).
> - 2nd: resolver IMHO should not provide any out of the box component
> implementation at all
>
> So the 1st would provide a "stable" extension point for users who would like
> to "integrate" with resolver at this point (like you did), but it would
> become possible using simply this new API, instead of the hoops and loops your
> code was forced to go through (as resolver is quite "closed" in this respect).
>
> As for the 2nd point, while I do like your idea of "decorating" the local
> repository, I'd try a slightly different route: I'd integrate
> https://github.com/lambdazen/bitsy which makes it possible to use Apache
> Tinkerpop's Gremlin queries to ask about the built graph, for example...
>
> And one big remark: the collector is the "hottest point" in resolver (heap
> and CPU wise), so ANY "new API" implementation should be aware that each
> "lost" millisecond directly affects resolver collection speed, but I think
> "research kind" of stuff, or just "recording the process result", should
> fit in just fine. I don't see this as a "standard" feature of Maven, but
> who knows? :)
>
> Just my 5 cents...
>
> HTH
> Tamas
>
> On Mon, May 2, 2022 at 4:09 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
> wrote:
>
>> Thank you Tamás for checking my experiment
>>
>> I'm just finishing my work before tomorrow's national holiday, but will
>> read your information more carefully soon.
>>
>> Whether it's DFS or BFS, as long as there's tracking from the initial to the
>> ultimate dependency, it's enough. DFS sounds more "natural" here though. I
>> didn't check the CollectResult class yet - is it created per dependency or
>> for the entire project?
>>
>> And yes - I didn't check multithreading, as in the normal scenario (just
>> `mvn clean install`) I didn't observe concurrency issues accessing the stack.
>> Mind that I know a bit about Maven "components", but there are definitely
>> a few things missing in my understanding.
>>
>> Checking your output, I see there are two aspects of this potential
>> enhancement to the resolver:
>>  - 1st - how to effectively collect the "reverse dependency tree" in
>> context of DFS/BFS/multithreading
>>  - 2nd - how to write the information
>>
>> 2nd aspect could include:
>>  - whether there should be ".tracking" for each GAV directory in local repo
>> (tracking for the purpose of entire local repository)
>>  - maybe there should be a configurable output location for a single report of
>> a build? (tracking for the purpose of a single project)
>>  - which format to use (human consumable or machine readable?)
>>
>> For now I've used resolver 1.6.3 from Maven 3.8.5, but I'll look at `main`
>> branch too.
>>
>> kind regards
>> Grzegorz Grzybek
>>
>>
>> On Mon, May 2, 2022 at 15:57 Tamás Cservenák <ta...@cservenak.net>
>> wrote:
>>
>> > What I forgot to mention: in my case the trees in the gist are about
>> > "resolving maven-core 3.5.8", but I guess you figured that out from the
>> > tree....
>> >
>> > T
>> >
>> > On Mon, May 2, 2022 at 3:55 PM Tamás Cservenák <ta...@cservenak.net>
>> > wrote:
>> >
>> > > Howdy,
>> > >
>> > > I did an experiment that (partially re-using your code to dump the rev
>> > > tree) produces this output:
>> > > https://gist.github.com/cstamas/598a3266f943984442c00df30520294f
>> > >
>> > > (note: 1.8.0 resolver has two collector implementations: original
>> > > Depth-First and new Breadth-First called DF and BF respectively)
>> > >
>> > > The code is not pushed anywhere yet, but I plan to make an API for this,
>> > > and as you can see, it works for both collector implementations. Also,
>> > > I hook ONLY into the collector, as that's the place where the graph is
>> > > being built, but this is logically equivalent to your "More interesting
>> > > ... 2nd case".
>> > >
>> > > Will ping once again when I have the changes....
>> > >
>> > > Thanks
>> > > Tamas
>> > >
>> > > On Thu, Apr 28, 2022 at 9:01 PM Tamás Cservenák <ta...@cservenak.net>
>> > > wrote:
>> > >
>> > >> Howdy,
>> > >>
>> > >> This is very cool, I was actually tinkering with very similar issues in
>> > >> resolver, coming from a totally different angle.
>> > >>
>> > >> And yes, the resolver collector is not quite "extension" friendly, but we
>> > >> will make it right.
>> > >> Just FYI, in the latest resolver (1.8.0) there are actually two
>> > >> implementations: depth-first (original) and breadth-first.
>> > >>
>> > >> Looking at your code: collection is the most critical part of the
>> > >> resolver regarding performance and memory, so "hooking" into it (like
>> > >> sending events at each step) might not be the best, but still, what kind of
>> > >> extension points would you envision in the collector?
>> > >>
>> > >> For example, to achieve what you want, it would be completely enough to
>> > >> receive the final CollectResult (the full graph), no?
>> > >> From a resolver perspective that would be simplest, especially
>> > >> now that we have two collector implementations...
>> > >>
>> > >> Also, in case of multithreading, your shared stack would not cut it,
>> > >> would it?
>> > >>
>> > >> I personally was also looking into these, especially after some of the
>> > >> latest additions to resolver in 1.8.0 and current master....
>> > >>
>> > >>
>> > >> Thanks
>> > >> T
>> > >>
>> > >>
>> > >> On Thu, Apr 28, 2022 at 12:45 PM Grzegorz Grzybek <gr.grzy...@gmail.com>
>> > >> wrote:
>> > >>
>> > >>> Hello
>> > >>>
>> > >>> TL;DR: https://github.com/grgrzybek/tracking-maven-extension
>> > >>>
>> > >>> I'd like to share a proof of concept I made. It all started with a
>> > >>> question: "why am I getting log4j:log4j:1.2.12 in my local Maven
>> > >>> repository when building a trivial project with a fresh local repo?"
>> > >>>
>> > >>> I knew it's possible to `grep -r --include=*.pom 1.2.12` the poms that
>> > >>> declare old log4j, but I needed something better.
>> > >>>
>> > >>> In short - I managed to persist the information available in the
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
>> > >>> stack.
>> > >>> I wrote a Maven extension that can be put into $MAVEN_HOME/lib/ext or
>> > >>> used with "-Dmaven.ext.class.path" which does two things:
>> > >>>
>> > >>>    1. adds an org.eclipse.aether.RepositoryListener component that writes
>> > >>>    some information when a dependency is FIRST downloaded from a remote
>> > >>>    repository
>> > >>>    2. adds an org.eclipse.aether.impl.DependencyCollector component (an
>> > >>>    extension of
>> > >>>    org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector)
>> > >>>    that writes some information when a dependency is resolved against the
>> > >>>    local repository when it's already there (where no download is needed)
>> > >>>
>> > >>> In the first case, I write something like this:
>> > >>>
>> > >>> ~~~
>> > >>> Downloaded artifact log4j:log4j:pom::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>    -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
>> > >>>      -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
>> > >>>        -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
>> > >>>          -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
>> > >>>            -> org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 () (context: plugin)
>> > >>>   Reading descriptor for artifact log4j:log4j:jar::1.2.12 (context: plugin) (scope: ?) (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>     Transitive dependencies collection for org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 ()
>> > >>>       Resolution of plugin org.apache.maven.plugins:maven-site-plugin:3.11.0 (org.apache:apache:25)
>> > >>> ~~~
>> > >>> Downloaded artifact log4j:log4j:jar::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
>> > >>>   Resolution of plugin com.mycila:license-maven-plugin:3.0 (org.apache.camel:camel-buildtools:3.17.0-SNAPSHOT)
>> > >>>
>> > >>> I simply write some information from available
>> > >>> org.eclipse.aether.RepositoryEvent and event's
>> > >>> org.eclipse.aether.RequestTrace.
>> > >>>
>> > >>> More interesting information is written in the 2nd case. Because I wanted to
>> > >>> track ALL attempts to resolve log4j:log4j:1.2.12 (and any other
>> > >>> dependency), I needed some structure. I decided on this:
>> > >>>
>> > >>>    - every dependency directory (where e.g., _remote.repositories is
>> > >>>    written along with the jar/pom/sha1/md5/...) gets a ".tracking"
>> > >>>    directory
>> > >>>    - in the ".tracking" directory I write files with names of this pattern:
>> > >>>    "groupId_artifactId_type_classifier_version.dep", e.g.,
>> > >>>    org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
>> > >>>    - each such file contains a _reverse dependency tree_ that shows me why
>> > >>>    a given dependency was resolved.
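
The layout described in the list above could be computed roughly like this. It
is a minimal sketch in plain Java: TrackingPaths and its methods are
hypothetical names, not the extension's actual code, and skipping an empty
classifier is an assumption inferred from the example file name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not taken from tracking-maven-extension.
final class TrackingPaths {

    // Builds "groupId_artifactId_type_classifier_version.dep".
    // Assumption: an empty or null classifier is left out entirely.
    static String trackingFileName(String groupId, String artifactId, String type,
                                   String classifier, String version) {
        StringBuilder sb = new StringBuilder();
        sb.append(groupId).append('_').append(artifactId).append('_').append(type).append('_');
        if (classifier != null && !classifier.isEmpty()) {
            sb.append(classifier).append('_');
        }
        return sb.append(version).append(".dep").toString();
    }

    // Default local repository layout: each groupId segment is a directory,
    // followed by artifactId and version; ".tracking" sits inside that directory.
    static Path trackingFile(Path localRepo, String groupId, String artifactId,
                             String version, String fileName) {
        Path dir = localRepo;
        for (String segment : groupId.split("\\.")) {
            dir = dir.resolve(segment);
        }
        return dir.resolve(artifactId).resolve(version).resolve(".tracking").resolve(fileName);
    }

    public static void main(String[] args) {
        String name = trackingFileName(
                "org.apache.maven.plugins", "maven-dependency-plugin", "jar", "", "3.1.2");
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(trackingFile(repo, "log4j", "log4j", "1.2.12", name));
    }
}
```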
>> > >>>
>> > >>> For example, in
>> > >>> ~/.m2/repository/log4j/log4j/1.2.12/.tracking/org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
>> > >>> (the path itself already contains the information that
>> > >>> org.apache.maven.plugins:maven-dependency-plugin:3.1.2 depends (directly
>> > >>> or indirectly) on log4j:log4j:1.2.12).
>> > >>> The content of this file is:
>> > >>>
>> > >>> log4j:log4j:pom:1.2.12
>> > >>>  -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
>> > >>>    -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
>> > >>>      -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.doxia:doxia-site-renderer:jar:1.7.4 (compile) (context: plugin)
>> > >>>          -> org.apache.maven.reporting:maven-reporting-impl:jar:3.0.0 (compile) (context: plugin)
>> > >>>            -> org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2 () (context: plugin)
>> > >>>
>> > >>> It's kind of obvious - dependency-plugin, through maven-reporting-impl,
>> > >>> doxia, velocity, commons-digester and commons-logging, "depends" on the
>> > >>> vulnerable log4j:1.2.12 library every security scanner screams about.
>> > >>>
>> > >>> Since I wrote this extension, I keep it in my $MAVEN_HOME/lib/ext and
>> > >>> build everything at work with it. Now I know why my
>> > >>> ~/.m2/repository/org/codehaus/plexus/plexus-utils/ directory contains 57
>> > >>> different versions of plexus-utils. For example, why 1.0.4 from 2005?
>> > >>>
>> > >>> org.codehaus.plexus:plexus-utils:pom:1.0.4
>> > >>>  -> org.codehaus.plexus:plexus-container-default:jar:1.0-alpha-9-stable-1 (compile) (context: plugin)
>> > >>>    -> org.codehaus.plexus:plexus-velocity:jar:1.2 (compile) (context: plugin)
>> > >>>      -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.plugins:maven-javadoc-plugin:jar:3.3.2 () (context: plugin)
>> > >>>
>> > >>> Why Guava 10.0.1?
>> > >>>
>> > >>> com.google.guava:guava:pom:10.0.1
>> > >>>  -> org.eclipse.sisu:org.eclipse.sisu.plexus:jar:0.0.0.M5 (compile) (context: plugin)
>> > >>>    -> org.apache.maven:maven-plugin-api:jar:3.1.1 (compile) (context: plugin)
>> > >>>      -> org.apache.maven:maven-core:jar:3.1.1 (compile) (context: plugin)
>> > >>>        -> org.apache.maven.shared:maven-common-artifact-filters:jar:3.2.0 (runtime) (context: plugin)
>> > >>>          -> org.springframework.boot:spring-boot-maven-plugin:jar:2.5.12 () (context: plugin)
>> > >>>
>> > >>> yes - Spring Boot 2.5.12...
>> > >>>
>> > >>> Why Log4j 2.10.0?
>> > >>>
>> > >>> org.apache.logging.log4j:log4j-api:pom:2.10.0
>> > >>>  -> org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0 (compile) (context: project)
>> > >>>    -> org.springframework.boot:spring-boot-starter-logging:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>      -> org.springframework.boot:spring-boot-starter:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>        -> org.springframework.boot:spring-boot-starter-web:jar:2.0.5.RELEASE (compile) (context: project)
>> > >>>          -> org.keycloak:keycloak-spring-boot-2-adapter:jar:17.0.1 (context: project)
>> > >>>
>> > >>> (see - this time the context is "project", not "plugin").
>> > >>>
>> > >>> And so on and so on.
>> > >>>
>> > >>> What is my motivation with this email? I don't know yet - ideally I'd like
>> > >>> to have this ".tracking" information created together with
>> > >>> "_remote.repositories" and "*.lastUpdated" metadata by Maven Resolver. It
>> > >>> could be optional of course (the overhead is really minimal - 1 more
>> > >>> minute when building Camel 3 - 1 hour instead of 59 minutes).
>> > >>>
>> > >>> The only problem I had is that I had to fork/shade the
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector class
>> > >>> because I had to manipulate the
>> > >>> org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes
>> > >>> stack around the call to
>> > >>> org.jboss.fuse.mvnplugins.tracker.TrackingDependencyCollector#processDependency().
>> > >>> Besides this, normal plexus/sisu components are used.
>> > >>>
>> > >>> The repository is https://github.com/grgrzybek/tracking-maven-extension
>> > >>> and I'd be happy to see some comments about this ;)
>> > >>>
>> > >>> kind regards
>> > >>> Grzegorz Grzybek
>> > >>>
>> > >>
>> >
>>
>
