[jira] [Commented] (MNG-7001) Reconsider seemingly useless check of artifacts' source repository introduced in Maven 3.0

Tamas Cservenak (Jira) Tue, 28 Feb 2023 01:28:06 -0800


    [https://issues.apache.org/jira/browse/MNG-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694457#comment-17694457]


Tamas Cservenak commented on MNG-7001:
--------------------------------------

I strongly disagree with removing this feature (or disabling it by default), 
especially as more advanced features (split repository, remote repository 
filtering) rely heavily on availability information. A healthy development 
environment should not, should never rely on "what I have in my local 
repository": it is just a cache, like a used napkin with all the crumbs from 
your breakfast and yesterday's lunch. To repeat: the ultimate check of build 
correctness is to nuke your local repository and ensure the build passes. And 
if it passes with an empty local repository, it will pass with a prepopulated 
one as well.

About artifact identity: yes, the identity of an artifact is not GAV, it is 
actually RGAV (the repository being part of it), as there is no kind of 
"global artifact police" that would prevent overlapping GAVs from being present 
in different repositories. So talking about a GAV makes sense only if you 
mention its origin as well. This will become more and more prevalent as Maven 
Central loses its "central" position, which is already happening to some 
extent: more and more forges emerge (Atlassian, Red Hat, Spring, the former 
JCenter, etc.) with their own repositories, and this is actually a good thing.
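To make the GAV vs. RGAV distinction concrete, here is a minimal Java sketch. The `Gav` and `Rgav` records and the `RgavIdentity` class are hypothetical illustrations, not Maven Resolver types, and the slf4j coordinates are only example values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of artifact identity; NOT Maven Resolver's Artifact type.
class RgavIdentity {

    // GAV: the coordinates most people think of as "the" identity.
    record Gav(String groupId, String artifactId, String version) {}

    // RGAV: the same coordinates plus the ID of the repository they came from.
    record Rgav(String repositoryId, Gav gav) {}

    public static void main(String[] args) {
        Gav gav = new Gav("org.slf4j", "slf4j-api", "2.0.6");
        Rgav fromCentral = new Rgav("central", gav);
        Rgav fromAtlassian = new Rgav("atlassian", gav);

        System.out.println(fromCentral.gav().equals(fromAtlassian.gav())); // true: same GAV
        System.out.println(fromCentral.equals(fromAtlassian));             // false: different RGAV

        // A set keyed by RGAV keeps both; a set keyed by GAV keeps only one.
        Set<Rgav> byRgav = new HashSet<>(List.of(fromCentral, fromAtlassian));
        Set<Gav> byGav = new HashSet<>(List.of(fromCentral.gav(), fromAtlassian.gav()));
        System.out.println(byRgav.size() + " vs " + byGav.size());          // 2 vs 1
    }
}
```

Records are used because their generated `equals` compares all components, which is exactly the "repository is part of the identity" rule.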

On a related note, here is an interesting read (look at the comments as well): 
[https://www.morling.dev/blog/maven-what-are-you-waiting-for/]

Maven was historically a bit "hard" on projects that used dependencies from 
repositories other than Central (so the build had to declare multiple remote 
repositories), and this is about to change; we are working on it. For example, 
with the Remote Repository Filter (RRF) feature you can vastly improve your 
build times and direct Maven precisely to the remote repository that is needed, 
instead of letting it "sift through" all repositories to get the artifact. 
Using RRF you can make Maven behave as fast as if only one repository (Central) 
were used.
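The idea behind filtering can be sketched in a few lines of Java: each repository is associated with the groupId prefixes it is known to serve, and only matching repositories are queried. This is a hypothetical sketch loosely modeled on the idea; the class, the method, and the groupId-prefix matching rule are assumptions, not RRF's actual implementation or configuration format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical groupId-prefix filter illustrating the idea behind remote repository
// filtering; NOT the Maven Resolver RRF implementation or its configuration format.
class PrefixFilterSketch {

    // Returns, in declaration order, the repositories that may serve the groupId.
    // A repository with an empty prefix list is treated as unfiltered.
    static List<String> candidateRepositories(Map<String, List<String>> prefixesByRepo, String groupId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : prefixesByRepo.entrySet()) {
            List<String> prefixes = e.getValue();
            boolean allowed = prefixes.isEmpty()
                    || prefixes.stream().anyMatch(p -> groupId.equals(p) || groupId.startsWith(p + "."));
            if (allowed) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> prefixes = new LinkedHashMap<>(); // keeps declaration order
        prefixes.put("central", List.of("org.slf4j", "org.apache"));
        prefixes.put("atlassian", List.of("com.atlassian"));

        System.out.println(candidateRepositories(prefixes, "com.atlassian.jira")); // [atlassian]
        System.out.println(candidateRepositories(prefixes, "org.slf4j"));          // [central]
        System.out.println(candidateRepositories(prefixes, "net.example"));        // []
    }
}
```

Each lookup touches only the repositories that can plausibly serve the artifact, which is where the build-time gain comes from.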

More reading here https://github.com/cstamas/rrf-demo

All these problems are amplified by the use of repository manager (RM) group 
(virtual repository) features combined with the "recommended" Maven setting 
mirrorOf:*. This completely overwrites the origin information in your local 
repository, and Maven has no idea about it. It also silently assumes there are 
NO overlapping artifacts out there, an aspect nobody talks about. Under the 
hood, even if your build contains definitions for "central", "atlassian", 
"redhat", etc., the mirrorOf setting overrides all of them and makes every 
artifact originate from your single mirrorOf repository ID (hence overlapping 
artifacts cannot coexist, as all of them are coalesced into a single origin). 
In short, repository groups with the recommended mirrorOf settings are to be 
avoided. Instead, make Maven aware of your proxy and hosted repositories 
(enumerate them in the POM), and apply some solution to set their URLs properly 
(e.g., in a settings.xml profile) to make the project "transportable" (e.g., CI 
vs. office workstation vs. home laptop on VPN).
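A simplified model shows why mirrorOf:* erases origin information: every declared repository ID is rewritten to the mirror's ID before anything is recorded. The `MirrorOfSketch` class and the "company-group" mirror ID are hypothetical; real Maven mirror matching also supports comma-separated lists and exclusions such as `*,!repo1`, which this sketch does not handle.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified mirror matching: handles only "*" and a single exact ID.
class MirrorOfSketch {

    // The repository ID that ends up recorded as the artifact's origin.
    static String effectiveRepositoryId(String declaredId, String mirrorId, String mirrorOf) {
        if (mirrorOf.equals("*") || mirrorOf.equals(declaredId)) {
            return mirrorId;
        }
        return declaredId;
    }

    // Distinct origins Maven can still tell apart after mirroring is applied.
    static Set<String> distinctOrigins(List<String> declaredIds, String mirrorId, String mirrorOf) {
        Set<String> origins = new LinkedHashSet<>();
        for (String id : declaredIds) {
            origins.add(effectiveRepositoryId(id, mirrorId, mirrorOf));
        }
        return origins;
    }

    public static void main(String[] args) {
        List<String> declared = List.of("central", "atlassian", "redhat");
        System.out.println(distinctOrigins(declared, "company-group", "*"));       // [company-group]
        System.out.println(distinctOrigins(declared, "company-group", "central")); // [company-group, atlassian, redhat]
    }
}
```

With `*`, three declared origins collapse into one, so two different artifacts sharing a GAV can no longer be told apart.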

Usually what happens with builds like these in large(r) companies is that 
Maven loses information, or in other words, Maven's "build comprehension" 
lacks things. A developer needs a new artifact that, alas, is not available 
from the existing public group, so they inform the RM team ("add a proxy to 
repo X, please"). The RM team, unaware of the developer's project, adds a proxy 
to X and puts it into the public group. Next, CI or the developer happily 
builds the project, as Maven can get the new artifact from the group. But the 
information about the origin is lost on the Maven side. Builds like these 
completely lack information about the remote repositories they need: take such 
a project out of the company environment (for example, when the company 
open-sources it) and try to build it where no RM public group is accessible, 
and it is doomed to fail. To make it work, developers need to "reverse 
engineer" what comes from where and reconstruct the missing remote repository 
information.

Moreover, if you consume the Atlassian forge and follow their documentation on 
how to consume it, there is another problem: Atlassian does not actually 
publish their "hosted" repository (where their produced artifacts are hosted) 
but a group. On top of that, this group of theirs contains a Maven Central 
proxy as well. Hence, the moment you declare the Atlassian repository in your 
build, you immediately get Maven Central present twice, and you can get 
slf4j-api from Maven Central but also from Atlassian. Now, given the discussion 
above, these are NOT CONSIDERED THE SAME. For example, the split repository 
feature would physically put them in different places (for this to happen you 
would need to reverse the repository order as the rrf-demo project does, but 
still).
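An origin-aware layout is what lets the two copies coexist on disk. The path scheme below (`<repoId>/<group path>/<artifactId>/<version>/<file>`) is a hypothetical illustration; the actual split local repository layout is configurable and may differ.

```java
// Hypothetical origin-aware local repository layout; NOT the actual split repository format.
class SplitLayoutSketch {

    static String path(String repoId, String groupId, String artifactId, String version, String extension) {
        // Prefix the usual groupId/artifactId/version path with the origin repository ID.
        return repoId + "/" + groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(path("central", "org.slf4j", "slf4j-api", "2.0.6", "jar"));
        // central/org/slf4j/slf4j-api/2.0.6/slf4j-api-2.0.6.jar
        System.out.println(path("atlassian", "org.slf4j", "slf4j-api", "2.0.6", "jar"));
        // atlassian/org/slf4j/slf4j-api/2.0.6/slf4j-api-2.0.6.jar
    }
}
```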

Here, another new feature could help assure they are "the same" (as by RGAV 
coordinates they are not): the "trusted checksum source", which is able to 
verify (even using a "strong" checksum such as SHA-2) that the artifact, 
regardless of origin, has the same checksum (to prevent spoofing).
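The core of such a check is comparing a strong digest of the downloaded bytes with a value from a trusted source. The sketch below uses the JDK's `MessageDigest` with SHA-256; the `trusted` map standing in for a checksum source and the class name are hypothetical, and this is not the Maven Resolver API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical sketch: verify artifact bytes against a trusted SHA-256 checksum,
// regardless of which repository served them. NOT the Maven Resolver API.
class TrustedChecksumSketch {

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // negative bytes format as unsigned hex
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // trusted: GAV -> expected SHA-256 hex (hypothetical stand-in for a checksum source).
    static boolean isTrusted(Map<String, String> trusted, String gav, byte[] content) {
        String expected = trusted.get(gav);
        return expected != null && expected.equalsIgnoreCase(sha256Hex(content));
    }

    public static void main(String[] args) {
        byte[] fromCentral = "jar-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] fromAtlassian = "jar-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] spoofed = "evil-bytes".getBytes(StandardCharsets.UTF_8);

        Map<String, String> trusted = Map.of("org.slf4j:slf4j-api:2.0.6", sha256Hex(fromCentral));

        System.out.println(isTrusted(trusted, "org.slf4j:slf4j-api:2.0.6", fromAtlassian)); // true
        System.out.println(isTrusted(trusted, "org.slf4j:slf4j-api:2.0.6", spoofed));       // false
    }
}
```

Note that the origin plays no role in the comparison: identical bytes from two repositories pass, tampered bytes fail.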

In short, what the "availability check" does is compare the set of 
project-defined remote repositories with the set of origin repositories of the 
artifact in the local repository, and ensure there is an intersection between 
the two. If there is none, the artifact is "not available" (created 
MRESOLVER-333). The effect is the same as building with an empty local 
repository: a build failure, as Maven would be unable to resolve the given 
artifact (but it has to try to be sure about it).
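The check reduces to a set intersection, which a few lines of Java can express. `AvailabilityCheckSketch` and its method are hypothetical, not Maven Resolver code; in real Maven the recorded origins are kept in `_remote.repositories` files in the local repository.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the availability check; NOT Maven Resolver code.
class AvailabilityCheckSketch {

    // Available only if some recorded origin is also a repository the project declares.
    static boolean isAvailable(Set<String> projectRepositoryIds, Set<String> recordedOriginIds) {
        Set<String> common = new HashSet<>(projectRepositoryIds);
        common.retainAll(recordedOriginIds); // set intersection
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> origins = Set.of("central", "atlassian"); // as recorded in the local repository

        System.out.println(isAvailable(Set.of("central", "spring"), origins)); // true
        System.out.println(isAvailable(Set.of("spring"), origins));            // false
        System.out.println(isAvailable(Set.of("atlassian-forge"), origins));   // false: different ID
    }
}
```

The last call shows that only IDs are compared: the same server declared under another ID does not count as an intersection.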

All that said, there is one big (purely technical) issue with all of this: the 
R component of the RGAV artifact identity is the repository ID. Some may call a 
remote repository "atlassian", or "atlassian-forge", or "Atlassian-repo", etc. 
In a controlled environment (a company, or a project umbrella with several 
related projects) this should not be a problem; hopefully the build managers 
can agree on IDs and align them, so with this solution applied, things should 
work transparently for them. But in the "wild" (i.e., when you build several 
unrelated OSS projects), there are almost always differences, so the worst that 
happens in this case is that Maven performs a remote access to ensure the 
artifact is available to the current project.
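That worst case can be sketched as a fallback: when the recorded origins and the project's repository IDs do not intersect, ask the project's remote repositories and, on success, record the new origin so later builds need no network access. This is a simplified, hypothetical model (the class, its methods, and the `remoteHas` callback are assumptions), not Maven Resolver code; network access is simulated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical, simplified resolution flow with fallback to remote access;
// NOT Maven Resolver code. Network access is simulated by the remoteHas callback.
class FallbackSketch {

    // GAV -> repository IDs the cached artifact was recorded as coming from.
    private final Map<String, Set<String>> originsByGav = new HashMap<>();
    int remoteRequests = 0;

    void recordOrigin(String gav, String repoId) {
        originsByGav.computeIfAbsent(gav, k -> new HashSet<>()).add(repoId);
    }

    // Returns the repository ID the artifact is considered available from, or null.
    String resolve(String gav, List<String> projectRepoIds, BiPredicate<String, String> remoteHas) {
        Set<String> origins = originsByGav.getOrDefault(gav, Set.of());
        for (String id : projectRepoIds) {
            if (origins.contains(id)) {
                return id; // cached and available to this project: no network access
            }
        }
        for (String id : projectRepoIds) {
            remoteRequests++; // the worst case: ask the remote repository
            if (remoteHas.test(id, gav)) {
                recordOrigin(gav, id); // remember the new origin for later builds
                return id;
            }
        }
        return null; // unresolvable for this project, as with an empty local repository
    }

    public static void main(String[] args) {
        FallbackSketch local = new FallbackSketch();
        String gav = "com.example:lib:1.0"; // hypothetical coordinates
        local.recordOrigin(gav, "atlassian");

        // Another project calls the same server "atlassian-forge".
        List<String> repos = List.of("atlassian-forge");
        System.out.println(local.resolve(gav, repos, (id, g) -> true)); // atlassian-forge
        System.out.println(local.remoteRequests);                        // 1
        local.resolve(gav, repos, (id, g) -> true);
        System.out.println(local.remoteRequests);                        // 1 (now recorded under both IDs)
    }
}
```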

 

> Reconsider seemingly useless check of artifacts' source repository introduced 
> in Maven 3.0
> ------------------------------------------------------------------------------------------
>
>                 Key: MNG-7001
>                 URL: https://issues.apache.org/jira/browse/MNG-7001
>             Project: Maven
>          Issue Type: Improvement
>    Affects Versions: 3.0, 3.1.1, 3.2.5, 3.3.9, 3.5.4, 3.6.3
>            Reporter: Petr Bodnar
>            Priority: Major
>
> This problem of "by-nobody-really-requested check for artifacts' source 
> repository" (just "repo-check" further on) is actually considered a bug by 
> many Maven users. It was introduced back in Maven 3.0, 10 years ago \(!). The 
> repo-check and its _practical_ disadvantages have been already thoroughly 
> described for example in my blog 
> [here|https://programmedbycoincidence.blogspot.com/2019/01/the-biggest-wtf-new-feature-ive-ever.html]
>  and discussed here within Jira: MNG-5181, MNG-5185, MNG-5289 and MNG-5883.
> *TL;DR What is requested in this issue:*
> # Remove the repo-check altogether.
> # If that's not possible, make the repo-check disabled by-default and have an 
> option to enable it for those who need it for whatever reason.
> # If even that is not possible, alter Maven and its warnings and errors so 
> that they do not confuse users.
> # Reason about the need for the repo-check, document the reasons.
> ----
> The repo-check can be _somewhat_ avoided by passing the {{-llr}} option to 
> Maven. AFAIK though, e. g. Eclipse's embedded Maven used for dependency 
> resolution doesn't support this option. Another long-outstanding issue is 
> that using the {{-llr}} option generates this warning on Maven build:
> {noformat}
> [WARNING] Disabling enhanced local repository: using legacy is strongly discouraged to ensure build reproducibility.
> {noformat}
> Generally it might make sense (possibly because it activates a quite different, 
> older part of Maven that, among other things, doesn't record the artifacts' 
> sources in "\*.repositories" files?). But when users have _no other option_ 
> that could be used for making their build reproducible by skipping the 
> repo-check, then the warning doesn't make sense to them. The only other choice 
> they have is to remove all those "\*.repositories" files from their local 
> Maven repository in order to make their builds work again.
> Another mind-blowing issue is described in MNG-5185: if an already-downloaded 
> artifact doesn't pass the hard-coded repo-check, Maven just tells the 
> user "the artifact could not be resolved". _But you'll get the very same 
> message when downloading an artifact really fails._ So unless you dig in, 
> these two totally different situations are not distinguishable from each 
> other.
> ----
> Yet to date, no action has been taken by Maven authors to help with any of the 
> problems. There is also no really good (read "making-sense-in-real-life") 
> explanation of the real pros of the introduced repo-check that would outweigh 
> its cons, other than, for example:
> {quote}The artifacts have an identity. It matters where the artifacts were 
> downloaded from. Artifact A downloaded from X is not the same thing to Maven 
> 3 as A downloaded from Y. This can happen when you flip your settings.xml to 
> go from using a repository manager to using Maven Central directly for 
> example.
> {quote}
> (taken from MNG-5289 comment)
> The logical question here is, to whom concretely "it matters"? Please, give 
> examples of what could go wrong if one has downloaded a released version of 
> an artifact and now its source repository changes or becomes unavailable.
> Please note that we shouldn't consider the very improbable case of artifacts 
> downloaded from various repositories having different content despite 
> having the very same GAV. The Maven local repository filesystem structure 
> is not able to cope with that situation anyway, or is it?
> Finally, there is also a performance-wise con of the repo-check - Maven needs 
> to contact the source repository every time it builds a project referencing 
> the checked artifact as one of its dependencies. Or doesn't it?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
