jira-importer commented on issue #495:
URL: https://github.com/apache/maven-indexer/issues/495#issuecomment-2965145517

   **[Tamas Cservenak](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cstamas)** commented
   
   For history's sake, here is the description of the problem:
   
   JBoss Nexus introduces audit information files stored as 
"artifactId-version.ext.audit.json". The problem is that these "metafiles" 
_violate_ the M2 repository layout: repositories carrying this audit 
information now appear to have _more than one main artifact_ per GAV. As proof, 
take [Apache Avalon Framework 
4.1.5](https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/),
 where the _real_ main artifact is of type "jar". Both main artifacts ("real" 
and "fake") can be addressed as a dependency like the one below, making Maven 
download and consume them:
   
   ```
   <dependency>
     <groupId>apache-avalon</groupId>
     <artifactId>avalon-framework</artifactId>
     <version>4.1.5</version>
     <type>jar</type>
   </dependency>
   ```
   
   Note: the "type" element is redundant here ("jar" is the default dependency 
type in Maven); I added it just for clarity's sake.
   
   With this dependency in a project, the build produces the expected output:
   
   ```
   cstamas@marvin test$ mvn -s settings.xml clean install
   [INFO] Scanning for projects...
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] Building test Maven Mojo 1.0+-SNAPSHOT
   [INFO] ------------------------------------------------------------------------
   Downloading: https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar
   Downloaded: https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar (72 KB at 21.8 KB/sec)
   ....
   ```
   
   But alas, changing the "type" element of the dependency clearly shows that 
Maven treats the "jar.audit.json" file the same way as the jar. In your "test" 
project, just change the dependency's type from "jar" to "jar.audit.json":
   
   ```
   <dependency>
     <groupId>apache-avalon</groupId>
     <artifactId>avalon-framework</artifactId>
     <version>4.1.5</version>
     <type>jar.audit.json</type>
   </dependency>
   ```
   
   And build it:
   
   ```
   cstamas@marvin test$ mvn -s settings.xml clean install
   [INFO] Scanning for projects...
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] Building test Maven Mojo 1.0+-SNAPSHOT
   [INFO] ------------------------------------------------------------------------
   Downloading: https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar.audit.json
   [WARNING] Checksum validation failed, no checksums available from the repository for https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar.audit.json
   Downloaded: https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar.audit.json (189 B at 0.1 KB/sec)
   ....
   ```
   
   This proves that the GAV apache-avalon:avalon-framework:4.1.5 in the JBoss 
repository _has two main artifacts_ with different types.
   
   Note: the same holds for the POM; an "artifactId-version.pom.audit.json" 
file is present as well. This is important for the discussion below.
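
To illustrate why both builds resolve, here is a minimal sketch of how dependency coordinates map onto an M2 layout path. The function name `m2_path` is hypothetical, and the sketch assumes that an unrecognized dependency type is used verbatim as the file extension, which is what the two build logs above suggest. It is not Maven's actual resolution code.

```python
def m2_path(group_id, artifact_id, version, extension, classifier=None):
    """Build a repository-relative path following the Maven 2 layout.

    Assumption: the dependency "type" is used directly as the file
    extension (true for "jar", and apparently for unknown types too).
    """
    directory = "/".join([group_id.replace(".", "/"), artifact_id, version])
    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"
    return f"{directory}/{file_name}.{extension}"


# Both "types" yield a valid main-artifact path (no classifier) for the same GAV.
print(m2_path("apache-avalon", "avalon-framework", "4.1.5", "jar"))
print(m2_path("apache-avalon", "avalon-framework", "4.1.5", "jar.audit.json"))
```

Since neither path carries a classifier, both files look like the main artifact of the same GAV.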
   
   ---
   
   Now, this causes the following problem in Maven Indexer. Indexer relies on 
one assumption: a GAV directory (the ["version 
directory"](https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/apache-avalon/avalon-framework/4.1.5/))
 _is expected to contain one main artifact and zero or more classified artifacts 
(artifacts with a classifier)_.
   
   What happens is that Indexer finds multiple extensions ("pom.audit.json", 
"jar" and "jar.audit.json") for the same GAV 
(apache-avalon:avalon-framework:4.1.5). Indexer also maintains _index 
uniqueness_ based on GAV. Hence, the _first "artifact suspect" file it stumbles 
upon_ gets indexed, and the later ones are simply skipped (the uniqueness check 
on GAV fails). Here, the "pom.audit.json" file is stumbled upon first and is 
taken as the main artifact: it is neither ".pom" (recognized as the artifact 
POM) nor ".pom.sha1" or ".pom.md5" (filtered out as checksums), so nothing in 
the M2 repository layout rules excludes it.
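
The "first file wins" behaviour can be sketched as follows. This is a simplified model, not the Indexer's real scanner: the function name, the filter list, and the order of the directory listing are all assumptions made for illustration (the listing order simply mirrors the description above, where "pom.audit.json" is encountered first).

```python
CHECKSUM_SUFFIXES = (".sha1", ".md5")


def pick_indexed_files(gav, artifact_id, version, file_names):
    """Return ({gav: indexed_file}, skipped_files) for one version directory.

    Hypothetical model: POMs and checksums are filtered out, every other
    unclassified file is an "artifact suspect", and only the first suspect
    per GAV is indexed.
    """
    main_prefix = f"{artifact_id}-{version}."  # no classifier => main artifact
    indexed, skipped = {}, []
    for name in file_names:
        if name.endswith(CHECKSUM_SUFFIXES) or name.endswith(".pom"):
            continue  # recognized as part of the M2 layout, never a suspect
        if not name.startswith(main_prefix):
            continue  # classified or unrelated files are ignored in this sketch
        if gav in indexed:
            skipped.append(name)  # uniqueness check on GAV fails
        else:
            indexed[gav] = name
    return indexed, skipped


listing = [  # order is an assumption, taken from the description
    "avalon-framework-4.1.5.pom.audit.json",
    "avalon-framework-4.1.5.pom",
    "avalon-framework-4.1.5.pom.sha1",
    "avalon-framework-4.1.5.jar",
    "avalon-framework-4.1.5.jar.audit.json",
]
indexed, skipped = pick_indexed_files(
    "apache-avalon:avalon-framework:4.1.5", "avalon-framework", "4.1.5", listing
)
print(indexed, skipped)
```

With this listing order, the real "jar" ends up among the skipped files.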
   
   Indexer _intentionally_ does not take the POM -> artifact matching route, 
since _figuring out the artifact extension_ from the POM "packaging" is not 
always possible (think of 3rd party extensions: "nexus-plugin" packaging 
actually yields the "jar" extension, etc.). Hence, it goes the artifact -> POM 
route, which is trivial: strip off the extension and replace it with ".pom". 
This is one of the oldest limitations in the Indexer code.
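
A rough sketch of the artifact -> POM direction, assuming a plain release version and a repository-relative M2 path (timestamped snapshots are ignored here). The helper name `pom_path_for` is hypothetical and not part of the Indexer API.

```python
def pom_path_for(artifact_path):
    """Derive the POM path from an artifact path in the M2 layout.

    The artifactId and version are read from the directory names, so the
    file name stem "artifactId-version" is known without guessing where the
    extension starts.
    """
    parts = artifact_path.strip("/").split("/")
    if len(parts) < 4:
        raise ValueError(f"not an M2 layout path: {artifact_path}")
    artifact_id, version, file_name = parts[-3], parts[-2], parts[-1]
    stem = f"{artifact_id}-{version}"
    if not file_name.startswith(stem):
        raise ValueError(f"file name does not match its directory: {file_name}")
    # Drop classifier and extension, whatever they are, and append ".pom".
    return "/".join(parts[:-1] + [stem + ".pom"])


print(pom_path_for(
    "apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.jar"
))
```

Note that this direction never needs to know the packaging, which is why it is the cheap one.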
   
   But the heuristic fails here, since the GAV parser (which parses a file's 
path to "reverse-engineer" its GAV) _succeeds_ in parsing the "pom.audit.json" 
file path, yielding GAV "apache-avalon:avalon-framework:4.1.5" and extension 
"pom.audit.json". Because the parsed GAV has no classifier (it would contain 
one if a classifier were present), the assumption is made that _this is the 
main artifact_.
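
The following is a simplified stand-in for such a path-based GAV parser, written only to show why the audit file parses cleanly. It is not the Indexer's real parser; the `Gav` tuple and `parse_gav` function are hypothetical, and snapshot handling is left out.

```python
from collections import namedtuple

Gav = namedtuple("Gav", "group_id artifact_id version classifier extension")


def parse_gav(path):
    """Parse a repository-relative M2 path into a Gav, or None if it does not fit."""
    parts = path.strip("/").split("/")
    if len(parts) < 4:
        return None
    *group_parts, artifact_id, version, file_name = parts
    stem = f"{artifact_id}-{version}"
    if not file_name.startswith(stem):
        return None
    rest = file_name[len(stem):]
    if rest.startswith("."):
        # No classifier: everything after the first dot is the extension.
        classifier, extension = None, rest[1:]
    elif rest.startswith("-") and "." in rest:
        classifier, extension = rest[1:].split(".", 1)
        if not classifier:
            return None
    else:
        return None
    if not extension:
        return None
    return Gav(".".join(group_parts), artifact_id, version, classifier, extension)


print(parse_gav(
    "apache-avalon/avalon-framework/4.1.5/avalon-framework-4.1.5.pom.audit.json"
))
```

The parse succeeds with `classifier=None`, so nothing distinguishes the audit file from a genuine main artifact.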
   
   Clearly, this is where Indexer fails. In this very case the packaging set 
in the POM is known ("jar"), but due to the initial requirements when Indexer 
was implemented, this check (to "crank up the POM and parse it") was sacrificed 
for _speed_ of scanning. Reading the POM would still not offer a "full 
solution"; again, think of non-core packagings and 3rd party build extensions.
   
   This again raises the general question: "What is the extension of packaging 
FOO?", which has to be answered without access to ArtifactHandler (this happens 
in Nexus, not in Maven). The other direction ("What is the packaging of 
extension FOO?") is even trickier, in fact impossible, since the packaging to 
extension mapping is not _bijective_: several packagings share one extension 
(e.g. packaging "ejb" and "nexus-plugin" both produce extension "jar", just 
like packaging "jar" does).
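
A small sketch makes the ambiguity concrete. The table below is a hand-written, partial example (it is not read from any ArtifactHandler, and "nexus-plugin" is the third-party packaging mentioned above); inverting it shows that an extension alone cannot identify the packaging.

```python
from collections import defaultdict

# Illustrative, incomplete packaging -> extension table (an assumption for
# this sketch, not an authoritative list).
PACKAGING_TO_EXTENSION = {
    "jar": "jar",
    "ejb": "jar",
    "maven-plugin": "jar",
    "nexus-plugin": "jar",
    "war": "war",
}


def packagings_by_extension(mapping):
    """Invert the table: extension -> sorted list of packagings producing it."""
    inverse = defaultdict(list)
    for packaging, extension in mapping.items():
        inverse[extension].append(packaging)
    return {extension: sorted(names) for extension, names in inverse.items()}


print(packagings_by_extension(PACKAGING_TO_EXTENSION))
```

The "jar" entry of the inverted table holds four candidates, so the extension -> packaging question has no single answer.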
   
   Possible solutions:
   
   a) Stop using "artifactId-version.ext.audit.json" as the audit JSON filename 
and change it to something that "breaks" the M2 layout, such as 
".artifactId-version.ext.audit.json" (prefixed with a dot). This makes Indexer 
skip the file (it will not be considered an artifact), and Nexus will also 
_hide it_ when browsing the repository (a direct request for the file still 
works; only "browsing" the repository will not show it).
   
   b) Introduce a "skip it" extension in Indexer (this needs to be added to 
Maven Indexer), such as ".noindex". If a filename _ends_ with exactly this 
string, Indexer skips it and does not consider it for indexing (so the audit 
JSON filename would be "artifactId-version.ext.audit.json.noindex").
   
   c) Move to attributes for storing the audit information, instead of files.
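
Options a) and b) both come down to a filename filter applied before a file is treated as an artifact candidate. A minimal sketch of such a filter, assuming the proposed ".noindex" suffix and a hypothetical function name (nothing here exists in Maven Indexer as described):

```python
NOINDEX_SUFFIX = ".noindex"  # proposed in option b), not an existing feature


def should_skip(file_name):
    """Return True if the scanner should not treat the file as an artifact.

    Option a): dot-prefixed names fall outside the M2 layout.
    Option b): names ending with the explicit "skip it" suffix.
    """
    return file_name.startswith(".") or file_name.endswith(NOINDEX_SUFFIX)


for name in (
    "avalon-framework-4.1.5.jar",
    "avalon-framework-4.1.5.jar.audit.json",
    ".avalon-framework-4.1.5.jar.audit.json",
    "avalon-framework-4.1.5.jar.audit.json.noindex",
):
    print(name, should_skip(name))
```

The current audit filename passes the filter unchanged, which is why one of the renames (or option c) is needed.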
   

