[https://issues.apache.org/jira/browse/IMPALA-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317573#comment-17317573]

ASF subversion and git services commented on IMPALA-10455:
----------------------------------------------------------

Commit 267f4d67f4f9c8b10af539f8f2e0a2abfa4bafd5 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=267f4d6 ]

IMPALA-10455: Reorder Maven repositories for cleaner mirror semantics

When using a Maven mirror that uses a mirrorOf pattern, the order
of repositories in the pom.xml has a strong influence on whether the
build tries the mirror for a particular artifact. If an early
repository matches the mirrorOf condition, Maven may try the mirror
for all artifacts, even those that only exist in the s3 bucket.
This extra check can slow down the build, especially if the mirror
is slow to respond for unknown artifacts.
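A rough Python model of the ordering effect described above. It rests on a simplifying assumption: Maven replaces every repository matched by mirrorOf with a single mirror entry, which sits where the first matched repository appeared. The names `effective_order` and `endpoints_tried`, and the two-entry "before" list, are illustrative only. They are not taken from Impala's pom.xml or from Maven's source.

```python
from typing import Callable, List


def effective_order(repo_ids: List[str],
                    is_mirrored: Callable[[str], bool],
                    mirror_id: str = "mirror-repo") -> List[str]:
    """Return the endpoints consulted, in order (simplified model).

    Assumption: each repository matched by the mirror is replaced by the
    mirror; the mirror keeps the position of the FIRST matched repository,
    and later matched repositories are merged into that single entry.
    """
    order: List[str] = []
    for repo_id in repo_ids:
        target = mirror_id if is_mirrored(repo_id) else repo_id
        if target not in order:  # later mirrored repos merge into the first
            order.append(target)
    return order


def endpoints_tried(order: List[str],
                    has_artifact: Callable[[str], bool]) -> List[str]:
    """Endpoints contacted, in order, until one of them has the artifact."""
    tried: List[str] = []
    for endpoint in order:
        tried.append(endpoint)
        if has_artifact(endpoint):
            break
    return tried


if __name__ == "__main__":
    # Left unmirrored by mirrorOf "external:*,!impala.cdp.repo": the S3 repo
    # is excluded by name, and the Kudu repo is a local filesystem repo.
    unmirrored = {"impala.cdp.repo", "impala.toolchain.kudu.repo"}

    before = ["cdh.rcs.releases.repo", "impala.cdp.repo"]  # abridged ordering
    after = ["impala.cdp.repo", "cdh.rcs.releases.repo"]   # local/S3 first

    for label, repos in (("before", before), ("after", after)):
        order = effective_order(repos, lambda rid: rid not in unmirrored)
        # Resolve an artifact that exists only in the s3 bucket.
        tried = endpoints_tried(order, lambda ep: ep == "impala.cdp.repo")
        print(label, tried)
    # before ['mirror-repo', 'impala.cdp.repo']
    # after ['impala.cdp.repo']
```

In the "before" case, the S3-only artifact costs a round trip to the mirror first. In the "after" case, the artifact is found without contacting the mirror at all.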

For Impala, the common case is for a mirror to cover everything
except the artifacts that come from the Kudu local repository or
the s3 bucket. To optimize for that case, this reorders the Maven
repositories to be in this order:
1. Local/S3 repositories
2. Regular repositories
3. Banned repositories
The repositories are otherwise unchanged.
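The three-group ordering can be pictured as a stable sort on a category label. This is a minimal sketch: the `Category` enum and `reorder` helper are hypothetical, and `hypothetical.banned.repo` is a placeholder rather than a real ID from pom.xml.

```python
from enum import IntEnum
from typing import List, Tuple


class Category(IntEnum):
    """Hypothetical labels for the three groups in the commit message."""
    LOCAL_OR_S3 = 0  # local filesystem or s3 bucket repositories
    REGULAR = 1      # ordinary remote repositories
    BANNED = 2       # both releases and snapshots disabled


def reorder(repos: List[Tuple[str, Category]]) -> List[str]:
    """Order repository IDs as local/S3, then regular, then banned.

    sorted() is stable, so repositories keep their original relative order
    within each category; only the group boundaries change.
    """
    return [repo_id for repo_id, _ in sorted(repos, key=lambda r: r[1])]


if __name__ == "__main__":
    repos = [
        ("cdh.rcs.releases.repo", Category.REGULAR),
        ("hypothetical.banned.repo", Category.BANNED),  # placeholder ID
        ("impala.toolchain.kudu.repo", Category.LOCAL_OR_S3),
        ("impala.cdp.repo", Category.LOCAL_OR_S3),
    ]
    print(reorder(repos))
    # ['impala.toolchain.kudu.repo', 'impala.cdp.repo',
    #  'cdh.rcs.releases.repo', 'hypothetical.banned.repo']
```

A stable sort matches "The repositories are otherwise unchanged": nothing moves except across group boundaries.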

Testing:
 - Ran an ordinary build
 - Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified
   that the build went directly to the s3 bucket first.

Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26
Reviewed-on: http://gerrit.cloudera.org:8080/17020
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Reorder Maven repositories to have cleaner mirror semantics
> -----------------------------------------------------------
>
>                 Key: IMPALA-10455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend, Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Using a Maven mirror to replace Maven Central can speed up the Impala build 
> substantially. However, the mirror is unlikely to be able to resolve the 
> artifacts in the toolchain s3 bucket, because they are not in Maven Central 
> or other repositories. If the Maven mirror has a long list of source 
> repositories, a miss can be expensive, because the mirror may try each of 
> its source repositories. It would be useful to exclude the s3 bucket Maven 
> repositories from mirroring. For example, this settings.xml 
> would do that:
> {noformat}
> <settings>
>   <mirrors>
>     <mirror>
>       <mirrorOf>external:*,!impala.cdp.repo</mirrorOf>
>       <name>mirror-repo</name>
>       <url>http://url.to.the.mirror/</url>
>       <id>mirror-repo</id>
>     </mirror>
>   </mirrors>
> </settings>{noformat}
> It mirrors everything that is not local and not from impala.cdp.repo (which 
> points to an S3 bucket).
> Unfortunately, this rule doesn't work. Everything still tries the mirror. 
> Maven tries repositories in the order that they are specified in the 
> pom.xml, and it sees cdh.rcs.releases.repo before it sees impala.cdp.repo 
> ([https://github.com/apache/impala/blob/master/java/pom.xml#L150]). It also 
> sees multiple banned repos (i.e. repos where both snapshots and releases are 
> disabled). Based on my testing, seeing the cdh.rcs.releases.repo causes it to 
> try the mirror, because it matches the mirrorOf conditions. It seems like the 
> banned repositories may also be a problem, depending on how smart Maven is.
> Reordering the repositories can fix these semantics. If the impala.cdp.repo 
> comes first (along with the impala.toolchain.kudu.repo), then any artifact 
> found in those repositories would be resolved without hitting the mirror. 
> Specifically, it seems like the 
> best ordering would be impala.toolchain.kudu.repo (a local filesystem repo), 
> impala.cdp.repo (an s3 repo), then the normal server repos, and lastly the 
> banned repositories.
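The mirrorOf pattern in the settings.xml above decides which repositories the mirror replaces. Below is a Python approximation of how such a pattern is evaluated. It covers the forms described in Maven's mirror documentation: `*`, exact IDs, `external:*` and `!id` exclusions, scanned left to right. It is a sketch, not Maven's code, and newer Maven releases may accept further pattern forms that are not modeled here. The repository URLs are hypothetical; only the IDs come from the issue.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def is_external(url: str) -> bool:
    """True unless the repository is a file:// URL or lives on localhost."""
    parsed = urlparse(url)
    return not (parsed.scheme == "file" or parsed.hostname in LOCAL_HOSTS)


def mirror_of_matches(pattern: str, repo_id: str, repo_url: str) -> bool:
    """Approximate the evaluation of a <mirrorOf> pattern.

    Segments are comma-separated and scanned left to right: an exact ID
    match or an explicit "!id" exclusion decides immediately, while "*" and
    "external:*" only set a tentative match that a later "!id" can undo.
    """
    if pattern == "*" or pattern == repo_id:
        return True
    result = False
    for segment in pattern.split(","):
        if len(segment) > 1 and segment.startswith("!"):
            if segment[1:] == repo_id:
                return False  # explicitly excluded from mirroring
        elif segment == repo_id:
            return True
        elif segment == "external:*" and is_external(repo_url):
            result = True  # keep scanning: a later "!id" may exclude it
        elif segment == "*":
            result = True
    return result


if __name__ == "__main__":
    pattern = "external:*,!impala.cdp.repo"
    repos = {  # hypothetical URLs
        "cdh.rcs.releases.repo": "https://repo.example.com/cdh-releases",
        "impala.cdp.repo": "https://bucket.example.com/maven",
        "impala.toolchain.kudu.repo": "file:///path/to/kudu/repo",
    }
    for repo_id, url in repos.items():
        print(repo_id, mirror_of_matches(pattern, repo_id, url))
    # cdh.rcs.releases.repo True
    # impala.cdp.repo False
    # impala.toolchain.kudu.repo False
```

Under this model the pattern classifies the three repositories as intended. As the description explains, the remaining problem is where the mirror ends up in the repository order.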



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
