[ 
https://issues.apache.org/jira/browse/IMPALA-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-10455:
--------------------------------------

    Assignee: Joe McDonnell

> Reorder Maven repositories to have cleaner mirror semantics
> -----------------------------------------------------------
>
>                 Key: IMPALA-10455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend, Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Using a Maven mirror to replace Maven Central can speed up the Impala build 
> substantially. However, the artifacts that are present in the toolchain S3 
> bucket are unlikely to be resolvable by the mirror, because they are 
> not in Maven Central or other repositories. If the Maven mirror has a long 
> list of source repositories, a miss can be expensive, because the mirror may 
> try each of its source repositories. It would be useful to exclude the S3 
> bucket Maven repositories from the mirroring. For example, this settings.xml 
> would do that:
> {noformat}
> <settings>
>   <mirrors>
>     <mirror>
>       <mirrorOf>external:*,!impala.cdp.repo</mirrorOf>
>       <name>mirror-repo</name>
>       <url>http://url.to.the.mirror/</url>
>       <id>mirror-repo</id>
>     </mirror>
>   </mirrors>
> </settings>{noformat}
> It mirrors everything that is not local and not from impala.cdp.repo (which 
> points to an S3 bucket).
> Unfortunately, this rule doesn't work. Everything still tries the mirror. 
> Maven tries repositories in the order that they are specified in the 
> pom.xml, and it sees cdh.rcs.releases.repo before it sees impala.cdp.repo 
> (https://github.com/apache/impala/blob/master/java/pom.xml#L150). It also 
> sees multiple banned repos (i.e. repos where both snapshots and releases are 
> disabled). Based on my testing, seeing the cdh.rcs.releases.repo causes it to 
> try the mirror, because it matches the mirrorOf conditions. It seems like the 
> banned repositories may also be a problem, depending on how smart Maven is.
> Reordering the repositories can fix these semantics. If the impala.cdp.repo 
> comes first (along with the impala.toolchain.kudu.repo), then any artifact 
> that resolves from those repositories would avoid hitting the mirror. 
> Specifically, it seems like the best ordering would be 
> impala.toolchain.kudu.repo (a local filesystem repo), impala.cdp.repo (an S3 
> repo), then the normal server repos, and lastly the banned repositories.
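
The ordering proposed above could look roughly like the following pom.xml fragment. This is only a sketch to make the order concrete, not the actual change: the ids impala.toolchain.kudu.repo, impala.cdp.repo, and cdh.rcs.releases.repo come from the description, while every URL is a placeholder and example.banned.repo is a hypothetical id standing in for the repositories that have both releases and snapshots disabled.

```xml
<repositories>
  <!-- 1. Local filesystem repo (placeholder URL). Listed first so that
       artifacts found here never need the mirror. -->
  <repository>
    <id>impala.toolchain.kudu.repo</id>
    <url>file:///path/to/local/kudu/repo</url>
  </repository>

  <!-- 2. S3-backed repo (placeholder URL). This is the id excluded by
       "!impala.cdp.repo" in the mirrorOf rule from the description. -->
  <repository>
    <id>impala.cdp.repo</id>
    <url>https://s3.bucket.example/path/</url>
  </repository>

  <!-- 3. Normal server repos, which match the mirrorOf rule and so go
       through the mirror (placeholder URL). -->
  <repository>
    <id>cdh.rcs.releases.repo</id>
    <url>https://repo.example/releases/</url>
  </repository>

  <!-- 4. Banned repos last: both releases and snapshots disabled.
       The id and URL here are hypothetical. -->
  <repository>
    <id>example.banned.repo</id>
    <url>https://repo.example/banned/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
</repositories>
```

Whether this order is enough to keep S3-hosted artifacts away from the mirror depends on how Maven walks the list, which the description only establishes through testing.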



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
