[
https://issues.apache.org/jira/browse/HADOOP-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737886#comment-17737886
]
ASF GitHub Bot commented on HADOOP-18786:
-----------------------------------------
ctubbsii commented on PR #5789:
URL: https://github.com/apache/hadoop/pull/5789#issuecomment-1610266483
> Also, I guess we want this change in all active branches? i.e.
branch-3.3, branch-3.2, and branch-2.10.
As a stop-gap, it would be good to prevent any downstream users from hitting
this on all new releases, as it can trigger a site-wide ban if they've used the
archives too much. Building Hadoop releases from source shouldn't cause users
to be banned from access to the ASF.
But, this is only a stop-gap solution, regardless of where it is applied.
Once these dependencies roll over to the archives, then you'll have the problem
of users being unable to build Hadoop releases from source without first
patching its build in some way. So, a more complete solution still needs to be
created.
The Dockerfile changes are probably okay (as I imagine those are optional,
and users can derive their own Dockerfiles easily enough from these as
reference), and the generic message about jsvc is certainly okay. You could
probably get away with just applying those changes to the trunk.
The main problem to address across all branches is the yetus-wrapper's use
of the archives. Perhaps one of the following would work?
1. yetus can be made an optional part of the build (a dev-only profile that
is inactive by default when users build from source)?
2. Or you can bundle yetus into the release as a build tool so it doesn't
need to go to the archives (might be against ASF policy for source releases,
but perhaps there's an exception to the rule)?
3. Or perhaps yetus is stable enough that you can just reference
downloads.apache.org/yetus/latest instead of a specific version?
4. Or maybe the build instructions should just tell the user that they need
to download or install it as a build prerequisite, rather than have the Hadoop
scripts download it?
5. Or perhaps Yetus can publish its releases to Maven Central or another
place, from which these can be downloaded?
I'm not sure what the best solution is, but I definitely think this part
should be fixed in all branches, somehow.
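To make options 3 and 4 a little more concrete, here is a rough sketch (in
Python for brevity, not the actual yetus-wrapper logic). It prefers a copy of
the tool already installed on the PATH and only otherwise builds a download
URL, from a base that the user can override. The YETUS_BASE_URL variable, the
resolve_yetus name, and the tarball path layout are all assumptions made for
illustration; none of them come from the Hadoop build.

```python
import os
import shutil

# Assumed default; the path layout below is a guess at the mirror's structure.
DEFAULT_BASE_URL = "https://downloads.apache.org/yetus"


def resolve_yetus(command, version, env=None, which=shutil.which):
    """Decide how to obtain a Yetus command without hard-coding the archives.

    Returns ("local", path) if the command is already installed (option 4),
    otherwise ("download", url) built from an overridable base URL.
    """
    env = os.environ if env is None else env
    local = which(command)
    if local:
        return ("local", local)
    # YETUS_BASE_URL is a hypothetical override, not an existing Hadoop setting.
    base = env.get("YETUS_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return ("download", f"{base}/{version}/apache-yetus-{version}-bin.tar.gz")
```

With something like this, a packager behind a NAT gateway could point
YETUS_BASE_URL at an internal mirror, or install Yetus up front, and the build
would never touch the archives.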
> Hadoop build depends on archives.apache.org
> -------------------------------------------
>
> Key: HADOOP-18786
> URL: https://issues.apache.org/jira/browse/HADOOP-18786
> Project: Hadoop Common
> Issue Type: Bug
> Components: build
> Affects Versions: 3.3.6
> Reporter: Christopher Tubbs
> Priority: Critical
> Labels: pull-request-available
>
> Several times throughout Hadoop's source, the ASF archive is referenced,
> including part of the build that downloads Yetus.
> Building a release from source should not require access to the ASF archives,
> as that contributes to end users being subject to throttling and blocking by
> INFRA, for "abuse" of the archives, even though they are merely building a
> current ASF release from source. This is particularly problematic for
> downstream packagers who must build from Hadoop's source, or for CI/CD
> situations that depend on Hadoop's source, and particularly problematic for
> those end users behind a NAT gateway, because even if Hadoop's use of the
> archive is modest, it adds up for multiple users.
> The build should be modified, so that it does not require access to fixed
> versions in the archives (or should work with the upstream of those dependent
> projects to publish their releases elsewhere, for routine consumption). In
> the interim, the source could be updated to point to the current dependency
> versions available on downloads.apache.org.