Re: a cleaned up central repository?

Brian Fox Thu, 01 Oct 2009 16:52:00 -0700

Albert,
You clearly have plenty of enthusiasm for helping to provide
better metadata, and I don't want to discourage that.

Providing and hosting a repository containing 90 thousand files that
serves more than 12TB of data a month is not as easy as you might
imagine. Starting over with a new repository is not the answer here.

Getting 90k artifacts vetted and cleaned is a significant undertaking.
Frankly, many of the artifacts in there are old, underused versions, and
spending effort on those will have much less immediate impact than
stopping new artifacts with bad (or missing) data from getting in. We
have chosen to attack this problem by raising the barrier to entry for
new artifacts. This work started months ago and we'll be able to put
something in place in the next few weeks.

This will be done in an automated fashion, starting with artifacts
that are uploaded manually. Then we will apply the same rules
(automatically) to anything coming in via rsync. Besides improving and
vetting the data coming in, this more automated process is designed to
drastically improve the turnaround time for getting new artifacts and
syncs configured.

If you really want to help here, I can see several ways you
personally can assist this process:

1) Contribute _code_ that validates the various conditions you think
are important to validate. We already have an interface developed that
I can point you at if you're interested. These rules will help in
several ways: they can check new artifacts, check old artifacts, and
be used by people internally with their own repository manager if
they choose.

2) Provide a detailed list of artifacts and metadata you consider
broken. We can sit around and theorize about how things could be
better, but first having a concrete list of broken poms and other data
will help us focus on the most prominent problems first. The MEV
project in Jira is where we collect these. I don't care much whether
you file one Jira issue per problem or a single issue with a huge list;
it's the gathering of the list that is most important.

3) When these artifacts are identified, work with the teams producing
these poms to educate them on the proper pom constructs, so the
problems are eliminated at the source. Generally the teams don't
produce bad data on purpose, so some education is required.

4) Assuming we have identified a significant number of the problems
from work done in #2, we would then need concrete proposals for how to
fix this without breaking people's builds. Proposals can be posted on
the MAVENUSER wiki space for further discussion and refinement.

5) Assuming the proposals gather momentum, someone would need to code
the proposals.

6) Then assistance would be needed to actually clean up the metadata in
the context of the implementation.
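
To make item 1 concrete, here is a minimal sketch of the kind of
validation rule meant there. It is not the interface mentioned above
(that is not shown in this thread); the function name validate_pom, the
REQUIRED and RECOMMENDED tuples, and the choice of which elements to
demand are all assumptions made for illustration. It only checks that a
pom is well-formed XML and carries a few basic elements, allowing
groupId and version to be inherited from a <parent> block.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven 4.0.0 POMs; some old poms omit it entirely.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"

# Hypothetical rule set, chosen for illustration only.
REQUIRED = ("groupId", "artifactId", "version")
RECOMMENDED = ("name", "description", "url", "licenses")


def _find(node, tag):
    """Find a direct child with or without the POM namespace."""
    el = node.find(POM_NS + tag)
    return el if el is not None else node.find(tag)


def _has_content(el):
    """True if the element exists and has text or child elements."""
    return el is not None and (bool((el.text or "").strip()) or len(el) > 0)


def validate_pom(pom_xml):
    """Return a list of problem strings; an empty list means none found."""
    try:
        root = ET.fromstring(pom_xml)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    parent = _find(root, "parent")
    for tag in REQUIRED:
        if _has_content(_find(root, tag)):
            continue
        # groupId and version may legitimately come from <parent>.
        if tag != "artifactId" and parent is not None \
                and _has_content(_find(parent, tag)):
            continue
        problems.append("missing required element: " + tag)
    for tag in RECOMMENDED:
        if not _has_content(_find(root, tag)):
            problems.append("missing metadata element: " + tag)
    return problems
```

A rule written this way could be run over incoming uploads and over the
existing artifacts alike, and its output (one line per problem) is the
sort of list item 2 asks for.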

Things get done around here by people actually stepping in to get them
done. You're enthusiastic and we'd be glad to accept help in the areas
above. I think further theorizing at this point will yield
diminishing returns, and I personally think attempting to fork an
entire repository is not going to help the users as much as even
items #1, #2, and #3 above.

Brian Fox
Apache Maven PMC Chair-Elect

On Thu, Oct 1, 2009 at 12:13 PM, Albert Kurucz <[email protected]> wrote:
> I would like to see some votes:
>
> 1. Big Rotten Onion
> 2. Starting Over After Writing New Policies
>
> On Thu, Oct 1, 2009 at 2:04 PM, Albert Kurucz <[email protected]> wrote:
>> You know where Option1 will drive us?
>> When the added metadata which hides current corruption will become
>> corrupt, we need another layer.
>> At the end, it will look like a big onion. :)
>>
>> Where will Option2 get us?
>> The new repo will get corrupted again.
>> Unless the policy of repo-ing something will get rewritten, like this:
>> only source code can be uploaded in packages to a public repo.
>> Compile can only take place locally when you are checking out
>> something or after (lazy checkout).
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

