-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 30/11/12 11:15 AM, Michael Mol wrote:
> On Fri, Nov 30, 2012 at 10:57 AM, Richard Yao <r...@gentoo.org>
> wrote:
>> On 11/28/2012 11:08 AM, Matthew Thode wrote:
>>> On 11/28/2012 09:05 AM, Richard Yao wrote:
>>>> On 11/28/2012 09:17 AM, Maxim Kammerer wrote:
>>>>> On Wed, Nov 28, 2012 at 3:54 PM, Richard Yao
>>>>> <r...@gentoo.org> wrote:
>>>>>> We could slightly simplify the handbook installation
>>>>>> procedure if we told people to use emerge-webrsync to
>>>>>> fetch the initial snapshot.
>>>>> 
>>>>> Using emerge-webrsync also makes the installation process
>>>>> more robust, since it only requires HTTP access (whereas
>>>>> many firewalls restrict RSYNC). Besides, emerge-webrsync
>>>>> can check PGP signatures, so I think that it should be the
>>>>> primary recommended portage tree synchronization method.
>>>>> 
>>>> 
>>>> The only downside of which I am aware is increased network
>>>> traffic. However, we could redesign emerge-webrsync to take
>>>> advantage of GNU Tar's incremental archive functionality.
>>>> 
>>>> That would permit us to mirror compressed diffs in addition
>>>> to regular portage snapshots. Doing that well could reduce
>>>> bandwidth requirements.
>>>> 
>>> weekly fulls and daily diffs?
>>> 
>> 
>> Determining what is right here probably requires calculus, but
>> this scheme does not seem like a bad choice to me. My main
>> concern is that maintaining weekly full snapshots would require
>> too much space for the mirrors. It might be better to go monthly,
>> with diffs on the following intervals:
>> 
>> 1 week
>> 1 day
>> 30 minutes
>> 
>> Doing that would eliminate the benefit of rsync entirely, with
>> the caveat that we now need to mirror a ton of diffs. This would
>> make it easy for us to provide the ability to obtain historical
>> snapshots, which would be nice.
> 
> Worth noting that all this moves us nicely in the direction of 
> allowing HTTP proxies to cache data, reducing load on mirrors. And 
> moves us in the direction of implementing mirrors themselves as a 
> network of caching proxies.
> 

The idea makes sense.  I wonder whether the implementation would be
better served by leveraging the fact that we already produce daily
full snapshots:

1 - continue to provide the daily snapshots we do now
2 - provide two weeks (more?) of daily diffs, such that a daily
snapshot from up to two weeks ago can be updated to present day
3 - provide hourly or 30-minute update diffs to get latest changes.

If the tree is more than two weeks old, emerge-webrsync would just
grab the latest daily plus the hourly diffs.

If the tree is less than two weeks old, grab the daily diffs and
hourly diffs.  The local copy of the tree itself would need to be
rolled back to the best-available daily diff before these diff updates
could be applied; this may mean that a local cache of the latest
full-day snapshot needs to be kept and/or generated.  If that cache
doesn't exist, the whole full-day snapshot would be fetched instead.
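
To make the decision logic above concrete, here is a rough sketch in
Python.  It is not emerge-webrsync code: the function name plan_update,
the step labels, and the 14-day window constant are all made up for
illustration, and I am assuming a tree exactly two weeks old still
counts as inside the diff window.

```python
from datetime import date, timedelta

# Hypothetical window: "two weeks (more?)" of daily diffs on the mirrors.
DAILY_DIFF_WINDOW = timedelta(days=14)


def plan_update(local_snapshot, today, have_cached_snapshot):
    """Return the fetch steps, oldest first, as (kind, date) tuples.

    local_snapshot is the date of the daily snapshot the local tree was
    last synced to, or None if there is no local tree yet.
    """
    steps = []
    too_old = (
        local_snapshot is None
        or today - local_snapshot > DAILY_DIFF_WINDOW
    )
    if too_old or not have_cached_snapshot:
        # No usable base to roll back to: grab the latest full daily.
        steps.append(("full-snapshot", today))
    else:
        # Roll back to the cached daily, then apply one diff per day.
        day = local_snapshot + timedelta(days=1)
        while day <= today:
            steps.append(("daily-diff", day))
            day += timedelta(days=1)
    # In every case, finish with the hourly/30-minute diffs for today.
    steps.append(("sub-daily-diffs", today))
    return steps
```

For example, a tree last synced three days ago with a cached snapshot
would fetch three daily diffs plus the sub-daily diffs, while a tree a
month old would fetch one full snapshot plus the sub-daily diffs.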

The advantage to this would be significantly fewer distfiles, although
the logic in emerge-webrsync would possibly be more complex.

Regarding rolling back the local tree to a known-good state, I think
that would be required regardless of the method as any local changes
made to the tree by users would need to be discarded, right?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iF4EAREIAAYFAlC45f0ACgkQ2ugaI38ACPChKgD9GOBptQ9jJ1/eYyq1NEl5Oq1E
dVy9UOab80bG5FZB9LwBAKwsifnT+iE3n/4d/ljnuT2qCnbtXNYr7yBjF/VcEpkq
=y9eB
-----END PGP SIGNATURE-----
