Hi Per,

On 04.06.2012 at 14:23, Per Jessen wrote:

> Anthony Bryan wrote:
> 
>> On Sat, Jun 2, 2012 at 4:12 AM, Jack Bates <grx...@nottheoilrig.com>
>> wrote:
>>> 
>>>> When you say "we're using Metalink as the mirror list", what do you
>>>> mean?  One annoying item in my setup is the parsing of the HTML
>>>> mirror page - you wouldn't happen to know of a way of retrieving the
>>>> mirror list in XML format?
>>> 
>>> 
>>> You can retrieve a Metalink/XML resource that includes information
>>> about where a file is mirrored, in XML format. I think the correct
>>> way to *discover* this resource is through a 'Link: <...>;
>>> rel=describedby; type="application/metalink4+xml"' header. Can anyone
>>> (Anthony?) confirm that this is the correct way?
>> 
>> yes, Jack.
>> 
>> and that is what I meant, Per, that you could examine the metalink to
>> construct a mirror list.
> 
> Hi Anthony
> 
> I've looked at the metalink xml file from e.g. 
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/common.meta4
> 
> There's no problem working with that XML, but it looks like the
> mirror-list will vary slightly depending on something, maybe which file
> is being retrieved:
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/common.meta4
> 
> Found 129 mirrors: 0 in the same network prefix, 0 in the same
> autonomous system, 1 handling this country, 59 in the same region, 44
> elsewhere
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/yast2-trans-zh_CN.rpm.meta4
> 
> Found 127 mirrors: 0 in the same network prefix, 0 in the same
> autonomous system, 1 handling this country, 59 in the same region, 42
> elsewhere

These small differences could have several causes; one is that the file in 
question is not on all mirrors (intentionally, accidentally, because of a bug, 
or because of an incomplete mirror scan or an incomplete mirror sync).
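
To see which mirrors are missing for one file but present for another, the two 
Metalink documents can be compared directly. This is only a rough sketch: it 
assumes the .meta4 documents use the Metalink 4 namespace from RFC 5854, the 
function names are made up, and the sample documents and example.org hosts are 
invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Metalink 4 (RFC 5854) XML namespace.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def mirror_hosts(meta4_text):
    """Return the set of host names found in the <url> elements of a Metalink document."""
    root = ET.fromstring(meta4_text)
    hosts = set()
    for url in root.findall("./ml:file/ml:url", NS):
        host = urlsplit((url.text or "").strip()).hostname
        if host:
            hosts.add(host)
    return hosts


def missing_mirrors(meta4_a, meta4_b):
    """Hosts that list the file described by meta4_a but not the one described by meta4_b."""
    return mirror_hosts(meta4_a) - mirror_hosts(meta4_b)


# Invented sample data; real documents would be fetched from the .meta4 URLs.
SAMPLE_A = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="common">
    <url location="de" priority="1">http://mirror-a.example.org/pub/common</url>
    <url location="se" priority="2">http://mirror-b.example.org/pub/common</url>
  </file>
</metalink>"""

SAMPLE_B = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="yast2-trans-zh_CN.rpm">
    <url location="de" priority="1">http://mirror-a.example.org/pub/yast2-trans-zh_CN.rpm</url>
  </file>
</metalink>"""

only_in_a = missing_mirrors(SAMPLE_A, SAMPLE_B)
print(sorted(only_in_a))
```

Comparing by host name is a simplification; a mirror serving several trees 
under one host would need the base path compared as well.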



> 
> http://download.opensuse.org/distribution.meta4
> 
> Found 35 mirrors: 0 in the same network prefix, 0 in the same autonomous
> system, 1 handling this country, 22 in the same region, 2 elsewhere

http://download.opensuse.org/distribution/ is the wrong kind of URL: that's 
not a file, but a directory. If MirrorBrain shows mirrors for it, there could 
be several reasons: there could be a file of that name on some mirrors (that 
does occur, but it's unlikely to be the case 35 times). Anyway, MirrorBrain 
doesn't (intend to) store directories, only files, so I would assume it could 
just as well be a bug in the mirror scanner that makes directory names end up 
in the database. I'm surprised that this is the case so often. (It should be 
quite harmless, though.)

Bottom line: don't try to check for the existence of directories. Only checking 
for a file is feasible.

That's by design, because directories could be incomplete, meaning that mirrors 
mirror only some files within a given directory. (In the openSUSE case, for 
instance only the CDs, not the DVDs.)

> http://download.opensuse.org/distribution/12.1.meta4
> 
> Found 29 mirrors: 0 in the same network prefix, 0 in the same autonomous
> system, 1 handling this country, 18 in the same region, 1 elsewhere

Same here.

> I'm wondering if there is one single, easily identified file, that will
> give me the complete list?  I quite like the idea of using e.g.
> http://download.opensuse.org/distribution.meta4, but the list wasn't
> exactly complete.  (should have been complete I think?)

To generate the "grouped" mirror lists on http://mirrors.opensuse.org/, the 
code looks for defined "marker files": files whose presence indicates that the 
mirror probably also has the rest of that part of the tree.
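
The marker-file idea can be sketched roughly as follows. This is not the code 
behind mirrors.opensuse.org; the section names, the second marker path, the 
function names and the lookup table are all assumptions made up for the 
example. In practice the lookup would be whatever returns the mirrors known to 
carry a given file (for instance the hosts from that file's .meta4).

```python
# Hypothetical marker files: one representative file per part of the tree.
MARKERS = {
    "distribution/12.1": "distribution/12.1/repo/oss/boot/x86_64/common",
    "example-tree": "example-tree/MARKER",  # invented path
}


def group_mirrors(markers, mirrors_for_file):
    """Map each tree section to the sorted mirrors that carry its marker file.

    mirrors_for_file is a callable: file path -> iterable of mirror identifiers.
    """
    return {
        section: sorted(set(mirrors_for_file(marker)))
        for section, marker in markers.items()
    }


def all_mirrors(groups):
    """Union of the mirrors across all sections, sorted."""
    return sorted(set().union(*groups.values()))


# Invented lookup data standing in for a real mirror database query.
FAKE_DB = {
    "distribution/12.1/repo/oss/boot/x86_64/common": ["mirror-a", "mirror-b"],
    "example-tree/MARKER": ["mirror-b", "mirror-c"],
}

groups = group_mirrors(MARKERS, lambda path: FAKE_DB.get(path, []))
complete_list = all_mirrors(groups)
print(groups)
print(complete_list)
```

A mirror that carries only part of a section still shows up if it has the 
marker file, so the result is a "probably has it" list, not a guarantee.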

> I could no doubt write something to use the mirror-list from multiple
> files to cobble up a complete list, but that's even worse than my
> primitive parsing of the mirrorlists.py output. 

I just thought about this again, and I think it would be reasonable to produce 
a specific list for your kind of request. Just as mirrors and their URLs can 
be published on a web page, as openSUSE does, the list of potential mirrors 
could be made public as well. But I tend to prefer a "safe" default that 
publishes this data only if the sysadmin chooses to, and not by default. There 
could be cases where exposing the URLs of all mirrors is not wanted. What do 
you guys think?

> I would like to just add another output format to mirrorlists.py (in the
> mirrorbrain package), unfortunately (for my own purposes) it would take
> a while before it would make it to e.g. the opensuse download site. 
> Still, my script could test on the availability and default to parsing
> the xhtml output. Hmm, that could work. 

I see. Yes, that would be one possibility.
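
A rough sketch of such a fallback, just to illustrate the idea: try the 
preferred source first and fall back to the next one if it is unavailable. 
The function name and the two fetchers below are stand-ins I made up; the new 
output format does not exist yet, and the real fetchers would do the HTTP 
request and the parsing.

```python
def first_available(sources):
    """Try each (name, fetch) pair in order; return (name, mirrors) for the first that works."""
    for name, fetch in sources:
        try:
            mirrors = fetch()
        except (OSError, ValueError):
            # OSError covers network failures (urllib.error.URLError is a subclass);
            # ValueError stands in for "output is not in the expected format".
            continue
        if mirrors:
            return name, list(mirrors)
    raise LookupError("no mirror list source available")


# Stand-in fetchers with invented behaviour and data.
def fetch_new_format():
    raise OSError("format not deployed on this server yet")


def fetch_xhtml():
    return ["http://mirror-a.example.org/", "http://mirror-b.example.org/"]


source, mirrors = first_available(
    [("new-format", fetch_new_format), ("xhtml", fetch_xhtml)]
)
print(source, mirrors)
```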

Thanks,
Peter


