--
Matt Domsch
Senior Distinguished Engineer & Executive Director
Dell | Software Group, Office of the CTO


-----Original Message-----
From: Pierre-Yves Chibon [mailto:[email protected]]
Sent: Monday, June 29, 2015 4:08 AM
To: Domsch, Matt
Cc: [email protected]
Subject: Re: Thoughts and question about MM2's UMDL script

On Fri, Jun 26, 2015 at 06:00:18PM +0000, [email protected] wrote:
> * Readable status of directories
> The Directory table has a 'readable' property, but none of our directories
> is marked as not readable.
>
> Question is: what is the use-case for this boolean?
>
> == MD == Pre-bitflip content, which UMDL can see but the normal public can't 
> yet. Are you no longer bitflipping? Then it doesn't matter.

Ok, I see the use-case in the crawler, but in the UMDL, how did it work?
The UMDL would not be allowed to read a given folder?

== MD == UMDL can read it, but the crawler can't.  UMDL sets readable=False; 
crawler then doesn't delete the directory (or care if it can't read it) because 
it doesn't expect it to be readable.  Otherwise, when readable=True but a given 
mirror doesn't have that content, crawler marks that host_category_directory 
for deletion.
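
The rule described above can be sketched roughly as follows. This is only an
illustration of the decision, not the actual crawler code: the function name
and its two boolean parameters are hypothetical, and only the `readable` flag
and the host_category_directory concept come from this thread.

```python
def should_mark_for_deletion(directory_readable: bool,
                             mirror_has_directory: bool) -> bool:
    """Decide whether the crawler should mark a host_category_directory
    entry for deletion (hypothetical helper, not MirrorManager code)."""
    if not directory_readable:
        # UMDL set readable=False (pre-bitflip content): the crawler does not
        # expect to be able to read it, so it must not delete the entry.
        return False
    # readable=True: the content is public, so a mirror that lacks it
    # gets its entry marked for deletion.
    return not mirror_has_directory
```

With this sketch, a pre-bitflip directory is never marked regardless of what
the crawler sees, while a readable directory is marked only when the mirror
does not carry it.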


> I am currently under the impression that dropping unnecessary
> directories would save DB space (the directories then being linked in
> the host_category_dir table, which lists for each host, in each category,
> which dirs are present) as well as crawling time (both in the UMDL and in the
> crawler).
>
>
> == MD == You need non-repo directories for ISOs at least; there was a time 
> when we were able to mirror the entire Fedora static web content too; able 
> only because MM tracked all directories, not just repository directories. MM1 
> also tried to be a "generic" mirror manager, not just a Fedora-specific 
> mirror manager, so I intentionally tracked everything, not just Yum repos.

Idea: what if we tracked only the folders that have files in them? For
example, http://dl.fedoraproject.org/pub/epel/5/ would then not end up in the
database.

In addition, we could add a sort of blacklist to avoid storing
http://dl.fedoraproject.org/pub/ just due to the presence of the
DIRECTORY_SIZES.txt file.

This would reduce the number of directories we store for the Atomic tree.
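
A minimal sketch of that idea, assuming a local tree walked with os.walk. The
function name and the IGNORED_FILES set are hypothetical and not part of the
existing UMDL; only DIRECTORY_SIZES.txt comes from the example above.

```python
import os

# Hypothetical blacklist: files whose presence alone should not cause
# their directory to be stored.
IGNORED_FILES = {"DIRECTORY_SIZES.txt"}


def directories_to_track(top):
    """Return the directories under `top` that directly contain at least
    one file that is not blacklisted."""
    tracked = []
    for dirpath, _dirnames, filenames in os.walk(top):
        # Directories holding only sub-directories, or only blacklisted
        # files, are skipped.
        if any(name not in IGNORED_FILES for name in filenames):
            tracked.append(dirpath)
    return sorted(tracked)
```

On a tree where pub/ holds only DIRECTORY_SIZES.txt and pub/epel/5/ holds only
sub-directories, neither would be returned, while a leaf directory containing
packages would be.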

== MD == I didn't optimize for a few non-file-containing directories.  You're 
welcome to if you see a need.  But it's saving just a few entries out of 
hundreds/thousands.

> * Non-directory based support in UMDL.
>
> So the UMDL script currently supports three ways of crawling the tree:
> * file
> * rsync
> * directory
>
> We, in Fedora, are only using the last one. I believe the `rsync` mode
> was added to support Ubuntu, and the file mode is basically a
> simplified version of the directory mode, but one that we do not use at the
> moment.
>
> I would like to propose that we drop support for rsync. I feel that it
> may be simpler and easier to create a UMDL and a crawler for each
> distro that would like to use MirrorManager than to maintain a
> one-script-fits-all UMDL that is in fact tested for only one of the scenarios.
> That being said, if we ever have interest from Ubuntu, CentOS or any
> other communities, we should definitely look into making the UMDL
> and crawler as re-usable as possible for them, while keeping the
> distro-specific bits separated.
>
>
> == MD == [file] was used early on for dev and testing. It's not interesting.
> [rsync] would be used when you don't have access to a master mirror (or very
> close replica). Perhaps the rpmfusion setup still needs this. I would have
> needed it for testing Ubuntu, certainly. It shouldn't be needed for production
> when the content being mirrored out is managed by the same people operating
> mirrormanager, as is the Fedora case.

Apparently RPMFusion does need this, so it needs to stay. The question then
becomes: should we split the different UMDL types into different scripts?
The idea being that this would allow easier optimization.
(Note: I'm having this idea now, but since I have not looked at what/how we
could optimize, it may end up remaining in the same file.)

== MD == The parsing routine is pretty short; it's not worth a separate
executable.

_______________________________________________
infrastructure mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
