Dell Customer Communication

--
Matt Domsch
Senior Distinguished Engineer & Executive Director
Dell | Software Group, Office of the CTO
-----Original Message-----
From: Pierre-Yves Chibon [mailto:[email protected]]
Sent: Monday, June 29, 2015 4:08 AM
To: Domsch, Matt
Cc: [email protected]
Subject: Re: Thoughts and question about MM2's UMDL script

On Fri, Jun 26, 2015 at 06:00:18PM +0000, [email protected] wrote:
> * Readable status of directories
> The Directory table has a 'readable' property, but none of our directories
> is unreadable.
>
> Question is: what is the use-case for this boolean?
>
> == MD == Pre-bitflip content, which UMDL can see but the normal public can't
> yet. Are you no longer bitflipping? Then it doesn't matter.

Ok, I see the use-case in the crawler, but how did it work in the UMDL? Was the UMDL not allowed to read a given folder?

== MD == UMDL can read it, but the crawler can't. UMDL sets readable=False; the crawler then doesn't delete the directory (or care if it can't read it) because it doesn't expect it to be readable. Otherwise, when readable=True but a given mirror doesn't have that content, the crawler marks that host_category_directory for deletion.

> I am under the impression currently that dropping unnecessary
> directories would save DB space (the directories being then linked in
> the host_category_dir table, listing for each host, in each category,
> which dirs are present) as well as crawling time (both in the UMDL and in the
> crawler).
>
> == MD == You need non-repo directories for ISOs at least; there was a time
> when we were able to mirror the entire Fedora static web content too; able
> only because MM tracked all directories, not just repository directories. MM1
> also tried to be a "generic" mirror manager, not just a Fedora-specific
> mirror manager, so I intentionally tracked everything, not just Yum repos.

Idea: what if we were tracking only the folders that have files in them? Then, for example, http://dl.fedoraproject.org/pub/epel/5/ would not end up in the database.
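A minimal sketch of the "only folders that contain files" rule, assuming the UMDL has a flat listing of file paths relative to the top of the tree. `directories_to_track` is a hypothetical helper for illustration, not existing MirrorManager code:

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def directories_to_track(file_paths: Iterable[str]) -> Set[str]:
    """Return only the directories that directly contain at least one file.

    Hypothetical helper: `file_paths` is assumed to be a listing of file
    paths relative to the top of the mirrored tree. A directory that holds
    only subdirectories (e.g. "pub/epel/5") is never added, because no file
    has it as its immediate parent.
    """
    tracked: Set[str] = set()
    for path in file_paths:
        # The immediate parent of each file is the only directory recorded.
        tracked.add(str(PurePosixPath(path).parent))
    return tracked


if __name__ == "__main__":
    # Made-up listing, for illustration only.
    listing = [
        "pub/epel/5/x86_64/repodata/repomd.xml",
        "pub/epel/5/x86_64/foo-1.0.rpm",
        "pub/DIRECTORY_SIZES.txt",
    ]
    print(sorted(directories_to_track(listing)))
```

With this rule alone, "pub/epel/5" is skipped, but "pub" is still recorded because it holds DIRECTORY_SIZES.txt.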
In addition, we could add a sort of blacklist to avoid storing http://dl.fedoraproject.org/pub/ just because it contains the DIRECTORY_SIZES.txt file. This would reduce the number of directories we store for the Atomic tree.

== MD == I didn't optimize for a few non-file-containing directories. You're welcome to if you see a need. But it's saving just a few entries out of hundreds/thousands.

> * Non-directory based support in UMDL.
>
> So the UMDL script currently supports three ways of crawling the tree:
> * file
> * rsync
> * directory
>
> We, in Fedora, are only using the last one. I believe the `rsync` mode
> was added to support Ubuntu, and the file mode is basically a
> simplified version of the directory mode, but we do not use it at the
> moment.
>
> I would like to propose that we drop support for rsync. I feel that it
> may be simpler and easier to create a UMDL and a crawler for each
> distro that would like to use MirrorManager than to maintain a
> one-script-fits-all UMDL that is in fact tested for only one of the scenarios.
> That being said, if we ever have interest from Ubuntu, CentOS or any
> other communities, we should definitely look into making the UMDL
> and crawler as re-usable as possible for them, but keeping the
> distro-specific bits separated.
>
> == [file] was used early on for dev and testing. It's not interesting.
> [rsync] would be used when you don't have access to a master mirror (or very
> close replica). Perhaps the rpmfusion setup still needs this. I would have,
> for testing Ubuntu, certainly. It shouldn't be needed for production when the
> content being mirrored out is managed by the same people operating
> mirrormanager, as is the Fedora case.

Apparently RPMFusion does need this, so it needs to stay. The question then becomes: should we split the different UMDL types into different scripts? The idea being that this would allow easier optimization.
(Note: I'm having this idea now, but since I have not looked at what/how we could optimize, it may end up remaining in the same file.)

== MD == The parsing routine is pretty short; it's not worth a separate executable.
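The crawler behaviour Matt describes earlier in this thread for the `readable` flag can be sketched as below. This is a minimal illustration under stated assumptions, not MirrorManager's actual code: `Directory` and `dirs_to_mark_for_deletion` are hypothetical names, and the crawler's scan of one mirror is assumed to yield a plain set of directory paths.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Directory:
    # Hypothetical stand-in for a row of the Directory table.
    name: str
    readable: bool


def dirs_to_mark_for_deletion(
    expected: Iterable[Directory], found_on_mirror: Set[str]
) -> List[str]:
    """Return the directories whose host_category_directory entry the
    crawler would mark for deletion on one mirror (sketch only).

    - readable=False (e.g. pre-bitflip content): the crawler does not
      expect to read it, so its absence is ignored and it is never flagged.
    - readable=True but not found on the mirror: flag it for deletion.
    """
    flagged: List[str] = []
    for directory in expected:
        if not directory.readable:
            continue  # not publicly readable yet; absence is expected
        if directory.name not in found_on_mirror:
            flagged.append(directory.name)
    return flagged


if __name__ == "__main__":
    # Made-up directory names, for illustration only.
    expected = [
        Directory("releases/22/x86_64", readable=True),
        Directory("releases/23/x86_64", readable=False),  # pre-bitflip
        Directory("updates/22/x86_64", readable=True),
    ]
    print(dirs_to_mark_for_deletion(expected, {"updates/22/x86_64"}))
```

Here only "releases/22/x86_64" is flagged: the pre-bitflip directory is missing too, but because it is marked readable=False the crawler leaves it alone.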
_______________________________________________
infrastructure mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
