Re: [gentoo-dev] Re: New category proposal

Kevin F. Quinn Thu, 12 May 2005 04:35:49 -0700

Brian Harring wrote:
> > The layout on disk and the semantics of categories do not need to be
> > related.
> Yes and no.  You're assuming that people don't use the layout on 
> disk for digging around without calling portage.  Personally, I do.

Sometimes I do the same; but other times I find the layout a barrier.  Many's 
the time I've done:

$ ls -d /usr/portage/*/<package name pattern>

to find a package, for example - that indicates the categories are actually 
hindering searches in this case.  Incidentally it also treats the tree as if it 
were a flat namespace.

However, ideally I wouldn't be searching the tree directly like that at all, 
I'd be searching metadata based on various criteria.  Indeed package names as 
they stand are frequently uninformative; if you decide you need something for a 
particular function, you can have a look in what you think may be the relevant 
categories, only to find a list of mostly meaningless package names.  Then you 
start grepping the DESCRIPTIONs, and so on, finally trying equery.  This whole 
process is rather unsatisfactory, and in my experience often fruitless.  Many 
times I've gone the other way; google for something to find candidate package 
names, then 'ls -d /usr/portage/*/<name>' to see if some kind soul has already 
added an ebuild to the tree.
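
As a rough illustration of the "grep the DESCRIPTIONs" step, here is a minimal 
sketch in POSIX sh.  It assumes the current <category>/<package>/*.ebuild 
layout; search_descriptions is a hypothetical helper name, not a portage tool, 
and it walks every ebuild, so it will be slow on a full tree.

```shell
# Sketch: list category/package for ebuilds whose DESCRIPTION matches a pattern.
# search_descriptions is a hypothetical helper, not part of portage.
# Usage: search_descriptions <pattern> [tree root]
search_descriptions() {
    pattern=$1
    root=${2:-/usr/portage}
    # Ebuilds are assumed to live at <root>/<category>/<package>/*.ebuild
    for ebuild in "$root"/*/*/*.ebuild; do
        [ -f "$ebuild" ] || continue
        # Case-insensitive match, restricted to the DESCRIPTION= line
        if grep -i '^DESCRIPTION=' "$ebuild" | grep -qi -- "$pattern"; then
            pkgdir=${ebuild%/*}     # strip the ebuild file name
            catdir=${pkgdir%/*}     # strip the package directory
            echo "${catdir##*/}/${pkgdir##*/}"
        fi
    done | sort -u
}
```

Something like `search_descriptions pdf` would then print one category/package 
per line, which is roughly what the manual ls/grep routine produces.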

For me, the whole point of a flat namespace is to _remove_ categories from the 
atom.  Obviously this has far-reaching disruptive consequences as you describe, 
and in practice is not workable in the short to medium term at least.

I'd like to be able to ask questions like, "what app-text packages exist for 
<some function>?".  At the moment, listing app-text, grepping 
app-text/*/*ebuild may get somewhere, but what about packages placed in 
different categories for reasons like name clash, other functionality and so on?

Ciaran McCreesh wrote:
> So we end up not using upstream naming, leading to major hassle with
> tarballs, major user confusion and inconsistent naming (why are some vim
> things vim- and others not?). Bad! Now that portage *tells* you when you
> need to be more specific, there's no problem with name matches.

I agree maintaining upstream naming is very important.  However obviously 
upstream names can and do clash.  That raises the question of how such clashes 
should be resolved.  Categories are a rather arbitrary way of doing that - it's 
quite possible that a clash could occur between two packages that naturally 
fall into the same category - in the current system that means one of the 
packages gets dumped in a second-choice category.

Talking atoms, one could handle clashes by differentiating occurrences with an 
extension to the name.  To take the sudo example, sudo could be the normal 
sudo, sudo:vim (or perhaps sudo__vim to be acceptable to more filesystems) 
could be the vim extension sudo.

Brian Harring wrote:
> Re-asserting that the fs layout *does* matter, how is that more intuitive
> when trying to track down the ebuild for dev-util/diffball ?  How many
> directories deep would I have to go before I reached the ebuild?

$ ls -d /usr/portage/*/<name pattern>

becomes

$ find /usr/portage -type d -name <name pattern> -print

and for quick&dirty things like

$ grep -l <pattern> /usr/portage/*/<name pattern>/*ebuild

instead do:

$ find /usr/portage -type d -name <name pattern> \
    -exec sh -c 'grep -l <pattern> "$1"/*ebuild' sh \{\} \;

or somesuch.

An interesting possibility is that the portage mirrors and clients can have 
different layouts depending what is most suitable.  Those with reiserfs could 
sensibly choose the very wide layout.  Others on ext2 could choose a s/u/sudo 
approach to avoid problems with very wide directories.  Obviously this means 
modifying the sync process somewhat to deal with this, but it's quite possible, in 
a scalable efficient manner.
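
To make the s/u/sudo idea concrete, here is a minimal sketch of how a package 
name could be mapped to a sharded path.  The function name shard_path and the 
default depth of 2 are assumptions for illustration only; depth 0 gives the 
very wide layout.

```shell
# Sketch: map a package name to a sharded relative path.
# shard_path is a hypothetical helper; the depth is a per-site choice.
# Usage: shard_path <name> [depth]
shard_path() {
    name=$1
    depth=${2:-2}
    prefix=
    rest=$name
    i=0
    while [ "$i" -lt "$depth" ] && [ -n "$rest" ]; do
        first=${rest%"${rest#?}"}   # first character of what is left
        prefix=$prefix$first/
        rest=${rest#?}              # drop that character
        i=$((i + 1))
    done
    printf '%s%s\n' "$prefix" "$name"
}
```

With this, `shard_path sudo` prints s/u/sudo, `shard_path sudo 1` prints 
s/sudo, and `shard_path sudo 0` prints sudo.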

Brian Harring wrote:
> > The key here is to separate the category (metadata) and filesystem  [snip]
> This also locks out several possibilities, like relying on dir structure to
> limit the searches.
> You force category classification to be metadata, you need an additional db
> to do searching, and basic atom lookup.  That's 19000+ keys in a db.  No db,
> and you force a tree wide search, which _will_ be as fast as emerge -S is.

That applies if you retain the category in the atom; for me, there's no point 
in flattening the namespace without removing the category completely from the 
atom.

Where at the moment you perhaps want to do:

$ grep <pattern> /usr/portage/app-text/*/*ebuild

then yes, an additional db of some kind is necessary, or perhaps a more 
efficient way of searching the metadata.xml files.  However I disagree with the 
19000+ keys.  Portage could for example maintain a simple category->package 
name mapping - only needs to be updated when packages are added/removed from 
the tree or metadata is changed, and can be trivial.  For example, it could be 
a simple shell script with entries like:

PC_<category>="<name> <name> <name>"

at which point you only need to do:

$ source <category db>
$ for pkg in ${PC_<category>}; do ... ; done
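
A minimal sketch of how such a file might be generated, here from the 
existing on-disk layout rather than from metadata.  The helper name 
build_category_db is hypothetical, and the sketch assumes every hyphenated 
top-level directory is a category.  Since '-' is not valid in a shell variable 
name, app-text becomes PC_app_text.

```shell
# Sketch: write one PC_<category>="<name> <name> ..." line per category.
# build_category_db is a hypothetical helper, not part of portage.
# Usage: build_category_db [tree root] > category.db
build_category_db() {
    root=${1:-/usr/portage}
    # Assumption: every hyphenated top-level directory is a category
    for catdir in "$root"/*-*/; do
        [ -d "$catdir" ] || continue
        catdir=${catdir%/}
        category=${catdir##*/}
        # '-' is not allowed in shell variable names, so map it to '_'
        var=PC_$(printf '%s' "$category" | tr -c 'A-Za-z0-9_' '_')
        pkgs=
        for pkgdir in "$catdir"/*/; do
            [ -d "$pkgdir" ] || continue
            pkgdir=${pkgdir%/}
            pkgs="$pkgs${pkgs:+ }${pkgdir##*/}"
        done
        printf '%s="%s"\n' "$var" "$pkgs"
    done
}

# Example use:
#   build_category_db /usr/portage > /tmp/category.db
#   . /tmp/category.db
#   for pkg in $PC_app_text; do echo "$pkg"; done
```

The file only needs regenerating when packages are added or removed, so the 
cost of the tree walk is paid once rather than on every search.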

Brian Harring wrote:
> cpvs can't conflict, pure and simple under the current layout, which is
> enforced by the single category/fs layout.

cpvs can't conflict because when a package name already exists in a category, a 
conflicting package name has to go into a different category even if it's not 
the most natural category for the package.  What you've done there, is assert a 
rule (cpvs are unique) thus 

Brian Harring wrote:
> What are we gaining?  Ability to find a package under two categories?

That, and stability of package location.  Moving packages around the tree is 
disruptive: it not only affects ebuilds that reference them but also causes 
unnecessary mirror activity.

For me, categories are search criteria.  Making them part of the tree makes 
it difficult to revise those criteria.

Brian Harring wrote:
> > The benefits include
> > 1) no more "moving packages around the tree"
> cpv conflict.  You aren't moving the fs position of it, but it still
> requires walking the tree and updating all atoms that reference the old
> position.

The point is that *DEPEND would not mention the category.

Brian Harring wrote:
> > 2) categories can be added to a package in the most natural way
> Elaborate.

The idea is that packages can naturally belong in more than one category.  
Think of categories more like search keywords, if you like.  A package that 
processes text would match app-text, but perhaps it's also a financial tool 
which would therefore also match app-finance.

Another good example of the usefulness of more than one category is the sys-* 
categories, where all the packages naturally fall into both their sys- category 
and the relevant non-sys category.  Take GCC; 
currently in sys-devel/gcc, not in dev-lang/gcc which is where a naive user 
would look for it.  With multiple category markings, it could be in both.
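
A minimal sketch of category-as-keyword lookup, assuming a plain-text index 
with one line per package of the form "<package> <category> [<category> ...]".  
Both the index format and the helper names are assumptions for illustration; 
nothing like this exists in portage today.

```shell
# Sketch: query a hypothetical multi-category index.
# Assumed index format: "<package> <category> [<category> ...]" per line.

# packages_in <category> <index file>: every package tagged with the category
packages_in() {
    awk -v want="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }' "$2"
}

# categories_of <package> <index file>: every category recorded for a package
categories_of() {
    awk -v pkg="$1" '$1 == pkg { for (i = 2; i <= NF; i++) print $i }' "$2"
}
```

With an index line "gcc sys-devel dev-lang", gcc is returned by both 
`packages_in sys-devel` and `packages_in dev-lang`.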

Brian Harring wrote:
> > 3) overlays can be tidier
> Eh?

This is a result of the dynamic s/u/sudo approach where the directory depth is 
arbitrary.  In the overlay you could drop the s/u/ bit.  I'd guess most 
overlays modify relatively few packages; I know I have a bunch of categories in 
my overlay that only contain one package.  Given that portage would take a 
top-down search approach to locate the package (i.e. try sudo, then s/sudo, 
then s/u/sudo ... first in overlay then in the mirror) this works transparently.
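
A minimal sketch of that top-down lookup, under one reading of the order: 
each root is tried at every depth before moving on to the next root, with the 
overlay listed first.  locate_pkg is a hypothetical helper name, and the sketch 
assumes package names are unique across the flat namespace.

```shell
# Sketch: find a package directory at any shard depth, overlay first.
# locate_pkg is a hypothetical helper, not part of portage.
# Usage: locate_pkg <name> <root> [<root> ...]
locate_pkg() {
    name=$1
    shift
    for root in "$@"; do
        candidate=$name     # depth 0: <root>/sudo
        rest=$name
        prefix=
        while :; do
            if [ -d "$root/$candidate" ]; then
                printf '%s\n' "$root/$candidate"
                return 0
            fi
            [ -n "$rest" ] || break
            # Go one level deeper: s/sudo, then s/u/sudo, and so on
            prefix=$prefix${rest%"${rest#?}"}/
            rest=${rest#?}
            candidate=$prefix$name
        done
    done
    return 1    # not found in any root
}
```

Called as `locate_pkg sudo /usr/local/portage /usr/portage` (paths are 
examples only), it prints the first match and returns non-zero if there is 
none.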

Brian Harring wrote:
> What do we gain from a flat namespace?

Eliminating categories from package names.

> Right now, I can infer an atom out of a DEPEND string's purpose to 
> some degree, based upon its category.

You could use this argument for appending the description to the atom, but 
no one would suggest such a thing seriously.  What you're justifying is 
building metadata into the package name.

> To head off the "well you
> don't need to know the category, you should know the packages
> intentions if you're modifying the ebuild", that dodges the point; via the
> category portion of an atom, I can infer at least -intention- of a package.

To be more accurate, you can infer an aspect of the intention of a package that 
the original committer felt was most important whilst avoiding clashes. That's 
the point - by forcing a package to be a member of exactly one category, the 
implications from category membership are limited.

I'm the first to admit that making the changes to the fs layout I've talked 
about would be hugely disruptive, and as such is not sensible, most especially 
in the short to medium term.  This discussion does however serve to understand 
the problem properly before making any changes.  I think adding categories to 
metadata.xml, removing the few clashes (but otherwise leaving the fs layout as 
it is), and coming up with an efficient search tool (e.g. getting portage to 
maintain something like the script I mentioned above, or creating a widget to 
build it from the metadata.xml files) could eliminate the primary problem of 
moving packages around, and arguments such as whether a package should be in 
dev-cpp or dev-libs.  The rule could then be that once a package is in a 
physical category in the tree then it will not physically move, no matter what.  
*DEPEND would continue to use the physical category, at least in the short 
term - it could ultimately drop the category if that becomes sensible.  
Changing the few existing clashing names could be undertaken gradually (e.g. 
appending :<differentiator> as described above), to allow clashing names to 
belong to the same category.

This is quite benign and relatively painless.  Ultimately you have a flat 
namespace; packages will no longer move inside the fs tree; the old q&d ls/grep 
tricks to try to find suitable packages would work as well as they do now; 
arguments about which category to place a package in disappear; searches using 
category can become more intuitive; and different packages that have the same 
upstream name can be members of the same category.

Kev.

-- 
[email protected] mailing list
