Thanks for the explanation; some thoughts on it follow...

On Sun, 9 Mar 2008 18:28:54 +1100
Craig Sanders <[EMAIL PROTECTED]> wrote:

> > Granted some users require or prefer the '-man' switch output ('#
> > manpagename'), but a duplicate line seems useless.  A review of the
> > three related bugs, (#152207, #289351, and #361125) shows that since
> > 2002, four users ("Long", Metzler, Jacobson, & Costa) considered the
> > duplicate lines a bug.  Were we all mistaken, and if so, how?
> 
> 'dlocate -man' lists the man pages in a package.
>
> if and when there are duplicates, it is *because* there are duplicate
> man pages (for different languages) in the package. dlocate doesn't
> just make them up, it does what it's told: list the man pages in the
> package.

OK, the job of '-man' is to list all man pages in any language.
Does 'dlocate -man' currently list them all?  No, or only partially.
For multiple translations of a single man page, 'dlocate -man' lists
duplicate occurrences of the name of that man page.

One could argue those duplicates are foreign homonyms that share the
same spelling.  But Unix utilities can't resolve homonyms or polysemy
(e.g. "Plant pot plant in plant pot.").  Unix is built upon
context-free grammars, so a concept like 'homonym' isn't relevant when
no program can tell the identically spelled names apart.  (The
duplicates have at least one use, more on which below...)

Is it possible to list all the man pages in a package using only
the '# pagename' format?  No, not without distinguishing information
about languages.
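
To illustrate, here's a minimal sketch of how the language gets lost.
It assumes man pages live at paths like
/usr/share/man/LANG/manN/NAME.N.gz; the paths in sample_paths are made
up for illustration, and both sample_paths and to_section_name are
hypothetical helpers, not part of dlocate:

```shell
#!/bin/sh
# Hypothetical sample of man page paths (illustrative, not real dlocate output).
sample_paths() {
    cat <<'EOF'
/usr/share/man/man8/apt-cdrom.8.gz
/usr/share/man/es/man8/apt-cdrom.8.gz
/usr/share/man/fr/man8/apt-cdrom.8.gz
/usr/share/man/man8/apt-get.8.gz
EOF
}

# Reduce each path to "SECTION NAME".  The language directory is simply
# dropped, so translations of one page become identical lines.
to_section_name() {
    sed -E 's#^.*/([^/]+)\.([0-9][^./]*)(\.gz)?$#\2 \1#'
}

sample_paths | to_section_name
```

With that sample, '8 apt-cdrom' comes out three times and '8 apt-get'
once; nothing in the output says which line was Spanish or French.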

> for those who don't want to use the new -lsman option, there is and
> always has been:
> 
>     dlocate -man PACKAGE | sort -u
> 
> i don't consider it a bug because it's trivial for a user to use
> standard tools to eliminate excess information, whereas it is
> impossible for a user to re-construct information that has been
> thrown away. in other words, throwing away information would be a far
> more serious bug, especially when the user can easily throw it away
> themselves.

Agreed that throwing away information is more serious, but not all data
is information, and it's not obvious what useful information would be
lost.

As far as I can tell the only meaning of the duplicates is that when
counted they show how many translations of a man page exist.  To make
practical use of the count also requires external utilities, e.g.
piping to 'sort | uniq -c'.

Is that method of counting used often, and does any Debian package
use it?  I don't know.

Compare '-man' to the newer '-lsman' option, (thanks for adding it),
which provides distinguishing information.  Some tasks:

Count translations of 'apt-cdrom':

        dlocate -lsman apt | grep -c 'apt-cdrom\.8'

        dlocate -man   apt | grep -c "8 apt-cdrom"

Count translations in general (sorted by # of translations):

        dlocate -lsman apt | sed 's#.*/##' | sort | uniq -c | sort -g

        dlocate -man   apt |                 sort | uniq -c | sort -g

('-lsman' needs one more pipeline stage there, though that's an
uncommon task.)

Count of how many translations in Spanish:

        dlocate -lsman apt | grep -c '/es/'

        (Impossible without fetching information not in '# pagename'.)
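
The Spanish count generalizes to every language once full paths are
available.  A rough sketch, assuming '-lsman' prints one path per line
in the /usr/share/man/LANG/manN/ layout (which the sed and grep
commands above imply, but I haven't checked every package).  The sample
paths are invented, count_by_language is a hypothetical helper, and
labelling untranslated pages "C" is my own choice:

```shell
#!/bin/sh
# Hypothetical sample of path-style output (illustrative only).
sample_paths() {
    cat <<'EOF'
/usr/share/man/man8/apt-cdrom.8.gz
/usr/share/man/es/man8/apt-cdrom.8.gz
/usr/share/man/fr/man8/apt-cdrom.8.gz
/usr/share/man/man8/apt-get.8.gz
EOF
}

# Count man pages per language directory.  The directory two levels
# above the file is the language code; if it is "man" itself, the page
# sits directly under man/manN and is counted as "C" (untranslated).
count_by_language() {
    awk -F/ '
        {
            lang = $(NF-2)
            if (lang == "man") lang = "C"
            count[lang]++
        }
        END { for (l in count) print count[l], l }
    ' | sort -k2
}

sample_paths | count_by_language
```

On a real system the input would come from something like
'dlocate -lsman apt | count_by_language' instead of sample_paths.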

> i can see that there are multiple valid but conflicting viewpoints on
> this issue, so i'm going to err in one way or another no matter what i
> do. i prefer to err based on the assumption that users are competent
> and able to use standard unix tools and basic concepts like pipes. 

Erring isn't inevitable if you know what you want.  Your spec is
either met or it's not.

On standard Unix tools:  the power of those tools comes from design
simplicity, (parsimony & elegance); the tools might not be simple to
create, but they should be simple to use.

Possible fixes:

        1) do nothing, nothing's wrong except users.

        2) omit the dupes.

        3) ...and add a distinguishing language suffix, as somebody
           suggested.  Since that would break the intended usage of
           'man $(dlocate -man apt)' (sans filters), perhaps a new
           option could add the language suffixes instead.
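
For fix 3, here is one way the suffixed output might look.  This is
only a sketch: the "[es]" suffix format is something I made up, the
sample paths are invented, with_language_suffix is a hypothetical
helper, and it assumes the /usr/share/man/LANG/manN/NAME.N.gz layout:

```shell
#!/bin/sh
# Hypothetical sample of man page paths (illustrative only).
sample_paths() {
    cat <<'EOF'
/usr/share/man/man8/apt-cdrom.8.gz
/usr/share/man/es/man8/apt-cdrom.8.gz
/usr/share/man/fr/man8/apt-cdrom.8.gz
/usr/share/man/man8/apt-get.8.gz
EOF
}

# Print "SECTION NAME" for untranslated pages and "SECTION NAME [LANG]"
# for translations, so every line stays distinct.
with_language_suffix() {
    awk -F/ '
        {
            lang = $(NF-2)
            file = $NF
            sub(/\.gz$/, "", file)            # drop compression suffix if present
            n = match(file, /\.[0-9][^.]*$/)  # find the trailing ".SECTION"
            if (n == 0) next                  # skip anything that is not NAME.SECTION
            name = substr(file, 1, n - 1)
            section = substr(file, n + 1)
            if (lang == "man") print section, name
            else print section, name, "[" lang "]"
        }'
}

sample_paths | with_language_suffix
```

No two lines come out identical, so nothing needs 'sort -u' and no
information is thrown away, but the output is no longer safe to hand
straight to 'man', which is why it belongs behind a separate option.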

Related (thoroughly fixed) Debian bug:

        #259338: man-db: with 2 keywords, output of 'apropos' can be redundant

HTH...


