Re: merge pick and scan

Eric Gillespie Thu, 21 Apr 2022 13:21:18 -0700

I thought I would sit back and wait to see where this went.
I was surprised and disappointed no one brought up the huge
performance cost of the current implementation.


I guess no one has network-mounted home directories anymore.
I don't.  But for many years that was where my mail lived.
And lots of it!  If you review my messages to this list over the
years, you'll notice a recurring pattern about optimizing away
those slow directory reads.

You see, nmh's dirty secret (ok ok, one of many!) is that the
first thing every command does is read the entire directory.
Yep, the whole thing.

So, the `new` commands that I added carefully avoided the readdir.

Combining scan and pick on NFS is even worse, though, as both
have to open message files too!

And it gets even worse: first you have to wait for pick to slowly
search ALL THE FILES (within whatever limiting message range you
gave it, if you had any idea what range to give, and often I did
not), and then you wait for scan to slowly readdir everything, and
THEN you finally get your results.  What I really want (and I doubt
I'm alone) is scan lines as soon as a message is a HIT, so I can
interrupt when the message I'm looking for comes across, without
waiting for any further work.
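
To make the "scan line as soon as a message is a HIT" idea concrete,
here is a rough sketch in Python rather than nmh's C.  It is not nmh
code: the function name `stream_hits`, its parameters, and the
substring match are all assumptions made for illustration.  The point
is only that a generator hands each hit to the caller immediately, so
an interrupt stops all further file opens.  Note that hits arrive in
directory order, not message-number order.

```python
import os
from email.parser import BytesHeaderParser


def stream_hits(folder, header, needle):
    """Yield (msgnum, subject) for each matching message as it is found.

    Hypothetical helper, not part of nmh.  Matching is a simple
    case-insensitive substring test on one header.
    """
    parser = BytesHeaderParser()
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.name.isdigit():
                continue  # skip non-message files such as .mh_sequences
            with open(entry.path, "rb") as fh:
                headers = parser.parse(fh)
            value = str(headers.get(header, ""))
            if needle.lower() in value.lower():
                # Hand the hit to the caller right away instead of
                # collecting every hit before printing anything.
                yield int(entry.name), str(headers.get("Subject", ""))
```

A caller would loop over `stream_hits(...)` and print each pair as it
arrives; hitting Ctrl-C after the wanted message appears skips the rest
of the folder.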

I offered a patch at some point to have scan read message numbers
through standard input, so that with `pick ... | scan -` you could
skip the second readdir AND get early interrupt.
I'm not sure what happened to that one, but I'm not disappointed
it didn't make it in.  It's really only a half measure.
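
The reading-numbers-from-stdin idea can be sketched like this, again
in Python and again with made-up names (`scan_from_numbers` and its
output format are assumptions, not the actual patch).  Because each
message file is opened directly by its number, the consumer never
lists the directory, and each line is produced as soon as its number
arrives on the input.

```python
import os
from email.parser import BytesHeaderParser


def scan_from_numbers(folder, numbers):
    """Yield one scan-style line per message number read from `numbers`.

    Hypothetical helper, not part of nmh.  `numbers` is any iterable of
    text lines, for example sys.stdin when fed from a pipe.
    """
    parser = BytesHeaderParser()
    for line in numbers:
        token = line.strip()
        if not token.isdigit():
            continue  # ignore blank or malformed input lines
        # Open the message by name: no directory read is needed.
        path = os.path.join(folder, token)
        try:
            with open(path, "rb") as fh:
                headers = parser.parse(fh)
        except FileNotFoundError:
            continue  # number refers to a message that no longer exists
        yield "%4d  %s" % (int(token), headers.get("Subject", ""))
```

Passing `sys.stdin` as `numbers` and printing each yielded line with
`flush=True` would give the early-interrupt behavior on a pipe.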

On a modern filesystem on SSD in 2022 maybe nobody cares anymore.
But over NFS in 2010, this mattered a lot.

I don't see how we can argue that "the UNIX philosophy" means
every command has to repeat the same expensive work and also
they're not allowed to share code.  Well they already share
plenty of code... just not that code!  :)

I think the philosophy case falls down in other ways as well.
pick does three things:
1. resolve user query to message numbers
2. by default, print the message numbers without formatting
3. optionally, store in sequence

scan already implements a subset of #1 (as do all commands
accepting message specifications).  #2 is pick duplicating scan's
job (scan -format '%(msg)').  #3 is pick duplicating mark's job.

I think I still have an old patch lying around somewhere that
teaches pick to scan when the `-scan` option is given.

I definitely plan to resurrect that patch soon because... guess
what?  There's another case where having scan repeat pick's
expensive work comes into play.  Yep, it's over there on that
imap-prototype branch.

I'll be sure to bring numbers into that discussion when the time comes :)

Thanks!
