On Mon, May 09, 2016 at 08:57:55PM -0400, Damien Riegel wrote:
> On Mon, May 09, 2016 at 06:42:33PM -0500, Derek Martin wrote:
> > On Sat, May 07, 2016 at 04:14:36AM -0400, Damien Riegel wrote:
> > OK, I get it now.  I think this could work OK if the parent did a
> > single stat, and perhaps did an opendir if the object was a directory,
> > and then passed copies/pointers/whatever of those objects into the
> > registered probe functions (pointers for sure).  There's no need to do
> > multiple stats.  Then at least the only efficiency loss is the extra
> > function calls.  If dir, the probe function should assume it needs to
> > do rewinddir().
>
> Thanks for your input Derek. My code was based on the assumption that
> mx_get_magic is idempotent, but maybe it is not true. 

Now that I've actually looked at the code... ;-)

I think it is, in the sense that you mean.  That's not the issue
though, if I understand correctly.  I think your probe design, keeping
the knowledge about a mailbox implementation separate from the rest of
mutt, is a good idea, and I didn't mean to discourage that.  I'm just
pointing out that as implemented by your current set of patches, it
has a performance concern, but it's one that can be overcome.  Maybe
I'm not making myself clear, so I'll try again. 

The issue is that Mutt is going to call your registered probe
functions one at a time, until one of them returns that it matches the
mailbox type.  Currently in all cases this is a function that wraps
the mx_get_magic() function, which always checks every possible path,
until it finds one that matches.

Kevin was (I think) pointing out that this is inefficient, which it
is.  You'll stat each relevant path up to n times, where n is the
number of mail box types you've registered. Currently the
mx_get_magic() function and friends stat roughly 10 paths, so you'll
do up to roughly n*10 stat() calls.  As you say, a normal stat takes
negligible time and the user will never notice; and probably the data
will end up served from some cache somewhere in the filesystem layer
an/or hardware.  

However, for example, if the mail store is on an NFS server with a
slow link, in a worst-case scenario may add up to being very noticable
to the user (NFS caching may or may not help hide some of this).  If
there were 10 mailbox types and your latency to the NFS server was
100ms, this might end up taking roughly 2 * 100ms * 10 mailboxes * 10
paths = 20,000ms, or 20s.   Yuck.

What I'm suggesting is that if the function that runs through the
probe functions does the work in advance, and caches as much of it as
makes sense, you can get most of that back.  I just took a look at the
mx_get_magic() function and those that it calls...  The lot of that
basically just stats a bunch of paths--there aren't even any opendir()
calls.  SO:

 - register a TYPE of folder, where the type is either local file stor
   or not (or something of the sort)

 - register a list of paths that your mailbox driver needs to stat to
   identify mail stores of its type (can be NULL)

 - FIRST the driver function should check for non-local file stores
   (like IMAP and POP), since those require no stat() calls (and I
   believe should mostly just be a string compare, specifying the
   protocol).

 - THEN have the driver function pre-fetch the stat structs and cache
   them somehow (linked list, hash table, whatever), and pass that
   structure to the probe functions

Now you can break up mx_get_magic() into its relative peices, and make
each piece the probe function for that specific mailbox type.  All
must take a pointer to your cache structure, from which they'll get
the struct stat structures they'll use to make their decision.  Then
they'll just pull out the stat structs they need.

You could do this by having two different probe functions in your
struct--one for local file store, and another for non-local.  This
would be sufficient to distinguish the type, though for neatness you
may want to have a field that specifies it explicitly.

For each mail store type you would register the correct one and leave
the other NULL.  Then your driver function would call all of the probe
functions for non-local, which need not take your stat cache
structure.  This way you only need to generate the stats if at least
one of them is needed.  Alternatively, have just one, use the type
field, and leave the stat struct cache argument NULL for the non-local
types (and still wait to generate the cache until they all fail).

Does this make sense?

Note also that you need not necessarily implement something like this
for the first pass...  It's of course up to Kevin how and when (and
if) he wants to take the various patches.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

Attachment: pgpcm2SSBvnjf.pgp
Description: PGP signature

Reply via email to