Re: [jmap-discuss] Perform actions on all messages of a Mailbox

Bron Gondwana Thu, 20 Oct 2016 16:15:30 -0700

Sorry about the delay in replying to this.

On Tue, 18 Oct 2016, at 21:16, Matthieu Baechler wrote:
> Hi Neil,
>
> Le mardi 18 octobre 2016 00:18:24 UTC+2, Neil Jenkins a écrit :
>> I don't think this should be added to the JMAP spec. One of the
>> concerns raised by a large mailbox provider we talked to was to make
>> sure a client could be rate limited in a reasonable manner, so it
>> can't overload the server. (We've also been careful the other way to
>> try to ensure the client can control exactly how much data it
>> requests from the server in one go.) Adding a command like this means
>> the client could ask the server to do something potentially very
>> expensive, depending on backend implementation.
>
> Do people expect to use JMAP as the only protocol to access mailboxes?
> Because in IMAP, such very expensive methods are very common
> (expunge, modification with a very broad UID range, etc.).


Yes, they are.  They're also risky and can't be undone.  We have a
self-service "restore from backup" tool at FastMail because "I
accidentally deleted a ton of messages I didn't mean to" was a very
common support request.

(that and "I store all my important email in the Trash folder because
I'm insane, and I just hooked up an iPhone which wiped it", *sigh*)

>>
>>
>> Now the server could reject it if it's over a certain number of
>> messages resulting in the query, but the exact limit will be server
>> dependent and when it happens the client has to fallback to a
>> different approach. Having two different implementations in the
>> client is likely to be less tested and buggier.
>
>>
>>
>> In general, JMAP prefers the philosophy of explicitly telling the
>> server what changes to make; this is often more efficient anyway if
>> you're keeping the client cache in sync.
>
> You can do that with IfInState easily, you already know which messages
> will be changed client-side because you actually wrote the query.
>
>>
>>
>> The approach we take to this problem (and I would recommend) is to
>> fetch the list of ids up front (in pages if necessary), then ask the
>> server to make the changes to them in batches (say 100 to 500 at a
>> time), waiting for the previous request to finish before making the
>> next one.
>
> It doesn't look like a great API to me. Managing deletion with
> client-side batching for performance purposes doesn't sound good.
> I think a good implementation will consume more resources to handle
> such large queries than to do it server-side based on a query.

I thought that too at first, and I initially made the same arguments.
Especially because we already had the ability to delete mailboxes and
hence operate on the messages inside them.

>> The ids (should be) reasonably small and quick to fetch even for
>> large folders. Fetching them up front ensures you don't process
>> anything that arrives during the operation.
>
> IfInState already covers this case, don't you think ?

It does, but the next getMessageUpdates will have to get a response for
all those messages anyway, because you don't know if the client had
cached anything for the IDs.

>
>> By doing it a batch at a time, you can make sure you won't overload
>> the server (and make sure the server will accept the request), and
>> also more easily show a progress bar to the user (because the user is
>> probably locked on the server while the changes are being made), or
>> even interleave other requests to keep the client responsive while a
>> large operation is happening in the background.
>
> It would be easily solved with an "async" capability on requests.
> We already have Event Source for receiving async result. What do
> you think ?

Thanks for raising this topic, because we did discuss it in a lot of
detail, and we actually decided to go entirely the opposite direction!
Previously, deleting a mailbox that contained messages moved them to
the mailbox with the "inbox" role.  We changed it so that you can't
delete a mailbox which contains messages.  You need to explicitly
delete them or move them out first.

There are some strong guiding principles in JMAP, and one of them is
that messages are precious and actions should be explicit.

Our own Cyrus IMAPd server has algorithms built on the assumption that
the biggest single mailbox will contain one million emails.  That's
pretty big.  We have around 20 users with more than that many emails
total across all their mailboxes (I know because we have a 32-bit file
size issue with an internal cache file when you get to about 3 million
messages in a single mailbox).

So looking at the most extreme case of deleting a million emails at the
same time, you're looking at one megabyte per byte of ID.  A
reasonable ID size is 64 bits, which is 16 hexadecimal characters.
Add in commas and quotes, you're looking at roughly 20 bytes per ID.
Multiply that by a million, that's 20 megabytes of IDs to download
and process.
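
As a rough check of that estimate, here is a minimal sketch in Python. It assumes the IDs are sent as a compact JSON array of 16-character hex strings. The function names are made up for illustration and are not part of JMAP.

```python
import json

def estimate_id_list_bytes(count, id_bits=64):
    """Back-of-envelope size of a compact JSON array of hex-encoded ids."""
    hex_chars = id_bits // 4        # 64 bits -> 16 hexadecimal characters
    per_id = hex_chars + 2 + 1      # two quotes plus one separating comma
    return count * per_id

def measured_id_list_bytes(count, id_bits=64):
    """Build an actual JSON array of dummy ids and measure its length."""
    hex_chars = id_bits // 4
    ids = [format(i, "0{}x".format(hex_chars)) for i in range(count)]
    return len(json.dumps(ids, separators=(",", ":")))

if __name__ == "__main__":
    print(estimate_id_list_bytes(1_000_000))   # bytes for a million ids
    print(estimate_id_list_bytes(10_000))      # bytes for ten thousand ids
```

Under these assumptions each ID costs 19 bytes on the wire, so "roughly 20 bytes" is a slight overestimate.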

Yes, it's a lot of data.  But 1 million emails is nearly 2 years' worth
of getting one email per minute, all day, every day, and not deleting
anything right up until you suddenly decide to wipe all million emails.

A more realistic number is 10,000 emails in a mailbox which is being
wiped.  That's a week of one email per minute, which is about twice the
rate that I get (and I get a ton of notify email and mailing lists).
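
The rate figures can be checked the same way. This is only a sketch of the arithmetic; the helper name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60

def days_to_accumulate(messages, per_minute=1.0):
    """Days needed to receive `messages` emails at a steady rate."""
    return messages / (per_minute * MINUTES_PER_DAY)

if __name__ == "__main__":
    # A million emails at one per minute, expressed in years.
    print(round(days_to_accumulate(1_000_000) / 365, 2))
    # Ten thousand emails at one per minute, expressed in days.
    print(round(days_to_accumulate(10_000), 2))
```

At one email per minute, a million messages take about 694 days and ten thousand take just under 7 days.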

10,000 IDs is 200 kB of data.  Half the webpages I go to are about that
size.  And that's once per week on a really busy account, where you're
downloading tons more data than that just to keep up with reading a
fraction of your incoming email.

So I'm not convinced that this "inefficiency" is actually a problem in
practice.  The code to fetch the list of IDs and pass it back to the
server isn't complex.  As Neil said, batching in groups of say 1024
messages allows you to display a progress bar as you delete the
messages.  Implementing an "empty Trash" as a callback which gets all
the IDs in the Trash folder and issues a delete for them all isn't much
client side code, and it means that the protocol doesn't have the
discontinuity.
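
The client-side approach described above could look something like the following Python sketch. It is not taken from any real client. `fetch_ids` and `destroy` are hypothetical callables standing in for whatever requests the client uses to list the message IDs in a mailbox and to delete a set of them, and `destroy` is assumed to block until the server has answered.

```python
from typing import Callable, Iterator, List, Optional, Sequence

def batches(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])

def empty_mailbox(
    fetch_ids: Callable[[], Sequence[str]],
    destroy: Callable[[List[str]], None],
    batch_size: int = 1024,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Delete every message listed by `fetch_ids`, one batch at a time."""
    # Snapshot the ids up front so mail arriving later is not touched.
    ids = fetch_ids()
    done = 0
    for batch in batches(ids, batch_size):
        # One request per batch; the next starts only after this returns.
        destroy(batch)
        done += len(batch)
        if on_progress is not None:
            on_progress(done, len(ids))   # e.g. update a progress bar
    return done

if __name__ == "__main__":
    fake_trash = ["msg%d" % i for i in range(2500)]
    empty_mailbox(
        lambda: fake_trash,
        lambda batch: None,               # stand-in for the real delete call
        on_progress=lambda d, t: print("%d/%d deleted" % (d, t)),
    )
```

Because each batch waits for the previous one, the client can update progress between requests or interleave other work, and no single request can exceed `batch_size` messages.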

It would be easy to implement an extension for "emptyMailbox" (which I
think is the use case we're really looking for here, rather than
arbitrary filter), but I've been convinced upon further examination that
it shouldn't be in the base protocol.

Regards,

Bron.

--
  Bron Gondwana
  br...@fastmail.fm
