2018-01-24 0:03 GMT+02:00 Eric Wong <e...@80x24.org>:
> Dimid Duchovny <dim...@gmail.com> wrote:
>> > You're right. In my case the flow was: read emails from storage ->
>> > group to threads -> add thread field to storage.
>> > However, I guess it's an edge-case.
>> > On second thought, maybe it'd be better to have a more general solution.
>> > E.g. let the client run an arbitrary callback after adding a child.
>
> OK, I guess you managed to fit skeletons of all your messages in memory?
>
>> > Here's a quick POC:
>> > https://github.com/dimidd/msgthr/commit/1c701717d10879d492d8b55fb8ca2f1c53d7e13f
>
> (truncated output of "git show 1c701717d10879d492d8b55fb8ca2f1c53d7e13f")
>
>>     add callback to Msgthr#add
>>
>>     The motivation is to allow the client to have a custom code executed,
>>         whenever a child is added.
>>
>> --- a/lib/msgthr.rb
>> +++ b/lib/msgthr.rb
>> @@ -166,12 +166,16 @@ class Msgthr
>>        # but do not change existing links or loop
>>        if prev && !cont.parent && !cont.has_descendent(prev)
>>          prev.add_child(cont)
>> +        yield(prev, cont) if block_given?
>>        end
>>        prev = cont
>>      end
>>
>>      # set parent of this message to be the last element in refs
>> -    prev.add_child(cur) if prev
>> +    if prev
>> +      prev.add_child(cur)
>> +      yield(prev, cur) if block_given?
>> +    end
>>    end
>>  end
>
> OK, that seems generic enough and we can probably support it
> long-term, so I'm somewhat inclined to accept it...
>
> However, APIs encouraging/supporting folks to load their entire
> collection(*) of messages (even skeletons) into memory feels
> wrong to me.
>
> Can you come up with a use case where this is useful for
> a subset of messages?
>

Well, in my specific case there weren't many messages, so memory
wasn't an issue.
In general, I think the question of adding the add_child callback is
orthogonal to the question of loading the entire collection or only
part of it.
I.e. one could use Msgthr as it is with millions of emails, and one
could use the callback with only a few messages.
Consider this flow:
1. query the storage backend according to some criteria (e.g. a
   time range, a particular sender, etc.)
2. group the messages in the response into threads

I'd rather show than tell, so here's a more elaborate example:
https://github.com/dimidd/msgthr/commit/3e38a4910e7a3c17c07f47c4f1b9d556a4a951fd.patch
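
For readers without the patched library at hand, the two-step flow can
be sketched roughly as follows. TinyThreader and Container are
hypothetical stand-ins, not the real Msgthr classes: they only mimic an
add method that yields (parent, child), link each message to the last
entry of its References only, and skip the loop checks from the diff.
The in-memory array standing in for the storage backend is also made up.

```ruby
# Hypothetical container: one per Message-ID.
Container = Struct.new(:mid, :msg, :parent, :children)

# Hypothetical stand-in for a threader whose #add yields (parent, child).
class TinyThreader
  def initialize
    @id_table = {}
  end

  # refs: Message-IDs from the References header, oldest first.
  def add(mid, refs, msg)
    cur = container_for(mid)
    cur.msg = msg
    parent = refs.empty? ? nil : container_for(refs.last)
    # Do not change an existing link, and never link a message to itself.
    if parent && cur.parent.nil? && !parent.equal?(cur)
      parent.children << cur
      cur.parent = parent
      yield(parent, cur) if block_given?
    end
    cur
  end

  private

  def container_for(mid)
    @id_table[mid] ||= Container.new(mid, nil, nil, [])
  end
end

# Step 1: query the storage backend (here a plain array) for a subset.
store = [
  { mid: 'a@x', refs: [],             from: 'alice' },
  { mid: 'b@x', refs: ['a@x'],        from: 'bob'   },
  { mid: 'c@x', refs: [],             from: 'carol' },
  { mid: 'd@x', refs: ['a@x', 'b@x'], from: 'alice' },
]
subset = store.select { |m| %w[alice bob].include?(m[:from]) }

# Step 2: group the subset into threads; the block runs once per new link.
links = []
thr = TinyThreader.new
subset.each do |m|
  thr.add(m[:mid], m[:refs], m) do |parent, child|
    links << [parent.mid, child.mid]
  end
end
```

Only the three selected messages are ever held in memory, and the block
is the one place where the caller learns about each parent/child link.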

BTW, note how we only needed one pointer per message and one string
*per thread*, by using an array with a single element and saving the
actual message only in the top level (the rootset).
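
One way to picture the single-element array, as a sketch with made-up
names rather than the code from the commit: every message in a thread
holds a reference to the same one-slot array, and the slot holds the
one string kept for that thread. The sketch assumes parents are linked
before their children (oldest first); thread_cell and on_child are
hypothetical.

```ruby
# Message-ID => shared one-element array (one pointer per message).
thread_cell = {}

# Same shape as a (parent, child) callback, reduced to Message-IDs.
on_child = lambda do |parent_mid, child_mid|
  # The root creates the cell on first use; replies reuse the same object.
  cell = (thread_cell[parent_mid] ||= [parent_mid])
  thread_cell[child_mid] = cell
end

on_child.call('a@x', 'b@x')
on_child.call('b@x', 'd@x')

# Writing the slot once relabels the thread for every member.
thread_cell['a@x'][0] = 'thread: hello'
label_seen_from_reply = thread_cell['d@x'][0]
```

Because the array is shared and not copied, the per-message cost stays
at a single reference no matter how long the thread string is.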


>
> (*) I work with millions of emails
>
>> > P.S. I hope you don't mind I uploaded my fork to github.
>
> That's fine, I just add a new remote(*) to my .git/config, fetch
> and show.
>
> What I won't accept about GitHub is having it as a centralized
> and proprietary messaging system which forces participants to
> accept their ToS.  I can't accept that; no single entity
> controls email, so that's what I stick with.
>
>
> (*) added this to my .git/config
> ==> .git/config <==
> [remote "dimidd"]
>         url = https://github.com/dimidd/msgthr
>         fetch = refs/heads/*:refs/remotes/dimidd/*
