Re: [E-devel] RFC: EOID + Threads + TLS proposal

Gustavo Sverzut Barbieri Sun, 04 Sep 2016 16:27:35 -0700

>> given that we do not propose or use lots of multi-thread, particularly
>> multiple threads concurring for the same object, this looks the best
>> path forward.
>
> yeah.. BUT we get lots of noise from people wanting to use multiple threads
> with efl. we have ways (ecore_main_loop_async_call/sync_call,
> ecore_main_loop_begin/end, ecore_thread with "result" cb's in mainloop etc.).
>
> the way their mindset works, that i have gathered, is all efl funcs be it
> widgets, timers, whatever can be used anywhere in any thread and all will be
> magically fine. that's nigh impossible especially with a mainloop and ui.
> especially ui. ui would end up half-rendering a ui with the widgets half
> configured if a thread was still setting up something when a render pass came
> in. thus the "render when idle" method. the begin/end method above allows you
> to basically sync with mainloop from thread, take a lock, do your stuff and
> get out again. to avoid glitches like above we'd need this anyway, so we have
> it.
>
> we should be using more threads - though mostly inside efl to make an op async
> or faster so blocking time is shorter, but we still need to support people. we
> cannot support the way they think threads should be done, but with the
> begin/end we get as close as is actually sanely possible. so while we don't
> use a lot of threads, others do. that is one of the design aspects for efl
> interfaces making the mainloop an object. the idea is you could have different
> threads with different loops. there will always be a MAIN loop (in the thread
> with main() called), BUT why not have other loops in threads that can
> communicate to/from the main loop? this would make these people above much
> happier. also this "split eo objects" means they could use non-ui objects
> pretty easily in these threads - eg efl.net and thus a local EOID table would
> make a lot of sense.


I get all of this, and this is why I'm saying we should rather focus
on the communication part instead of the MT aspect of objects.

My perspective on this changed a bit after taking a look at Golang.
Most people don't get main loops and being called back; most try to
solve this by spawning threads... but that doesn't solve it, it just
adds to the problem.

After lots of thinking I believe most people struggle to partition
their problem into multiple functions and segment their work. One of
the reasons is that most people start out programming procedurally,
most algorithms are described procedurally, etc.

There is no "context", no partitioning and no preemption... just do
what you have to do, like a busy wait reading from db, search, sort,
paint... When you add threads, you solve some of these issues, but get
the damn preemption to add to your nightmares.

Go solved this by incorporating channels and select into their
language. It's super simple to send information and it's
batch-friendly from the POV of users programming it.

So my suggestion is to focus on communication and maybe offer a way to
convert back from a main loop to a "select" for non-main threads. In a
worker thread you could do something like in go:

     select {
         case data := <-data_source1:
             do_something_1();
         case data := <-data_source2:
             do_something_2();
     }

In C we could do:

     ew = efl_event_waiter_new();
     efl_event_waiter_add(ew, data_source1, DATA_SOURCE_EVENT_1, some_id1);
     efl_event_waiter_add(ew, data_source2, DATA_SOURCE_EVENT_2, some_id2);
     switch (efl_event_waiter_wait(ew, &data)) {
         case some_id1:
             do_something_1();
             break;
         case some_id2:
             do_something_2();
             break;
     }
     efl_event_waiter_del(ew);

it would essentially run an internal main loop on the worker thread,
connect Eo events on each source (in the main thread) and when they
happen, return the Efl_Event in "&data" (options on this below).

However this is easy for these people to map to how they learnt to
program. There is no "void *", no callbacks. It's an alternative to
OOP where to solve this they force you to inherit from a class (that
its internal data serves as "void *context") and instead of callbacks
they call your method. (which is something good when the language
helps you, but in C it's a PITA, maybe we can also offer the OOP as an
option by using override on a bare object?).

Note I'm not suggesting people use this to write the whole EFL itself,
but it could be a way to map the communication for people used to batch
programming and let them do this stuff in secondary threads.

Options to implement the above:

 - efl_event_waiter_add() finds the thread owning the object, adds an
event callback with a proxy function. The function would pause that
thread and wakeup the secondary thread (efl_event_waiter_wait()),
until the next efl_event_waiter_wait() or efl_event_waiter_del().

 - explicitly create a "channel" for an object on its owning thread.
This channel adds an event callback, when it's activated it will
report to all listeners. Then on the secondary thread you would
receive pre-created channel, no need to find owner thread. Everything
else would be the same, pause object thread and wakeup the worker.

 - one of the above, but instead of pause, serialize the event-info
and send to thread. With Eolian we can generate these
serializers/duplicate, maybe add as callback to the event description
structure.

 - in addition, let a channel be created explicitly to send arbitrary
information without emitting an event callback. Like create a channel
of integers, and when you want to send an int just channel_send(ch,
&myint).

 - timeouts would come as channels as well
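
A minimal sketch of the explicit "channel of integers" option from the
list above, assuming POSIX threads. Int_Channel, int_channel_send() and
int_channel_recv() are hypothetical names, not existing EFL API, and the
timeout is folded into the receive call instead of being its own
channel:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define CHAN_CAP 16 /* hypothetical fixed capacity */

/* Hypothetical bounded channel of ints; not an existing EFL type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             items[CHAN_CAP];
    unsigned        head, count;
} Int_Channel;

static void
int_channel_init(Int_Channel *ch)
{
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    ch->head = ch->count = 0;
}

/* Copy *value into the queue; returns false if the channel is full. */
static bool
int_channel_send(Int_Channel *ch, const int *value)
{
    bool ok = false;

    pthread_mutex_lock(&ch->lock);
    if (ch->count < CHAN_CAP) {
        ch->items[(ch->head + ch->count) % CHAN_CAP] = *value;
        ch->count++;
        ok = true;
        pthread_cond_signal(&ch->not_empty);
    }
    pthread_mutex_unlock(&ch->lock);
    return ok;
}

/* Wait up to timeout_ms for a value; returns false on timeout. */
static bool
int_channel_recv(Int_Channel *ch, int *out, long timeout_ms)
{
    struct timespec deadline;
    bool ok = false;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0) {
        if (pthread_cond_timedwait(&ch->not_empty, &ch->lock,
                                   &deadline) == ETIMEDOUT)
            break;
    }
    if (ch->count > 0) {
        *out = ch->items[ch->head];
        ch->head = (ch->head + 1) % CHAN_CAP;
        ch->count--;
        ok = true;
    }
    pthread_mutex_unlock(&ch->lock);
    return ok;
}
```

The worker would loop on int_channel_recv() where a callback-based
design would register a handler. Values are copied, so no object
crosses threads.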

Remember this is to help those not used to callbacks or to segmenting
their code: they can write their stuff in a busy-loop or even do it as
batch programming (instead of putting it inside a while(), wait on
different channels at different times, then return).

Not solved in this description is how to act on the objects of the
main thread. Suppose a GUI: the main thread has some buttons (events
would be handled as channels per above), but then you want to change a
label.

Given raster's TLS solution, objects would be invisible/unreachable to
the thread. Safe, but not that usable. People would have to send back
to the other thread what to do, this is cumbersome.

If we go with a "@synchronized" approach, we must serialize the event
callbacks if we dispatch callbacks as locked -- otherwise deadlocks --
or we must unlock before calling back, which is cumbersome and brings
problems.

Another option is to extend the channel to be an object proxy. When
you call a method on the proxy, it would cooperate with the channel
that did the "pause" of the object thread, run the method there and
return values. If the main thread is not being paused (info was
serialized), then we would ecore_main_loop_begin() - call - end().

Anyway, I do not want to hijack the thread purpose. My point is that
users of threads and some patterns we do not use have a reason, I
truly believe the reason is the one stated above. If you agree, then
we should help them to get their work done the way they like... and
this would impact how we're optimizing Eo. But picture this in your
mind, most people want:

     thread-1: // main loop + gui, invisible to the user

     thread-2: // check what to do from UI
         while (running) {
            wait one of {
              timeout 1s: update_clock();

              start button pressed:
                  do_some_sequential_action_in_thread3();
                  ui_progress_pulse_start();

              stop pressed:
                  cancel_some_sequential_action_in_thread3();
                  ui_progress_pulse_stop();
          }
         }

     thread-3: // something that would be tedious with cbs, simple in batch
         header = read_from_net(2) // blocking
         if (header.bla)
              header_bla = read_from_net(8)
         if (header.ble)
              header_ble = read_from_net(12)

         ui_label_set(header.blo)

currently we're the opposite of that, since we're modeled after the
single thread main loop pattern. We request people to connect 3
callbacks (one for timer, 2 for buttons) + recreate the
algorithm/parser to get data from network to get unknown bytes of data
from the net. Most people don't know how to do that, and it's a real
PITA to do it right :-)



>> > 4. We make it a clear rule that threads cannot access objects outside of
>> > those that they created UNLESS:
>> > 4.1 An object is explicitly SENT from one thread to another (we can do this
>> > later but if this is done, the object must have a refcount of 1 only, no
>> > parent, no children, no objects referenced in keys, weak refs to/from this
>> > object etc.). We can release the EOID entry in thread 1, but not call
>> > destructor and free object memory, send the POINTER to thread 2, and here a
>> > new EOID local to that thread is allocated and that pointer adopted.
>>
>> Not sure this is good. AFAIR we have some cases where we start a
>> working thread, do something with that object in that thread, then
>> send it back to the main thread to be used. This used to be cheap, now
>> it won't.
>
> ummm we actually didn't allow objects at all to work this way WITHOUT a
> begin/end. you could send DATA and have that worker calculate data, do i/o
> etc. then send result back to mainloop to implement on objects (or do it
> directly with a begin/end section). without the ability to send you can never
> transfer an object from thread to thread.

We neither allowed nor blocked it. Thus if you used an object
exclusively on a single thread, there were no issues. Same for data
structures, like using a binbuf to store some data. You could use an
object as well, given it doesn't depend on some shared resource.


> my thoughts on this were for messaging or for setting up message "pipes" with
> objects. one at each end. th1 creates 2 objects (2 ends of a comm pipe - 2 way
> like a socket), then sends one end to th2. th2 gets a "i have a new object"
> callback in its loop, discovers that it's the other end of that pipe and now
> that's an object locally in its eoid table. the objects are bound internally
> like a socketpair/pipe are in the kernel. we'd need to be able to send objects
> to do this. we'd have to have the internals work if either end was deleted
> while the other lives etc. there would be limitations. but it'd allow you to
> set up multiple comms pipes from any thread to any other and each thread can
> just release its end when it's done.

but is this for any object? Like I get an Elm_Window and make it work
like that? Or is it a specific object for communication?


> i was also thinking of object sending for inter-thread ipc. send a message
> from th1 to th2 and there can be an optional object as a payload. it has to be
> simple (like no children etc.) BUT this would then be possible. yes. you have
> to release from the eoid table on one end and alloc in the eoid table on the
> other. not cheap/free but better than spinlocks. :)

again, is sending a message an explicit "send this info there"? Or is
it "call a method that results in a message being sent"?



>> can't we just flag it somehow and for those we spinlock? Objects are
>> thread-private unless they are efl_add_multithread()? then you start
>> with the spinlocks for that eoid.
>
> this gets more complex. you do not know if the obj needs locking or not and
> every entry/exit point needs to check if it needs it then do a lock/unlock.
> we'd add LOTs of code we don't even have right now to every eo base class
> method and every class you inherit too. you have an issue with locks on
> objects too with cb's - you have to unlock in a cb then re-lock on return from
> calling the cb. it adds a lot of code.

I was thinking of less fine-grained locks. If non-shared (thread
private), then execute the function "X", like it does now. If shared,
then it would essentially lock, call X, unlock. Even when dispatching
the CB, since you're calling in the same thread, there is no need to
release the lock: if it's marked as recursive, you can take it as many
times as needed from that thread. Problems would arise if a secondary
thread is triggered, as that would result in a deadlock. But since
this would be repeatable, users would get consistent behavior: it
would always break, so they would have to do something else, like
defer the action.


> if we allocate a bit in eoid to know if its a sharable object (thus needs
> locks), we still have an issue that if we have different local eoid tables
> there will be an id CLASH where the id can exist in both tables (ignore the
> "shareable and needs locking" bit). you would have to synchronise all eo
> tables and their content (but leave foreign content as NULL in the leaf nodes
> to avoid being able to access it). this will mean we still need locks on the
> EOID table AND making the tables sync now will raise costs as you have to
> "stop all
> threads and sync" on every alloc/release of a table id.

not sure i get you or you get me.

What I mean is:

 - if shared-bit is set: use ANOTHER EOID table, one that is global
and protected by locks;

 - if shared-bit is not set: use a TLS EOID table that is exclusive to
a given thread.

IOW you do what we're doing right now IF AND ONLY IF the bit is set.
Otherwise we use the TLS version that needs no lock.
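
A rough sketch of that split, assuming C11 _Thread_local and POSIX
mutexes. The Eoid typedef, SHARED_BIT position, table size and the
eoid_register()/eoid_lookup() helpers are all made up for illustration
and are not the real Eo internals:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64                     /* hypothetical tiny table */
#define SHARED_BIT ((uintptr_t)1 << 31)   /* hypothetical bit position */

typedef uintptr_t Eoid;                   /* stand-in for the real Eo id */

/* One private table per thread, one global table guarded by a lock. */
static _Thread_local void *local_table[TABLE_SIZE];
static void               *global_table[TABLE_SIZE];
static pthread_mutex_t     global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store obj in the chosen table and return its id (0 on failure). */
static Eoid
eoid_register(void *obj, size_t slot, int shared)
{
    if (slot >= TABLE_SIZE) return 0;
    if (shared) {
        pthread_mutex_lock(&global_lock);
        global_table[slot] = obj;
        pthread_mutex_unlock(&global_lock);
        return SHARED_BIT | (slot + 1);
    }
    local_table[slot] = obj;              /* TLS: no lock needed */
    return slot + 1;
}

/* Resolve an id: the shared bit alone decides which table is used. */
static void *
eoid_lookup(Eoid id)
{
    size_t slot = (size_t)(id & ~SHARED_BIT);
    void *obj;

    if (slot == 0 || slot > TABLE_SIZE) return NULL;
    slot--;
    if (!(id & SHARED_BIT))
        return local_table[slot];

    pthread_mutex_lock(&global_lock);
    obj = global_table[slot];
    pthread_mutex_unlock(&global_lock);
    return obj;
}

/* Thread entry used to show what another thread can see. */
static void *
lookup_from_other_thread(void *arg)
{
    return eoid_lookup(*(Eoid *)arg);
}
```

With this layout a private id resolves to NULL from any other thread,
while a shared id resolves everywhere at the cost of the lock.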


> a local EOID space that just ignores all others until some is mapped in with a
> "stop that thread" assumption or the explicit "adopt a new obj ptr into your
> eoid table" would be far cheaper as it puts these costs only in those places
> for those objects/cases and not the other 99% :)
>
> object sending is far less code, and i think it'll be rare to send objects
> compared to all other eo transactions.

I think it will be super-rare, to the point we shouldn't even bother
with the resulting complexities (children, non-EFL resources like
CURL... etc). :-)


>> In that sense we could even use that same bit to the eo operations in
>> the object itself, the obj would have a mutex on its own and all
>> calls/events would be guarded by that... kinda of a "@synchronized" in
>> other languages.
>
> you would need synced EOID tables. you COULD send an object WITHOUT releasing
> at the other end. it now has 2 EOID's that refer to it. this is something i
> think we could do later and THEN when a SHARED object is deleted you need to
> know all tables its shared between, know all the EOID's that map to it, then
> message those other threads to release their eoid refs and when the last one
> is released the obj is actually deleted. likely we would need to still have a
> master owning thread with the others having a share eoid "view". we can have a
> bit in the eoid table (no need to use an EOID bit in the ID) to know that your
> entry is a shared one and someone else has the master copy (in the master
> table another bit to know that this obj is now shared out and other threads
> have a ref and you need to wait for them all to message you, release their
> thread refs, then when those are at 0, you can call some callback to tell the
> master thread that owned/created it that everyone is done and it can then
> release its ref). this ALSO means all eo methods have to use the mutex above
> you describe. this is a LOT MORE work as i said to have a totally shared
> object and i think we can do this later without breaking api/abi and just
> internally, but it's too much work for the moment.

maybe I'm being too naïve, but in my understanding it's about getting
all Eo.h and making it check the bit, if set, lock using object's
mutex, call the actual function, then unlock. That's what
@synchronized does in other languages :-)


>> > 7. It's an EO (and EFL)-wide rule that you should not make threadsafe
>> > objects because EO just won't support it - you have to explicitly send
>> > objects around or do a begin/end of another thread to look at it's objects
>> > (and then that is limited to a thread of a different domain - we have 4 so
>> > not bad).
>>
>> it's a reasonable rule, we can remove the "@synchronized" thing from
>> above if it's not easy to implement. But if we could easily add that,
>> then we can drop the rule and extend. Initially I'd go with your
>> proposed rule.
>
> the main reason it's not easy is the need to lock at every eo func/method
> invoke, unlock on every exit point in the method, and to unlock at every
> callback call within this locked state and lock again on return from the cb.
> that has to go everywhere. :(

make the function an "_internal" or "_locked". Then you do not need to
chase all returns... just call it guarded by locks, very simple :-)

I'm not sold that we'd need to unlock before calling back the user,
just use recursive mutex that allows the same thread to lock multiple
times, so when the user calls a method from the callback, it wouldn't
deadlock.
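
A small sketch of that pattern, assuming POSIX recursive mutexes.
Sync_Obj and every function below are hypothetical and only stand in
for an Eo object with a per-object mutex:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object; not the real Eo object layout. */
typedef struct Sync_Obj Sync_Obj;
typedef void (*Sync_Cb)(Sync_Obj *obj);

struct Sync_Obj {
    pthread_mutex_t mutex;      /* recursive, one per object */
    int             value;
    Sync_Cb         on_change;  /* user callback, may be NULL */
};

static int
sync_obj_init(Sync_Obj *obj)
{
    pthread_mutexattr_t attr;
    int r;

    if (pthread_mutexattr_init(&attr) != 0) return -1;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    r = pthread_mutex_init(&obj->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    obj->value = 0;
    obj->on_change = NULL;
    return (r == 0) ? 0 : -1;
}

/* The real work. It may return from several places and never touches
 * the lock, so there are no exit points to chase. */
static void
_sync_obj_value_set_locked(Sync_Obj *obj, int value)
{
    if (obj->value == value) return;
    obj->value = value;
    if (obj->on_change) obj->on_change(obj); /* lock is still held */
}

/* Public entry point: lock, call the _locked version, unlock. */
static void
sync_obj_value_set(Sync_Obj *obj, int value)
{
    pthread_mutex_lock(&obj->mutex);
    _sync_obj_value_set_locked(obj, value);
    pthread_mutex_unlock(&obj->mutex);
}

static int
sync_obj_value_get(Sync_Obj *obj)
{
    int v;

    pthread_mutex_lock(&obj->mutex);
    v = obj->value;
    pthread_mutex_unlock(&obj->mutex);
    return v;
}

/* Example callback that re-enters the public API from the same thread;
 * with a non-recursive mutex this would deadlock. */
static void
clamp_cb(Sync_Obj *obj)
{
    if (sync_obj_value_get(obj) > 100)
        sync_obj_value_set(obj, 100);
}
```

This only covers re-entry from the thread that already holds the lock.
A callback that waits on another thread which needs the same object
would still deadlock, as discussed above.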


> at the moment #7 is actually our rule for ALL "objects" in efl except for a
> few, and the only owning thread is mainloop. so its less limited. we have the
> issue of main loop begin/end where i have a proposed solution here with
> domains, and i just realised ecore_thread (and some class functions i
> think...) which has functions to send feedback, check if you are cancelled
> etc. - we could use the EOID sending if we had eo equivalents, but we don't. we
> could keep ecore_thread legacy only like we do now and design something new.
> that'd be the way to go i think.

ok


>> > When you are in a begin/end section and you see 2 EOID tables, when you
>> > CREATE a new object... which one does it go into? Remember that when you
>> > CALL a method on an obj it may go create objects internally too. How can
>> > you determine which to use? You should be able to access both without
>> > creating without issue with domains as above. You could delete fine since
>> > an object knows which table it belongs to in the current thread context
>> > based on domain number. They will be different. You can't bind a foreign
>> > domain in if it matches yours - it'll fail. But creation is special.
>> >
>> > One option... if you create WITH a parent passed, the child must go into
>> > the same domain automatically. Operations mixing domains in an object tree
>> > should fail. What about other cases? Create a bare object with no parent...
>> > you add as a child later. How to choose which domain it goes into? Local or
>> > foreign? Maybe there is a context you can switch that is in your TLS that
>> > tells you which to use (local or foreign table). If we have a push/pop
>> > setup it'd be nice, but it's easy to get wrong. An explicit call to create
>> > with foreign and eo_add is local? So eo_foreign_add() uses the foreign
>> > domain (if adopted at the time, and if not it will either fail or just use
>> > local domain then). Worth thinking about.
>>
>> Raster, you lost me here... I guess you have too much in your mind and
>> assumed it was clear, at least it's not clear to me what you meant...
>> and I read these 2 paragraphs couple of times :-)
>
> oh... yeah. sorry. :) ummm we have the begin/end thing right?
>
> you have mainloop + thread.
>
> thread can call "ecore_main_loop_begin()" and this will sync with mainloop and
> STOP the mainloop at a safe point, then the func will return and any code you
> now run is "assumed to be in the mainloop context". you can mess with ui and
> create timers and everything, until "ecore_main_loop_end()" which releases
> this lock and lets the main loop continue on.

ahhh.... THAT begin/end. ecore_main_loop_*...


> the idea is every efl loop will have these begin/end methods that let u sync
> and lock out the loop and PRETEND to be that loop for a hopefully small
> section of code that for example updates the ui with data you have locally,
> then releases the loop again to keep running.
>
> *IF* we use TLS then during this period where you pretend to be another
> loop ... you STILL CANNOT see the other loop's objects because they live in
> that thread's TLS data. right? so how to solve this? we can just move over the
> tls pointer from mainloop to thread temporarily, then do your stuff, then
> release. fine. during this block you CANNOT access your "local" objects at all
> because your whole EOID namespace switched. they will be mainloop EOID's not
> yours. can we solve this?
>
> yes we can! EO has 2 EOID pointers in TLS. 1 is "local". the other is
> "foreign". 99% of the time foreign is NULL. when you do a begin, foreign then
> gets the ptr for the "local" EOID table of the thread you are doing begin on.
> so stop, block other thread and continue. it is now SAFE to continue as there
> is no contention as only 1 thread is working on this data at all.
>
> but how do you know an EOID is your local one or a foreign one? solution,
> allocate 2 bits in the EOID that is like a "thread id" but i am calling it a
> domain ID. your domain for your thread MUST be different to the one you are
> doing begin on otherwise this cannot work. so i would make the mainloop ALWAYS
> have domain 0, and other threads can choose (with the default being 1, and
> expecting the threads then to only do begin/end on mainloop and no other
> threads, but we have 2 more values (2, 3) that can be used for other threads
> then a thread in domain 1 can do begin/end on one in domain 2 or 3 etc.).
>
> this domain value (2 bits, value 0 to 3), will let us know to look in the
> local table OR the foreign table for the object. we know the local domain id
> and if domain in EOID == local id, look in local table OR OTHERWISE look in
> the foreign table (we can just make it an array of 4 items - one per domain
> slot and look in that slot. when you begin() on another thread it puts that
> thread's LOCAL table into your slot for that domain locally so now everything
> can be accessed).
>
> NOW we can access BOTH our local objects and the objects of the "foreign"
> thread we have done begin and end on. in fact if we use the above array any
> single thread can begin() on up to 3 other threads at any time and access
> everything. the issue is with creation of new things. where do they go?

My idea is simpler than that, see above: 1 bit: "shared"

 - shared=1. If so, use a global table (non-TLS), guarded with locks;
 - shared=0, use a TLS, no locks.

If we were to use a second bit, I'd do the per-object mutex for all
operations as I described above. But it's an addition, nothing to
worry about now.



>> To summarize my understanding with your restrictions: you create the
>> object in a thread OR send it to a thread. Then when you create, the
>> domain and all are all set to the current thread. Parent needs to be
>
> well i'm asking.. what domain should created stuff belong to when you have
> multiple domains mapped into your thread.

as above. If you create it with the bit set, then you use the global
table (non-TLS), guarded with locks. If you do not, then it's in the
TLS of the current thread.

If you want to create in another Thread/TLS, then you send a message
to that thread and let it create it there. Thus why I'm saying to
focus on communication, to make this simpler.

Doing the creation within ecore_main_loop_{begin,end} doesn't change
things at all.



>> Maybe we should focus more on easily communicate between two main
>> loops/threads? That way you do not need to pass objects and hit the
>> above complexity. All you do is to send  information, and on the
>> target thread you do the actions, like create the object.
>
> as above - messaging between loops is on the cards. being able to send objects
> that REPRESENT some complex piece of data would be nice. imagine a simple
> "database" object where you can query by key, row, column, path etc. - like an
> sql object for example (urgh ok not sql but you get the point) and this
> object is a database object which is really just an obj representing a big
> backend store of data you can read/write. you want another thread to access
> data? create db object, send it over. that's the kind of thing i think sending
> should be used for.

But here the underlying library may not accept being used from
multiple threads, as is the case for CURL. Usually this is the case
:-/



>> reading this and thinking about clear multi-thread cases makes me
>> think that we need easier communication more than sharing.
>
> we HAVE to handle the begin/end case. we can't support our legacy api
> otherwise. it's a SPECIAL case in that the other thread will be paused, so its
> sharing with no locks, but we HAVE to do it to retain begin/end. i first just
> thought of a single bit "mainloop vs everyone else", but i realized that that
> is too limiting. 2 bits would make it far better. same idea though. :)

as per above, what's the limitation? it would mean legacy API would
create with the shared bit set, would do what we're doing now.
Internal widgets and stuff like that would still benefit from
thread-private/no-locks if we wish so (unless we return the widget,
then we'd have to use the same bit as the parent).

Worst case is what we have now :-) Best case is lock-free for those
that care about performance!

However see my earlier comment about why people use threads and why we
should improve that before/while reworking Eo internals.

[...]

>> children, then we have non EFL resources like CURL. So at least a
>
> well children - disallowed. curl - does it matter? well ok it matters if the
> data the object is carrying like let's say curl data, is unable to work
> outside a specific thread. then this is not possible. so that is a good point
> - this is why it probably makes sense to have maybe a sendable() method in eo
> base that returns true by default (we will make eo base sendable), but any
> class on top that cannot be sent (eg internal data like curl can't move from
> one thread to another), then it overrides and returns false. design point -
> only ever override to then return false. never flip it back to true again
> because a parent class can't be sent. this would limit sending to a few
> specific objects.
> we could make it false by default and only enable if you know for sure you and
> all your parent classes can be sent, but how do you KNOW unless the parent
> class already returns true? :)

ok, at least something like "sendable" must be defined so we can know
if it would work or blow.
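
One way to picture the "sendable" check, as a sketch only: the Klass
struct, its fields and the example classes are invented here, and real
Eo classes are declared differently. Walking the parent chain enforces
the "never flip it back to true" rule even if a subclass claims to be
sendable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class descriptor; not the real Efl_Class. */
typedef struct Klass Klass;
struct Klass {
    const char  *name;
    const Klass *parent;
    bool         sendable;   /* this class's own claim */
};

/* An object can be sent only if its class and every ancestor agree. */
static bool
klass_sendable(const Klass *k)
{
    for (; k; k = k->parent)
        if (!k->sendable) return false;
    return true;
}

/* Example hierarchy: the base is sendable, a CURL-backed class is not,
 * and a child of the CURL class cannot flip the answer back to true. */
static const Klass base_klass       = { "Base",        NULL,        true  };
static const Klass binbuf_klass     = { "Binbuf",      &base_klass, true  };
static const Klass curl_klass       = { "Curl_Handle", &base_klass, false };
static const Klass curl_child_klass = { "Curl_Child",  &curl_klass, true  };
```

The send path would call something like klass_sendable() before
releasing the EOID entry, and refuse the transfer otherwise.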


>> OTOH I do think that we could use just one bit (instead of multiple
>> domains as you said) that means "use the global EOID table guarded
>> with spinlocks".
>
> urgh. we could do that. BUT you'd need more than 1 bit. for begin/end you need
> to be able to see 2 eoid tables at once. one of them would have to be global,
> one then private. if mainloop is private then by definition EVERY thread must
> use global. this is bad.
>
> BUT with domains maybe 0 is private, 1 is global, 2 and 3 are 2 more private
> domains as i described. there still is an issue - global ids then mean people
> THINK objects are threadsafe and thus they have to be made threadsafe... and
> back to the above for that. :)

I still fail to see these problems... See my comments above and if you
still think there are problems, describe some example with comments on
how/where it would be a problem.



>> Whenever it's feasible or desired to offer some "@synchronized" for
>> Eo, like objects created with that bit would have an internal mutex
>> and all access would be guarded by it.. need to be careful with the
>> deadlocks if methods are calling others, the lock would be already
>> acquired... so maybe a recursive mutex?
>
> yeah - i know. this is the pain. recursive mutex also works for not worrying
> about unlock on calling a function that exits your "frame" into a child frame
> (callback call or any other func/method). i am not sure if recursive mutexes
> are portable. it seems to work on most *nix's - not sure on openbsd, and then
> windows seems to have them. we COULD have eo actually do the lock/unlock on
> method call if the obj is a lockable obj. if we have a global eoid table then
> objects in this table will have to be lockable like this. we need to decide if
> this is a good path or not. sending is cheap if you do not go back and forth a
> LOT. shared with mutexes is better if you do.

I guess you can do recursive mutexes everywhere these days, Python and
other languages use them to do their synchronized stuff.


> i actually like the idea of a specific domain that is shareable, with others
> private. private == more performance, but we need different domains like above
> to SEE multiple private domains and those EOID tables for begin/end.
>
> but the question remains - when you then do eo_add() which does it belong to?
> private or sharable or any other domain if they exist)?

efl_add() -> private
efl_mt_add() -> shared

efl_mt_is() -> check eoid bit

efl_event_callback_call(o, ev, info) {
   if (efl_mt_is(o))
      real_ptr = efl_global_eoid_table_get(o);
   else
      real_ptr = efl_local_eoid_table_get(o);

   if (!real_ptr) {
      ERR("%p is not a valid object", o);
      return;
   }

   if (efl_synchronized_is(o)) // if we opt to use that extra bit
      mutex_lock(real_ptr->mutex);

   _efl_event_callback_call(real_ptr, ev, info);

   if (efl_synchronized_is(o)) // if we opt to use that extra bit
      mutex_unlock(real_ptr->mutex);

   // cleanup
}



-- 
Gustavo Sverzut Barbieri
--------------------------------------
Mobile: +55 (16) 99354-9890

