Re: [E-devel] RFC: EOID + Threads + TLS proposal

The Rasterman Sat, 03 Sep 2016 20:01:01 -0700

On Sat, 3 Sep 2016 19:38:42 -0300 Gustavo Sverzut Barbieri <[email protected]>
said:


> > So here is an idea. I checked. A TLS lookup for any var is 1/5th the cost
> > or so of a lock+unlock. We could use __thread and this is FREE (no cost -
> > well it's a mov only), but this is only free in binaries not shared
> > libraries, so let's talk TLS which is much cheaper than a lock anyway. Why
> > TLS?
> >
> > 1. It will drop out cost above a lot.
> > 2. We can remove several other locks in EO too cutting some more costs.
> > 3. We can REALLY CHEAPLY enforce the rule of "you may not access an object
> > outside its owning thread". Because every thread has its own EOID table,
> > the EOID will be local to that thread only. Looking it up in another thread
> > is looking up an "invalid" EOID. In fact we can make this pretty much
> > always fail by using some domain bits in EOID like i mentioned above (steal
> > some from table entries and/or generation count). Now literally stuff will
> > FAIL and not magically work 99% of the time if you disobey these rules and
> > access a rectangle object or a timer from another thread that doesn't own
> > them as the objects literally are not in the local thread EOID table. The
> > thread cannot see them (well is very unlikely to see them).
> 
> this is a very good approach and the side-effect is a nice one :-)
> 
> given that we do not propose or use lots of multi-thread, particularly
> multiple threads concurring for the same object, this looks the best
> path forward.

yeah.. BUT we get lots of noise from people wanting to use multiple threads
with efl. we have ways (ecore_main_loop_thread_safe_call_async/sync,
ecore_thread_main_loop_begin/end, ecore_thread with "result" cb's in mainloop
etc.).

the way their mindset works, that i have gathered, is all efl funcs be it
widgets, timers, whatever can be used anywhere in any thread and all will be
magically fine. that's nigh impossible especially with a mainloop and ui.
especially ui. ui would end up half-rendering a ui with the widgets half
configured if a thread was still setting up something when a render pass came
in. thus the "render when idle" method. the begin/end method above allows you
to basically sync with mainloop from thread, take a lock, do your stuff and get
out again. to avoid glitches like above we'd need this anyway, so we have it.

we should be using more threads - though mostly inside efl to make an op async
or faster so blocking time is shorter, but we still need to support people. we
cannot support the way they think threads should be done, but with the
begin/end we get as close as is actually sanely possible. so while we don't use
a lot of threads, others do. that is one of the design aspects for efl
interfaces making the mainloop and object. the idea is you could have different
threads with different loops. there will always be a MAIN loop (in the thread
with main() called), BUT why not have other loops in threads that can
communicate to/from the main loop? this would make these people above much
happier. also this "split eo objects" means they could use non-ui objects
pretty easily in these threads - eg efl.net and thus a local EOID table would
make a lot of sense.

> > 2. This thread init sets up an initial generation count at a random value so
> > generation counts can't easily be in sync and maybe swizzles the order
> > EOID's are found/allocated and so on to minimize ID "sameness".
> 
> random is bad, maybe we can come with some way to guarantee it will
> not collide with the threads in use?

only the domain bits would guarantee that. generation count start value is
irrelevant really, i was just thinking to make it random to avoid 2 threads
having the same creation patterns thus likely to have same gen count AND the
table entry allocation.
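
To illustrate the point about domain bits: if two threads have identical creation patterns, generation count and table index can end up the same, and only a domain field keeps the resulting ids apart. A minimal sketch, assuming a 64-bit id with made-up field widths; `Eoid`, `eoid_make()` and the accessors are hypothetical names for this example, not existing EO API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit EOID layout. The field widths are assumptions made
 * for this sketch only, not the real EO layout. */
#define EOID_INDEX_BITS  32u
#define EOID_GEN_BITS    16u
#define EOID_DOMAIN_BITS 2u   /* 4 domains, values 0 to 3 */

#define EOID_GEN_SHIFT    EOID_INDEX_BITS
#define EOID_DOMAIN_SHIFT (EOID_INDEX_BITS + EOID_GEN_BITS)
#define EOID_MASK(bits)   ((UINT64_C(1) << (bits)) - 1u)

typedef uint64_t Eoid;

/* Pack domain, generation count and table index into one id.
 * The generation count is masked so it may wrap around freely. */
static inline Eoid eoid_make(unsigned domain, unsigned gen, uint32_t index)
{
    assert(domain < (1u << EOID_DOMAIN_BITS));
    return ((Eoid)domain << EOID_DOMAIN_SHIFT) |
           (((Eoid)gen & EOID_MASK(EOID_GEN_BITS)) << EOID_GEN_SHIFT) |
           (Eoid)index;
}

static inline unsigned eoid_domain(Eoid id)
{
    return (unsigned)((id >> EOID_DOMAIN_SHIFT) & EOID_MASK(EOID_DOMAIN_BITS));
}

static inline unsigned eoid_gen(Eoid id)
{
    return (unsigned)((id >> EOID_GEN_SHIFT) & EOID_MASK(EOID_GEN_BITS));
}

static inline uint32_t eoid_index(Eoid id)
{
    return (uint32_t)(id & EOID_MASK(EOID_INDEX_BITS));
}
```

With this layout `eoid_make(0, 7, 42)` and `eoid_make(1, 7, 42)` are different values even though generation and index match, which is why the starting generation value matters much less than the domain.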

> > 4. We make it a clear rule that threads cannot access objects outside of
> > those that they created UNLESS:
> > 4.1 An object is explicitly SENT from one thread to another (we can do this
> > later but if this is done, the object must have a refcount of 1 only, no
> > parent, no children, no objects referenced in keys, weak refs to/from this
> > object etc.). We can release the EOID entry in thread 1, but not call
> > destructor and free object memory, send the POINTER to thread 2, and here a
> > new EOID local to that thread is allocated and that pointer adopted.
> 
> Not sure this is good. AFAIR we have some cases where we start a
> working thread, do something with that object in that thread, then
> send it back to the main thread to be used. This used to be cheap, now
> it won't.

ummm we actually didn't allow objects at all to work this way WITHOUT a
begin/end. you could send DATA and have that worker calculate data, do i/o etc.
then send result back to mainloop to implement on objects (or do it directly
with a begin/end section). without the ability to send you can never transfer
an object from thread to thread.

my thoughts on this were for messaging or for setting up message "pipes" with
objects. one at each end. th1 creates 2 objects (2 ends of a comm pipe - 2 way
like a socket), then sends one end to th2. th2 gets a "i have a new object"
callback in its loop, discovers that it's the other end of that pipe and now
that's an object locally in its eoid table. the objects are bound internally
like a socketpair/pipe are in the kernel. we'd need to be able to send objects
to do this. we'd have to have the internals work if either end was deleted while
the other lives etc. there would be limitations. but it'd allow you to set up
multiple comms pipes from any thread to any other and each thread can just
release its end when it's done.

i was also thinking of object sending for inter-thread ipc. send a message from
th1 to th2 and there can be an optional object as a payload. it has to be
simple (like no children etc.) BUT this would then be possible. yes. you have
to release from the eoid table on one end and alloc in the eoid table on the
other. not cheap/free but better than spinlocks. :)

> can't we just flag it somehow and for those we spinlock? Objects are
> thread-private unless they are efl_add_multithread()? then you start
> with the spinlocks for that eoid.

this gets more complex. you do not know if the obj needs locking or not and
every entry/exit point needs to check if it needs it then do a lock/unlock.
we'd add LOTs of code we don't even have right now to every eo base class
method and every class you inherit too. you have an issue with locks on objects
too with cb's - you have to unlock in a cb then re-lock on return from calling
the cb. it adds a lot of code.
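
A rough idea of what that would mean for a single setter, written with plain pthread mutexes. This is only a sketch: `Obj`, `obj_value_set()` and the `shared` flag are hypothetical and stand in for whatever a lockable eo object would look like.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct Obj Obj;
typedef void (*Obj_Cb)(Obj *obj, void *data);

struct Obj {
    pthread_mutex_t lock;
    int shared;          /* nonzero: object may be used from several threads */
    int value;
    Obj_Cb changed_cb;   /* called whenever value changes */
    void *cb_data;
};

static inline void obj_init(Obj *o, int shared)
{
    assert(o != NULL);
    pthread_mutex_init(&o->lock, NULL);
    o->shared = shared;
    o->value = 0;
    o->changed_cb = NULL;
    o->cb_data = NULL;
}

static inline int obj_value_get(Obj *o)
{
    int v;
    if (o->shared) pthread_mutex_lock(&o->lock);
    v = o->value;
    if (o->shared) pthread_mutex_unlock(&o->lock);
    return v;
}

/* Returns 0 on success, -1 if value is negative. */
static inline int obj_value_set(Obj *o, int value)
{
    if (o->shared) pthread_mutex_lock(&o->lock);        /* entry point */
    if (value < 0) {
        if (o->shared) pthread_mutex_unlock(&o->lock);  /* early exit */
        return -1;
    }
    o->value = value;
    if (o->changed_cb) {
        Obj_Cb cb = o->changed_cb;
        void *data = o->cb_data;
        if (o->shared) pthread_mutex_unlock(&o->lock);  /* unlock around the cb */
        cb(o, data);
        if (o->shared) pthread_mutex_lock(&o->lock);    /* re-lock on return */
    }
    if (o->shared) pthread_mutex_unlock(&o->lock);      /* normal exit */
    return 0;
}

/* Example callback that calls back into the object. It would block forever
 * on this plain mutex if the setter still held the lock. */
static inline void record_value_cb(Obj *o, void *data)
{
    *(int *)data = obj_value_get(o);
}
```

Five lock/unlock sites for a setter with one early return and one callback, and every method of every class in the hierarchy would need the same treatment.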

if we allocate a bit in eoid to know if its a sharable object (thus needs
locks), we still have an issue that if we have different local eoid tables
there will be an id CLASH where the id can exist in both tables (ignore the
"shareable and needs locking" bit). you would have to synchronise all eo
tables and their content (but leave foreign content as NULL in the leaf nodes
to avoid being able to access it). this will mean we still need locks on the
EOID table AND making the tables sync now will raise costs as you have to "stop
all threads and sync" on every alloc/release of a table id.

a local EOID space that just ignores all others until some is mapped in with a
"stop that thread" assumption or the explicit "adopt a new obj ptr into your
eoid table" would be far cheaper as it puts these costs only in those places
for those objects/cases and not the other 99% :)

object sending is far less code, and i think it'll be rare to send objects
compared to all other eo transactions.

> In that sense we could even use that same bit to the eo operations in
> the object itself, the obj would have a mutex on its own and all
> calls/events would be guarded by that... kinda of a "@synchronized" in
> other languages.

you would need synced EOID tables. you COULD send an object WITHOUT releasing
at the other end. it now has 2 EOID's that refer to it. this is something i
think we could do later and THEN when a SHARED object is deleted you need to
know all tables its shared between, know all the EOID's that map to it, then
message those other threads to release their eoid refs and when the last one is
released the obj is actually deleted. likely we would need to still have a
master owning thread with the others having a shared eoid "view". we can have a
bit in the eoid table (no need to use an EOID bit in the ID) to know that your
entry is a shared one and someone else has the master copy (in the master
table another bit to know that this obj is now shared out and other threads
have a ref and you need to wait for them all to message you, release their
thread refs, then when those are at 0, you can call some callback to tell the
master thread that owned/created it that everyone is done and it can then
release its ref). this ALSO means all eo methods have to use the mutex you
describe above. this is a LOT MORE work as i said to have a totally shared object
and i think we can do this later without breaking api/abi and just internally,
but it's too much work for the moment.

> > 7. It's an EO (and EFL)-wide rule that you should not make threadsafe
> > objects because EO just won't support it - you have to explicitly send
> > objects around or do a begin/end of another thread to look at its objects
> > (and then that is limited to a thread of a different domain - we have 4 so
> > not bad).
> 
> it's a reasonable rule, we can remove the "@synchronized" thing from
> above if it's not easy to implement. But if we could easily add that,
> then we can drop the rule and extend. Initially I'd go with your
> proposed rule.

the main reason it's not easy is the need to lock at every eo func/method
invoke, unlock on every exit point in the method, and to unlock at every
callback call within this locked state and lock again on return from the cb.
that has to go everywhere. :(

at the moment #7 is actually our rule for ALL "objects" in efl except for a
few, and the only owning thread is mainloop. so it's less limited. we have the
issue of main loop begin/end where i have a proposed solution here with
domains, and i just realised ecore_thread (and some class functions i think...)
which has functions to send feedback, check if you are cancelled etc. - we
could use the EOID sending if we had eo equivalents, but we don't. we could
keep ecore_thread legacy only like we do now and design something new. that'd
be the way to go i think.

> > When you are in a begin/end section and you see 2 EOID tables, when you
> > CREATE a new object... which one does it go into? Remember that when you
> > CALL a method on an obj it may go create objects internally too. How can
> > you determine which to use? You should be able to access both without
> > creating without issue with domains as above. You could delete fine since
> > an object knows which table it belongs to in the current thread context
> > based on domain number. They will be different. You can't bind a foreign
> > domain in if it matches yours - it'll fail. But creation is special.
> >
> > One option... if you create WITH a parent passed, the child must go into the
> > same domain automatically. Operations mixing domains in an object tree
> > should fail. What about other cases? Create a bare object with no parent...
> > you add as a child later. How to choose which domain it goes into? Local or
> > foreign? Maybe there is a context you can switch that is in your TLS that
> > tells you which to use (local or foreign table). If we have a push/pop
> > setup it'd be nice, but it's easy to get wrong. An explicit call to create
> > with foreign and eo_add is local? So eo_foreign_add() uses the foreign
> > domain (if adopted at the time, and if not it will either fail or just use
> > local domain then). Worth thinking about.
> 
> Raster, you lost me here... I guess you have too much in your mind and
> assumed it was clear, at least it's not clear to me what you meant...
> and I read these 2 paragraphs couple of times :-)

oh... yeah. sorry. :) ummm we have the begin/end thing right?

you have mainloop + thread.

thread can call "ecore_thread_main_loop_begin()" and this will sync with
mainloop and STOP the mainloop at a safe point, then the func will return and
any code you now run is "assumed to be in the mainloop context". you can mess
with ui and create timers and everything, until "ecore_thread_main_loop_end()"
which releases this lock and lets the main loop continue on.

the idea is every efl loop will have these begin/end methods that let u sync
and lock out the loop and PRETEND to be that loop for a hopefully small section
of code that for example updates the ui with data you have locally, then
releases the loop again to keep running.

*IF* we use TLS then during this period where you pretend to be another
loop ... you STILL CANNOT see the other loop's objects because they live in that
thread's TLS data. right? so how to solve this? we can just move over the tls
pointer from mainloop to thread temporarily, then do your stuff, then release.
fine. during this block you CANNOT access your "local" objects at all because
your whole EOID namespace switched. they will be mainloop EOID's not yours. can
we solve this?

yes we can! EO has 2 EOID table pointers in TLS. 1 is "local". the other is
"foreign". 99% of the time foreign is NULL. when you do a begin, foreign then
gets the ptr for the "local" EOID table of the thread you are doing begin on.
so stop, block other thread and continue. it is now SAFE to continue as there
is no contention as only 1 thread is working on this data at all.

but how do you know an EOID is your local one or a foreign one? solution,
allocate 2 bits in the EOID that is like a "thread id" but i am calling it a
domain ID. your domain for your thread MUST be different to the one you are
doing begin on otherwise this cannot work. so i would make the mainloop ALWAYS
have domain 0, and other threads can choose (with the default being 1, and
expecting the threads then to only do begin/end on mainloop and no other
threads, but we have 2 more values (2, 3) that can be used for other threads
then a thread in domain 1 can do begin/end on one in domain 2 or 3 etc.).

this domain value (2 bits, value 0 to 3), will let us know to look in the local
table OR the foreign table for the object. we know the local domain id and if
domain in EOID == local id, look in local table OR OTHERWISE look in the
foreign table (we can just make it an array of 4 items - one per domain slot
and look in that slot. when you begin() on another thread it puts that thread's
LOCAL table into your slot for that domain locally so now everything can be
accessed).
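
The slot-array lookup could look roughly like the sketch below. It assumes a 32-bit id with the domain in the top 2 bits, and it passes the per-thread view around explicitly so one thread can model both sides; in the real design the view would sit in TLS. All names (`Eoid_View`, `view_begin()`, etc.) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOMAIN_COUNT 4
#define TABLE_SIZE   8
#define DOMAIN_SHIFT 30u  /* assumption: top 2 bits of a 32-bit id */
#define INDEX_MASK   ((UINT32_C(1) << DOMAIN_SHIFT) - 1u)

typedef uint32_t Eoid;

typedef struct {
    unsigned domain;            /* 0 to 3 */
    unsigned used;
    void *entries[TABLE_SIZE];
} Eoid_Table;

/* What one thread can see: one table slot per domain. */
typedef struct {
    unsigned local_domain;
    Eoid_Table *slots[DOMAIN_COUNT];
} Eoid_View;

static inline void view_init(Eoid_View *v, Eoid_Table *local)
{
    assert(local->domain < DOMAIN_COUNT);
    for (int i = 0; i < DOMAIN_COUNT; i++) v->slots[i] = NULL;
    v->local_domain = local->domain;
    v->slots[local->domain] = local;
}

/* Returns a new id, or 0 (never a valid id) if the table is full. */
static inline Eoid table_add(Eoid_Table *t, void *obj)
{
    if (t->used >= TABLE_SIZE) return 0;
    t->entries[t->used++] = obj;
    return ((Eoid)t->domain << DOMAIN_SHIFT) | (Eoid)t->used; /* index + 1 */
}

/* Pick the table by the id's domain bits, then look up the entry. */
static inline void *view_lookup(const Eoid_View *v, Eoid id)
{
    unsigned domain = (unsigned)(id >> DOMAIN_SHIFT);
    uint32_t index = id & INDEX_MASK;
    const Eoid_Table *t = v->slots[domain];
    if (!t || index == 0 || index > t->used) return NULL;
    return t->entries[index - 1];
}

/* begin(): map the (already stopped) thread's local table into our slot. */
static inline int view_begin(Eoid_View *v, Eoid_Table *foreign)
{
    if (foreign->domain == v->local_domain) return -1; /* same domain: fail */
    if (v->slots[foreign->domain]) return -1;          /* slot already taken */
    v->slots[foreign->domain] = foreign;
    return 0;
}

/* end(): unmap it again. */
static inline void view_end(Eoid_View *v, const Eoid_Table *foreign)
{
    if (foreign->domain != v->local_domain &&
        v->slots[foreign->domain] == foreign)
        v->slots[foreign->domain] = NULL;
}
```

Outside a begin/end section the foreign slot is NULL, so an id from another domain simply fails to resolve instead of touching another thread's object.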

NOW we can access BOTH our local objects and the objects of the "foreign"
thread we have done begin and end on. in fact if we use the above array any
single thread can begin() on up to 3 other threads at any time and access
everything. the issue is with creation of new things. where do they go?

> To summarize my understanding with your restrictions: you create the
> object in a thread OR send it to a thread. Then when you create, the
> domain and all are all set to the current thread. Parent needs to be

well i'm asking.. what domain should created stuff belong to when you have
multiple domains mapped into your thread.

> used, thus of course it's only valid in that thread. If you send the
> object, some special machinery would remove its availability in the
> current thread and create a new one in the secondary thread... (which
> is more complex when we think about children, what to do... and even
> more complex if we think about other, non Eo resources that may happen
> to cause problems, like imagine you send a Efl.Net.Dialer.Http to a
> secondary thread, CURL will barf).

well i wasn't thinking of sending at all. just needing to decide WHICH domain a
created object belongs to. once in a domain it has to stay there. same with
children, parents etc.

> Maybe we should focus more on easily communicate between two main
> loops/threads? That way you do not need to pass objects and hit the
> above complexity. All you do is to send  information, and on the
> target thread you do the actions, like create the object.

as above - messaging between loops is on the cards. being able to send objects
that REPRESENT some complex piece of data would be nice. imagine a simple
"database" object where you can query by key, row, column, path etc. - like an
sql object for example (urgh ok not sql but you get the point) and this
object is a database object which is really just an obj representing a big
backend store of data you can read/write. you want another thread to access
data? create db object, send it over. that's the kind of thing i think sending
should be used for.

> > This buys us REALLY NICE "thread safety" in the way that objects are just
> > not allowed to span threads. They must be explicitly sent over and thus
> > ownership (and EOID value) changes, or you must explicitly do a begin/end
> > on another thread and adopt its ID table into your local space as a
> > "foreign" table that allows you definitely "read only" access easily, even
> > the ability to modify and delete, but just creation is tricky. This really
> > will clear up lots of mistakes we have been seeing from code that uses
> > threads and does "bad things" that happen to work 99% of the time then fail
> > oddly 1% of the time. We don't have to write "is this my thread" checking
> > code in every method because the design will do that mostly for us as a
> > side-effect of TLS and normal EOID checking. This also buys us simplicity
> > when dealing with objects as we can assume the nice old fashioned way of
> > "no need to lock or consider threads - it will not be an issue", and it
> > buys us a good speedup vs what we have now. It does mean another bit of a
> > re-jigging of eo internals and we need to add some API's to be able to do
> > begin/end etc. and we can later add object sending.
> 
> reading this and thinking about clear multi-thread cases makes me
> think that we need easier communication more than sharing.

we HAVE to handle the begin/end case. we can't support our legacy api
otherwise. it's a SPECIAL case in that the other thread will be paused, so it's
sharing with no locks, but we HAVE to do it to retain begin/end. i first just
thought of a single bit "mainloop vs everyone else", but i realized that that is
too limiting. 2 bits would make it far better. same idea though. :)

> ecore-con: we need thread to asynchronously resolve names. You do not
> need the actual object to do that, send a string, return a struct with
> the return of getaddrinfo().

oh this is internal threads. not even mentioning that here. :) ecore_con deals
with that internally. it could be a thread or an async resolve etc. - the point
being that it's not exposed. :)

> evas: we need thread to compute what's to be rendered and paint the
> pixels. information sent is what changed and if it was done.
> 
> image loading... video decoding...

same for all the above. internals. i'm talking exposed use of threads "in an
app" :)

> I guess the ecore_thread does most of that, if not we can extend a
> little bit. But none of them would need to send objects.

you do need to EXPOSE objects though with begin/end. for sending - see above db
object example, or the set up endpoints for communications within a process
etc.

> > I'm rather happy with this kind of direction. We never have to make eo
> > objects thread safe or create a special eo thread safe base class. Ever.
> > You want to talk from thread to thread, then we can have endpoint objects
> > that get created with one object on one end of the msg pipe (like a
> > socketpair() or pipe()), and a different object on the other end (create in
> > one place then send one end to another thread? The object internals hook
> > them up via pipes or threadqueues?). This makes currently "incorrect and
> > dangerous code" fail early instead of 1% of the time. It catches issues
> > fast. It gets us speedups. This potentially impacts promises too, but
> > probably in a good way. This also affects bindings - looking at JS/Lua
> > specifically. If Lua had a threading model right now it'd be to have 1
> > luastate per thread, but this means we can't share objects... this means
> > eo's model is the same and you would have to detach an object from one
> > thread (luastate) and make one appear on the other end. I'm also happy that
> > I think this solves a disagreement on how to do threading. I know we have
> > to somehow and not just stick our heads in the sand. This solves it. It
> > gives a clear optimal model that is relatively robust and efficient.
> 
> agreed, but as I said I don't think passing objects will be used, we
> can skip that complexity.

sending is rather simple actually, thus why i mention it. and it can be used
for comms setup. like one end does a bind+listen, the other a connect. one
thread sets up an "incoming comms endpoint add" callback on let's say their loop
object and when another thread "connects" they get a cb with a new obj
representing their end of that comms pipe. :) that is one way to do arbitrary
async point to point comms setup. and sending db objects around where you use
this as "large datastores" and share 1 place at a time, but zero copy is kind
of nice and works nicely with bindings. think of the db object as a higher level
version of sending a void * around. :)

> > The downsides are the extra work, the need to add API's to init eo per
> > thread, need to check for objects on thread shutdown (need the TLS free
> > func to do sanity checking for still-alive objects), the need to add all
> > this foreign vs. local table stuff for begin+end and then to hook that in,
> > and the likely need to add at least object sending, and the messaging
> > infra. Also it will use a bit more memory if you use eo across threads and
> > actually create objects in other threads. As well any mempools and other
> > things that we currently share across threads may need to become per-thread.
> >
> > So ... with that. Questions, comments, queries, devils-advocate. Find issues
> > with this. Wrap your head around it. This is very important as this impacts
> > many things subtly and some directly and clearly.
> 
> I'm very happy with the TLS side effect of blocking calls from
> multiple threads AND speeding up due to lack of locking.
> 
> But I don't think sending object to other threads will be that useful,
> if we make it easier to communicate threads, just communicate and ask
> that thread to create it for you.

the sending is simple. far far far far more complex is the begin/end stuff.
shared objects are insanely hard to do right and even more work. we HAVE to do
begin/end to maintain api compatibility anyway. so doing a send is a walk in
the park in comparison and has uses FOR the inter-thread communication - like
payload of messages can be a db object for example.

> We could do a special thread communication primitive that sends the
> object internal data, creating a new EOID there... but then we have

that's EXACTLY what i was thinking of for object sending.

it keeps the object ptr alive, not calling destructor. it releases the EOID in
thread 1, sends ptr to thread 2, it "adopts" the ptr and allocates a new EOID
for it. presto. object moved threads. the cost is an EOID release and an EOID
alloc and otherwise an ipc "pointer" from thread to thread. if you are messaging
already this last part is free.

the problem is parent, child etc. objects of the one you sent, thus the
restrictions on reference count, parent/children etc.
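
A sketch of that detach/adopt step with the restrictions checked up front. The two tables stand for the EOID tables of thread 1 and thread 2, and the pointer hand-off between threads (message queue, pipe, whatever) is left out. `Obj`, `table_detach()` and `table_adopt()` are hypothetical names, not EO API.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

typedef struct Obj {
    int refcount;
    struct Obj *parent;
    int child_count;
} Obj;

/* One per thread in the real design; ids here are just slot number + 1. */
typedef struct {
    Obj *entries[TABLE_SIZE];
} Eoid_Table;

/* Receiving side: allocate a new local id for an object pointer.
 * Returns the id, or 0 if the table is full. */
static inline int table_adopt(Eoid_Table *t, Obj *o)
{
    assert(t != NULL && o != NULL);
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!t->entries[i]) {
            t->entries[i] = o;
            return i + 1;
        }
    }
    return 0;
}

static inline Obj *table_lookup(const Eoid_Table *t, int id)
{
    if (id < 1 || id > TABLE_SIZE) return NULL;
    return t->entries[id - 1];
}

/* Sending side: release the id but keep the object alive (no destructor).
 * Refuses objects with extra refs, a parent or children.
 * Returns the pointer to hand to the other thread, or NULL on refusal. */
static inline Obj *table_detach(Eoid_Table *t, int id)
{
    Obj *o = table_lookup(t, id);
    if (!o) return NULL;
    if (o->refcount != 1 || o->parent || o->child_count > 0) return NULL;
    t->entries[id - 1] = NULL;
    return o;
}
```

After the detach the old id no longer resolves in thread 1, and the object only becomes reachable again once thread 2 has adopted the pointer under a fresh id.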

> children, then we have non EFL resources like CURL. So at least a

well children - disallowed. curl - does it matter? well ok it matters if the
data the object is carrying like let's say curl data, is unable to work outside
a specific thread. then this is not possible. so that is a good point - this is
why it probably makes sense to have maybe a sendable() method in eo base that
returns true by default (we will make eo base sendable), but any class on top
that cannot be sent (eg internal data like curl can't move from one thread to
another), then it overrides and returns false. design point - only ever
override to then return false. never flip it back to true again because a
parent class can't be sent. this would limit sending to a few specific objects.
we could make it false by default and only enable if you know for sure you and
all your parent classes can be sent, but how do you KNOW unless the parent
class already returns true? :)
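
One way to make the "never flip it back to true" rule hold by construction is to let the check walk the whole class chain, so a single false anywhere wins. A small sketch with a hand-rolled class struct; `Klass`, `klass_sendable()` and the example classes are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Klass {
    const char *name;
    const struct Klass *parent;
    bool (*sendable)(void);   /* NULL means "no override" */
} Klass;

static inline bool sendable_no(void)  { return false; }
static inline bool sendable_yes(void) { return true; }

/* An object of class k can be sent only if no class in its chain objects.
 * The base class has no override, so the default answer is true. */
static inline bool klass_sendable(const Klass *k)
{
    assert(k != NULL);
    for (; k; k = k->parent)
        if (k->sendable && !k->sendable()) return false;
    return true;
}

/* Example hierarchy (made up for illustration). */
static const Klass base_klass       = { "Base", NULL, NULL };
static const Klass db_klass         = { "Db", &base_klass, NULL };
static const Klass dialer_klass     = { "Dialer", &base_klass, sendable_no };
/* A subclass trying to flip the answer back to true. */
static const Klass dialer_sub_klass = { "DialerSub", &dialer_klass, sendable_yes };
```

Here `Db` stays sendable, `Dialer` (think curl-backed data) is not, and `DialerSub` is still refused even though it overrides with true, because its parent already said no.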

> class information saying "never send me to another thread". If we
> block sending objects, the constructor could just check if another
> thread was used and refuse to create... but if it's created, EO needs
> to help by checking that.

yup. i'm thinking sendable() method returning true/false.

> OTOH I do think that we could use just one bit (instead of multiple
> domains as you said) that means "use the global EOID table guarded
> with spinlocks".

urgh. we could do that. BUT you'd need more than 1 bit. for begin/end you need
to be able to see 2 eoid tables at once. one of them would have to be global,
one then private. if mainloop is private then by definition EVERY thread must
use global. this is bad.

BUT with domains maybe 0 is private, 1 is global, 2 and 3 are 2 more private
domains as i described. there still is an issue - global ids then mean people
THINK objects are threadsafe and thus they have to be made threadsafe... and
back to the above for that. :)

> Whenever it's feasible or desired to offer some "@synchronized" for
> Eo, like objects created with that bit would have an internal mutex
> and all access would be guarded by it.. need to be careful with the
> deadlocks if methods are calling others, the lock would be already
> acquired... so maybe a recursive mutex?

yeah - i know. this is the pain. recursive mutex also works for not worrying
about unlock on calling a function that exits your "frame" into a child frame
(callback call or any other func/method). i am not sure if recursive mutexes
are portable. it seems to work on most *nix's - not sure on openbsd, and then
windows seems to have them. we COULD have eo actually do the lock/unlock on
method call if the obj is a lockable obj. if we have a global eoid table then
objects in this table will have to be lockable like this. we need to decide if
this is a good path or not. sending is cheap if you do not go back and forth a
LOT. shared with mutexes is better if you do.
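
For reference on the pthread side: POSIX specifies a recursive mutex type, PTHREAD_MUTEX_RECURSIVE, set through pthread_mutexattr_settype(). A minimal sketch of a lockable object where one locked method calls another; `Shared_Obj` and its functions are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* exposes PTHREAD_MUTEX_RECURSIVE in strict modes */
#endif
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    int value;
} Shared_Obj;

/* Returns 0 on success or a pthread error number. */
static inline int shared_obj_init(Shared_Obj *o)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err) return err;
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (!err) err = pthread_mutex_init(&o->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    o->value = 0;
    return err;
}

static inline void shared_obj_add(Shared_Obj *o, int n)
{
    int err = pthread_mutex_lock(&o->lock);
    assert(err == 0);  /* the owning thread may lock again without blocking */
    (void)err;
    o->value += n;
    pthread_mutex_unlock(&o->lock);
}

/* Holds the lock while calling another locked method on the same object. */
static inline void shared_obj_add_twice(Shared_Obj *o, int n)
{
    pthread_mutex_lock(&o->lock);
    shared_obj_add(o, n);
    shared_obj_add(o, n);
    pthread_mutex_unlock(&o->lock);
}
```

With a non-recursive mutex the nested lock in `shared_obj_add()` would deadlock or be undefined behaviour depending on the mutex type. The recursive type removes the need to unlock before calling into a child frame, but every method still pays for a lock and an unlock.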

i actually like the idea of a specific domain that is shareable, with others
private. private == more performance, but we need different domains like above
to SEE multiple private domains and those EOID tables for begin/end.

but the question remains - when you then do eo_add() which does it belong to
(private or sharable or any other domain if they exist)?
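
One of the options floated earlier in the thread is a push/pop "creation domain" context kept in TLS. A rough sketch of how that could behave, using the GCC/Clang `__thread` extension; every name here is hypothetical and nothing like this exists in EO as far as this thread says.

```c
#include <assert.h>

#define DOMAIN_COUNT     4
#define DOMAIN_STACK_MAX 8

typedef struct {
    unsigned local_domain;             /* this thread's own domain */
    unsigned stack[DOMAIN_STACK_MAX];  /* pushed overrides */
    int depth;
} Create_Ctx;

/* Per-thread context: each thread gets its own copy. */
static __thread Create_Ctx create_ctx;

static inline void create_ctx_init(unsigned local_domain)
{
    assert(local_domain < DOMAIN_COUNT);
    create_ctx.local_domain = local_domain;
    create_ctx.depth = 0;
}

/* Domain that a newly created object would be placed in right now. */
static inline unsigned create_domain_current(void)
{
    if (create_ctx.depth > 0) return create_ctx.stack[create_ctx.depth - 1];
    return create_ctx.local_domain;
}

/* Returns 0 on success, -1 on a bad domain or a full stack. */
static inline int create_domain_push(unsigned domain)
{
    if (domain >= DOMAIN_COUNT) return -1;
    if (create_ctx.depth >= DOMAIN_STACK_MAX) return -1;
    create_ctx.stack[create_ctx.depth++] = domain;
    return 0;
}

/* Returns 0 on success, -1 if there is nothing to pop. */
static inline int create_domain_pop(void)
{
    if (create_ctx.depth <= 0) return -1;
    create_ctx.depth--;
    return 0;
}
```

A thread in domain 1 that has done begin() on the mainloop could push domain 0, create its ui objects, then pop to go back to creating local objects. The "easy to get wrong" part is exactly a missing pop on some exit path.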

> -- 
> Gustavo Sverzut Barbieri
> --------------------------------------
> Mobile: +55 (16) 99354-9890
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> enlightenment-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [email protected]


