On Sunday, 20 May 2012 02.17.53, Olivier Goffart wrote:
> On Saturday 19 May 2012 19:05:58 Thiago Macieira wrote:
> > On Saturday, 19 May 2012 18.34.30, Olivier Goffart wrote:
> > > Hi,
> > >
> > > Regarding valgrind:
> > >  *) On debug build, nothing is inlined.
> > >  *) If we keep it inline, then we would just need a patch like this [1]
> >
> > -fno-inline doesn't help because of -fvisibility-inlines-hidden. The call
> > cannot be rerouted to valgrind.
>
> Visibility does not really matter for valgrind. It does address redirection,
> using the debug symbols.

I see...

After playing with an application under helgrind, in the debugger, I see it
doesn't actually use ELF symbol interposition or even my second option of
inserting jumps. Since valgrind is a CPU interpreter, it simply knows when
you've reached the beginning of an intercepted function and transfers control
to the interceptor.

Provided it knows that the same function can exist in multiple libraries, it's
fine.

Anyway, we still need to approach the valgrind community and settle the
question.

> > The annotation you added might help, but as I said, adding instructions --
> > even if they produce no architectural change -- still consumes CPU
> > resources. I'd like to benchmark the annotation vs the function call.
>
> Yes, they have a cost which I am not sure we want to pay on release build.

But since we may want to helgrind release builds...

> > Indeed, but note what it says about transactions that abort too
> > often.
> > If the transaction aborts, then the code needs to be re-run
> > non-transactionally, with the lock. That means decreased performance and
> > increased power consumption.
>
> Yes, but we are talking about the rare case in which a QMutex is shared
> between two different objects compiled with different versions of Qt.
> And in that unlikely case, one can just recompile to fix the performance
> issue.

That's not what I meant. I meant that, if we were to add the XACQUIRE and
XRELEASE prefixes to all mutexes, we might end up with worse performance of 5.0
and 5.1 applications when run on Haswell because we've never tested it.

Then again, I am asking for a slightly decreased performance for all
situations.

> Indeed, QMutex can be used for all sorts of cases. There can also be far too
> much code in the critical section to fit into the transaction cache. Or
> maybe there are side effects.
>
> QMutexLocker lock(&mutex);
> qDebug() << "What now?  does it also restart the transaction?";

Yes, a SYSENTER will definitely cause a transaction abort.

Which is why it might be a good idea to use RTM instead of HLE in QMutex, so
we know which mutexes abort and we don't try again next time.
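To make the "don't try again next time" idea concrete, here is a rough sketch of the bookkeeping only. It is not QMutex code and not a proposal for its internals: AdaptiveElisionLock, its Attempt callback and the maxAborts budget are all hypothetical names, and the transactional attempt itself is stubbed out as a callable so the sketch does not depend on RTM hardware or intrinsics.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical wrapper (not part of Qt): it remembers how often speculation
// aborted and stops attempting it once the abort budget is used up.
class AdaptiveElisionLock {
public:
    // Assumption: Attempt stands in for an RTM attempt. It must run the
    // critical section speculatively and return true if it committed, or
    // return false with no visible side effects if it aborted. A real
    // version would also have to check the fallback lock inside the
    // transaction; that part is left out here.
    using Attempt = std::function<bool(const std::function<void()> &)>;

    explicit AdaptiveElisionLock(Attempt attempt, int maxAborts = 3)
        : m_attempt(std::move(attempt)), m_abortsLeft(maxAborts) {}

    void run(const std::function<void()> &criticalSection) {
        if (elisionEnabled()) {
            if (m_attempt(criticalSection))
                return;                       // committed, no lock taken
            m_abortsLeft.fetch_sub(1, std::memory_order_relaxed);
        }
        // Aborted, or elision already given up: take the real lock.
        std::lock_guard<std::mutex> guard(m_fallback);
        criticalSection();
    }

    bool elisionEnabled() const {
        return m_abortsLeft.load(std::memory_order_relaxed) > 0;
    }

private:
    Attempt m_attempt;
    std::atomic<int> m_abortsLeft;
    std::mutex m_fallback;
};
```

With HLE the hardware falls back to the lock silently, so there is nothing to count; with an explicit abort path like the one above, a mutex whose critical section always does a syscall would stop paying for failed transactions after a few tries.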

> So it is probably bad to do the lock elision within QMutex...
> We need to test it on real hardware to see if it works.
>
> But my point is that the current QMutex architecture does not keep us from
> using lock elision later.

Mixing different builds of QMutex might be even worse. If a lock is acquired
with HLE and released without, the transaction will keep running until it
aborts. And I have no clue what happens if you XRELEASE when no transaction is
running. It will definitely cause trouble if we use RTM.

Anyway, it might be something we can fix for 5.2, but are we prepared to take
the chance?

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
