Paul McCullagh wrote:
On Aug 7, 2009, at 10:39 AM, Stewart Smith wrote:

On Wed, Aug 05, 2009 at 03:13:34PM +1000, Arjen Lentz wrote:
The issue in MySQL has been the overhead of such instrumentation, particularly when it is not being used. Some instrumentation causes a 5-20% performance loss, which is unacceptable.

110% agree.

If you're not doing analysis of anything, it shouldn't cost you.

You also shouldn't have to restart, rebuild or anything like that.

I think I know how to do this too.


I have this inkling that it's the "if(profiling_enabled)" inserted
everywhere that kills us.

This is pretty easy to check. Say we have some function f() that is
going to do some counting for us (e.g. number of rows fetched, number of
times mutex X was taken). If profiling is disabled, we want this to use
0 CPU.

Calling an empty function int f(int) a billion times in a loop is roughly equivalent to just running through the loop (yes, I built with gcc -O0 and checked the produced code). By roughly, I mean the difference is next to impossible to measure.

If you add a simple "if(x) something;" to the function f(), it is noticeably slower (roughly 20% in my tests)!

So we really don't want to do that compare.
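
For concreteness, a rough sketch of the kind of microbenchmark being described here (the function names and loop are illustrative, not Stewart's actual test): build with -O0, time the loop with f_empty(), then swap in f_checked() and time it again.

#include <cstdio>

static int profiling_enabled= 0;
static long counter= 0;

/* The "empty" hook: at -O0, a billion calls are barely measurable. */
__attribute__((noinline)) void f_empty(void)
{ }

/* The same hook with the check added: noticeably slower in a tight loop. */
__attribute__((noinline)) void f_checked(void)
{
  if (profiling_enabled)
    counter++;
}

int main(void)
{
  for (long i= 0; i < 1000000000L; i++)
    f_empty();                      /* swap in f_checked() to compare */
  printf("%ld\n", counter);
  return 0;
}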

Now... about this time somebody is going to jump up and suggest using DTrace to insert code at runtime. It's not available on Linux, so it's worse than useless here.

But we can do some cool self modifying code tricks.

The same do-nothing f() does not take any longer to run if we insert a few no-ops. (I tried inserting 4 NOP instructions, which are single byte... I do wonder if the multi-byte NOP instruction could help here too.)

So... when a profile hook is enabled, we just modify f() to call the
real profiling function. This can either be done with an atomic
instruction writing out the appropriate CALL instruction, or we can put
in a small JMP around the NOPs as we fill it out.
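
A minimal sketch of that idea, assuming x86-64 Linux and GCC (the hook names, the 5-byte NOP pad, and the mprotect() dance are my illustration, not actual Drizzle code). The always-present hook is a tiny function containing only NOPs; enabling profiling overwrites its first five bytes with a JMP to the real counting function, whose own RET returns straight to the original caller.

#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

static uint64_t rows_fetched= 0;

static void real_hook(void)            /* the real profiling function */
{
  rows_fetched++;
}

/* The hook that is always called: just NOPs and a return, so it costs
   almost nothing while profiling is disabled. */
extern "C" __attribute__((noinline)) void hook_rows_fetched(void)
{
  asm volatile("nop; nop; nop; nop; nop");
}

void enable_hook(void)
{
  unsigned char *target= reinterpret_cast<unsigned char *>(&hook_rows_fetched);
  long page_size= sysconf(_SC_PAGESIZE);
  void *page= reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(target) &
                                       ~static_cast<uintptr_t>(page_size - 1));

  /* Code pages are read+execute; make them writable while we patch
     (two pages, in case the five bytes straddle a page boundary). */
  mprotect(page, 2 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC);

  /* Overwrite the start of the hook with "JMP rel32" to real_hook().
     A production version would make this write atomic, as noted above. */
  int32_t rel= static_cast<int32_t>(reinterpret_cast<uintptr_t>(&real_hook) -
                                    (reinterpret_cast<uintptr_t>(target) + 5));
  unsigned char jmp[5]= { 0xE9 };
  memcpy(jmp + 1, &rel, 4);
  memcpy(target, jmp, 5);

  mprotect(page, 2 * page_size, PROT_READ | PROT_EXEC);
}

Calls to hook_rows_fetched() cost only the NOPs until enable_hook() runs; after that, each call lands in real_hook() and bumps the counter. Note that the opcodes, patch size and atomicity rules are all architecture-specific, which is exactly the objection raised below.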


And there are a number of tricks to do this pretty easily for all the possible points to hook in profiling stuff.

Modifying code is an option, but at the same time it is quite a hack. A major disadvantage is that it has to be done for each supported hardware architecture.

Exactly, and I believe it's a non-starter approach for that reason alone.

I have another suggestion, which I have found works well for PBXT (http://pbxt.blogspot.com/2008/12/xtstat-tells-you-exactly-what-pbxt-is.html). A simple increment is a very cheap operation, as long as it can be done without requiring a lock.

This is essentially what we already have with the current sys_var system for thread-local data, which is "merged" upon Session::cleanup().

(And, if you are just doing an increment, then you don't have to bother with an if(profiling_enabled); you just do the increment all the time.)

++

To avoid locking, each thread needs a complete set of tracking variables (counters) as part of its THD structure.

s/THD/Session

Also, you must understand that there is no one-to-one thread-to-Session guarantee.

Because Sessions may be executed in a thread pool, there must be a way of either:

a) Merging Session-local stats into the global system variables structure upon Session destruction, or upon rescheduling via a scheduling thread. Currently this operation does not acquire a lock around the global system variables in the Session destructor:

Session::~Session()
{
...
  add_to_status(&global_status_var, &status_var);
...
}

void add_to_status(STATUS_VAR *to_var, STATUS_VAR *from_var)
{
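  /* Sum every ulong counter in from_var, from the start of STATUS_VAR up to
     and including last_system_status_var, into the matching field of to_var. */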
  ulong *end= (ulong*) ((unsigned char*) to_var +
                        offsetof(STATUS_VAR, last_system_status_var) +
                        sizeof(ulong));
  ulong *to= (ulong*) to_var, *from= (ulong*) from_var;

  while (to != end)
    *(to++)+= *(from++);
}

I don't know if this critical section was deliberately left unprotected by LOCK_status or not... still looking into this. Also, MontyT is completely redesigning the system variables system, so the above "bookmarking" code will likely not look the same in a few weeks.

b) Alternatively, the Session's local status variables need to be persisted to a system table in a row-level-locking storage engine, using the standard write_row() call of the storage engine interface. Stewart is currently working on this (see his i_s storage engine branches...)

Either way, you incur locking and instruction costs. These costs have been deemed too high by MySQL engineering for the hundreds (thousands?) of metrics that the MySQL performance schema monitors (or is able to monitor), presumably because the frequency of certain events in the performance schema is quite high.

The profiling code pays the price for this. In order to get the current state of all counters it goes through the list of THDs and accumulates the THD related counters.

But, this is OK, because this price is only paid when you are actually profiling.

Agreed in principle, yes.
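
As a sketch of the shape this takes (hypothetical names; the real structures are Session and STATUS_VAR as quoted above): each session owns its own counter block and increments it unconditionally on the hot path, and only a profiling client pays for walking the session list.

#include <cstdint>
#include <list>
#include <mutex>

struct SessionCounters              /* stand-in for STATUS_VAR */
{
  uint64_t rows_fetched;
  uint64_t bytes_written;
};

struct MockSession                  /* stand-in for Session */
{
  SessionCounters counters;         /* written only by the owning thread */
};

static std::list<MockSession *> session_list;
static std::mutex session_list_lock;

/* Hot path: no lock, no if(profiling_enabled), just an increment. */
inline void count_row_fetched(MockSession *s)
{
  s->counters.rows_fetched++;
}

/* Cold path: only the profiler pays this cost.  Reads of other threads'
   counters are plain (non-atomic) loads; slightly stale values are fine
   for statistics, which is the trade-off being discussed here. */
SessionCounters snapshot_counters(void)
{
  SessionCounters total= { 0, 0 };
  std::lock_guard<std::mutex> guard(session_list_lock);
  for (MockSession *s : session_list)
  {
    total.rows_fetched+= s->counters.rows_fetched;
    total.bytes_written+= s->counters.bytes_written;
  }
  return total;
}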

This method not only works for things like "number of bytes written", but can also be used to measure time. There is a little trick involved here, but the result is that you can see, for example, if the server is hanging in a fsync() call in real time.
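
Paul doesn't spell the trick out here, but one plausible scheme (roughly how I understand xtstat to work; the names below are mine) is to record a start timestamp when a thread enters a timed call, and have the reader add the elapsed time of any call still in progress. That is what makes a currently-hanging fsync() visible before it returns.

#include <cstdint>
#include <sys/time.h>

struct FsyncTimer                   /* hypothetical per-thread timing slot */
{
  uint64_t accumulated_us;          /* time spent in completed fsync() calls */
  uint64_t start_us;                /* 0 when idle, entry time while inside */
};

static uint64_t now_us(void)
{
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  return static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}

/* Owning thread, wrapped around the real fsync() call: */
inline void fsync_begin(FsyncTimer *t) { t->start_us= now_us(); }
inline void fsync_end(FsyncTimer *t)
{
  t->accumulated_us+= now_us() - t->start_us;
  t->start_us= 0;
}

/* Reader (e.g. a "drizzlestat"-style client): include the in-progress call
   so a hung fsync() shows up immediately, not only after it completes. */
inline uint64_t fsync_time_us(const FsyncTimer *t)
{
  uint64_t total= t->accumulated_us;
  if (t->start_us != 0)
    total+= now_us() - t->start_us;
  return total;
}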

Then we should create a kind of "drizzlestat" program which SELECTs the current counter values, and displays the statistics in columns.

Before this is possible, an API into the performance data counters must be written. I don't want programs willy-nilly accessing internal kernel and storage engine data without going through a proper interface...we're trying to move away from that sort of thing :)

This is much better than dumping loads of performance schema tables on a user and saying, "the data is there if you need it."

Agreed.

I am also not a believer in gathering statistics on everything (for example, every semaphore), and letting the user figure out what is important.

OK, sure, but what if you don't already know whether the cause of your slowdown is a mutex or semaphore, and want to find that out?

As the developers, we need to decide which parameters are performance-critical, and just provide those statistics. Of course, statistics can be added later if we see we have missed something. But better that than a whole bunch of irrelevant values that make finding a problem like looking for a needle in a haystack.

Agreed, but see point above...

Marc Alff took an approach that causes almost no overhead if the performance schema is not *compiled in*. There is an overhead if the performance schema is compiled in and the DBA is not careful to specify only those things she is interested in.

I'd love to find a happy medium between Marc's approach (which nicely NOOPs the performance schema code behind #define macros when it is not compiled in) and your point above about not automatically gathering statistics on every piece of data.
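
For reference, a minimal sketch of the compile-time NOOPing being described (the build flag and macro names are illustrative, not the actual performance schema macros):

#ifdef WITH_PERFSCHEMA
#  define COUNT_ROW_FETCHED(session) count_row_fetched(session)
#else
#  define COUNT_ROW_FETCHED(session) do {} while (0)
#endif

When the performance schema is compiled out, every instrumentation point collapses to nothing at compile time; when it is compiled in, the cost depends on what the DBA enables, which is the trade-off described above.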

Cheers,

-jay
