Hi, Kristian! Please, find the review below! I had only few minor comments.
On Sep 14, Kristian Nielsen wrote: > At http://bazaar.launchpad.net/~maria-captains/maria/10.0 > > ------------------------------------------------------------ > revno: 3435 > revision-id: [email protected] > parent: [email protected] > committer: [email protected] > branch nick: work-10.0-mdev225-181-232 > timestamp: Fri 2012-09-14 14:44:53 +0200 > message: > MDEV-532: Async InnoDB commit checkpoint. > > Make the commit checkpoint inside InnoDB be asynchroneous. > Implement a background thread in binlog to do the writing and flushing of > binlog checkpoint events to disk. > === modified file 'sql/log.cc' > --- a/sql/log.cc 2012-09-13 12:31:29 +0000 > +++ b/sql/log.cc 2012-09-14 12:44:53 +0000 > @@ -106,6 +107,14 @@ static SHOW_VAR binlog_status_vars_detai > {NullS, NullS, SHOW_LONG} > }; > > +/* Variables for the binlog background thread. */ Please add to the comment: ", protected by the LOCK_binlog_thread mutex" > +static bool binlog_thread_started= false; > +static bool binlog_background_thread_stop= false; I'd prefer if both variables would start from "binlog_thread" or both from "binlog_background_thread". Either way, but the same for both variables. Also, the mutex is LOCK_binlog_thread. The function is start_binlog_background_thread(). The condition is COND_binlog_thread. The queue is binlog_background_thread_queue. Could you somehow decide whether the official name for it is a "binlog background thread" or simply a "binlog thread" and use it consistently everywhere? It's also important for the configuration or status variables, if any, for the warnings, status or error messages, for the documentation pages, etc. > +static MYSQL_BIN_LOG::xid_count_per_binlog * > + binlog_background_thread_queue= NULL; > + > +static bool start_binlog_background_thread(); > + > > /** > purge logs, master and slave sides both, related error code > @@ -8116,9 +8145,128 @@ int TC_LOG_BINLOG::unlog(ulong cookie, m > void > TC_LOG_BINLOG::commit_checkpoint_notify(void *cookie) > { > - mark_xid_done(((xid_count_per_binlog *)cookie)->binlog_id, true); > + xid_count_per_binlog *entry= static_cast<xid_count_per_binlog *>(cookie); > + mysql_mutex_lock(&LOCK_binlog_thread); > + entry->next_in_queue= binlog_background_thread_queue; > + binlog_background_thread_queue= entry; > + mysql_cond_signal(&COND_binlog_thread); > + mysql_mutex_unlock(&LOCK_binlog_thread); > } > > +/* > + Binlog service thread. > + > + This thread is used to log binlog checkpoints in the background, rather > than > + in the context of random storage engine threads that happen to call > + commit_checkpoint_notify_ha() and may not like the delays while syncing > + binlog to disk or may not be setup with all my_thread_init() and other > + necessary stuff. > + > + In the future, this thread could also be used to do log rotation in the > + background, which could elimiate all stalls around binlog rotations. > +*/ > +pthread_handler_t > +binlog_background_thread(void *arg __attribute__((unused))) > +{ > + bool stop; > + MYSQL_BIN_LOG::xid_count_per_binlog *queue, *next; > + THD *thd; > + > + my_thread_init(); > + thd= new THD; > + thd->system_thread= SYSTEM_THREAD_BINLOG_BACKGROUND; > + my_pthread_setspecific_ptr(THR_THD, thd); ^^^ the normal way of setting up a THD, would be to call thd->store_globals(). Why did that not work for you and you had to resort to the manual my_pthread_setspecific_ptr() ? > + mysql_mutex_lock(&LOCK_thread_count); > + thd->thread_id= thread_id++; > + mysql_mutex_unlock(&LOCK_thread_count); > + > + for (;;) > + { > + /* > + Wait until there is something in the queue to process, or we are asked > + to shut down. > + */ > + thd_proc_info(thd, "Waiting for background binlog tasks"); In 10.0 you need to use THD_STAGE_INFO() (here and below - everywhere instead of thd_proc_info) > + mysql_mutex_lock(&mysql_bin_log.LOCK_binlog_thread); > + for (;;) > + { > + stop= binlog_background_thread_stop; > + queue= binlog_background_thread_queue; > + if (stop || queue) > + break; > + mysql_cond_wait(&mysql_bin_log.COND_binlog_thread, > + &mysql_bin_log.LOCK_binlog_thread); > + } > + /* Grab the queue, if any. */ > + binlog_background_thread_queue= NULL; > + mysql_mutex_unlock(&mysql_bin_log.LOCK_binlog_thread); ... > === modified file 'storage/innobase/handler/ha_innodb.cc' > --- a/storage/innobase/handler/ha_innodb.cc 2012-09-13 12:31:29 +0000 > +++ b/storage/innobase/handler/ha_innodb.cc 2012-09-14 12:44:53 +0000 > @@ -3008,6 +3015,16 @@ innobase_rollback_trx( > DBUG_RETURN(convert_error_code_to_mysql(error, 0, NULL)); > } > > + > +struct pending_checkpoint { > + struct pending_checkpoint *next; > + handlerton *hton; > + void *cookie; > + ib_uint64_t lsn; > +}; > +static struct pending_checkpoint *pending_checkpoint_list; > +static struct pending_checkpoint *pending_checkpoint_list_end; > + > /*****************************************************************//** > Handle a commit checkpoint request from server layer. > We simply flush the redo log immediately and do the notify call.*/ The comment seems to be wrong now, doesn't it? > @@ -3017,8 +3034,113 @@ innobase_checkpoint_request( > handlerton *hton, > void *cookie) > { > - log_buffer_flush_to_disk(); > - commit_checkpoint_notify_ha(hton, cookie); > + ib_uint64_t lsn; > + ib_uint64_t flush_lsn; > + struct pending_checkpoint * entry; > + > + /* Do the allocation outside of lock to reduce contention. The normal > + case is that not everything is flushed, so we will need to enqueue. > */ > + entry = static_cast<struct pending_checkpoint *> > + (my_malloc(sizeof(*entry), MYF(MY_WME))); > + if (!entry) { > + sql_print_error("Failed to allocate %u bytes." > + " Commit checkpoint will be skipped.", > + static_cast<unsigned>(sizeof(*entry))); > + return; > + } > + > + entry->next = NULL; > + entry->hton = hton; why do you need to remember the handlerton here? It's always innodb_hton_ptr, isn't it? Why do you even pass hton down as an argument, to be able to reuse one checkpoint request function for many engines? > + entry->cookie = cookie; > + > + mysql_mutex_lock(&pending_checkpoint_mutex); > + lsn = log_get_lsn(); > + flush_lsn = log_get_flush_lsn(); > + if (lsn > flush_lsn) { > + /* Put the request in queue. > + When the log gets flushed past the lsn, we will remove the > + entry from the queue and notify the upper layer. */ > + entry->lsn = lsn; > + if (pending_checkpoint_list_end) { > + pending_checkpoint_list_end->next = entry; please add an assert here that pending_checkpoint_list_end->lsn < entry->lsn > + } else { > + pending_checkpoint_list = entry; > + } > + pending_checkpoint_list_end = entry; > + entry = NULL; > + } > + mysql_mutex_unlock(&pending_checkpoint_mutex); > + > + if (entry) { > + /* We are already flushed. Notify the checkpoint > immediately. */ > + commit_checkpoint_notify_ha(entry->hton, entry->cookie); > + my_free(entry); I'm a bit worried, whether you're going to do a lot of small allocations. Perhaps it'd be better to use an allocator of a fixed-size objects, with a pool. To avoid going to the heap too often. The same applies to the new code log.cc. Lock-free allocator can do that. May be we have other fixed-size allocators with a pool too. > + } > +} Regards, Sergei _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

