Hi,
during the weekly performance meeting, I mentioned
my experiences concerning the transaction structure
for write requests at the FileStore level.
Such a transaction contains not only the OP_WRITE
operation on the object in the file system, but also
a series of OP_OMAP_SETKEYS and OP_SETATTR operations.
Attached are a README and a source code patch, which
describe a prototype for coalescing the OP_OMAP_SETKEYS
operations and the performance impact of this change.
Regards
Andreas Bluemle
--
Andreas Bluemle
ITXperts GmbH http://www.itxperts.de
Balanstrasse 73, Geb. 08 Phone: (+49) 89 89044917
D-81541 Muenchen (Germany) Fax: (+49) 89 89044910
Company details: http://www.itxperts.de/imprint.htm
diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index f6c3bb8..29382b2 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2260,10 +2260,24 @@ int FileStore::_check_replay_guard(int fd, const SequencerPosition& spos)
}
}
+void FileStore::_coalesce(map<string, bufferlist> &target, map<string, bufferlist> &source)
+{
+  for (map<string, bufferlist>::iterator p = source.begin();
+       p != source.end();
+       p++) {
+    target[p->first] = p->second;
+  }
+  return;
+}
+
 unsigned FileStore::_do_transaction(
   Transaction& t, uint64_t op_seq, int trans_num,
   ThreadPool::TPHandle *handle)
 {
+  map<string, bufferlist> collected_aset;
+  coll_t collected_cid;
+  ghobject_t collected_oid;
+
   dout(10) << "_do_transaction on " << &t << dendl;
 #ifdef WITH_LTTNG
@@ -2282,6 +2296,22 @@ unsigned FileStore::_do_transaction(
     _inject_failure();
+    if (op->op == Transaction::OP_OMAP_SETKEYS) {
+      collected_cid = i.get_cid(op->cid);
+      collected_oid = i.get_oid(op->oid);
+      map<string, bufferlist> aset;
+      i.decode_attrset(aset);
+      _coalesce(collected_aset, aset);
+      continue;
+    } else {
+      if (collected_aset.empty() == false) {
+        tracepoint(objectstore, omap_setkeys_enter, osr_name);
+        r = _omap_setkeys(collected_cid, collected_oid, collected_aset, spos);
+        tracepoint(objectstore, omap_setkeys_exit, r);
+        collected_aset.clear();
+      }
+    }
+
     switch (op->op) {
     case Transaction::OP_NOP:
       break;
diff --git a/src/os/FileStore.h b/src/os/FileStore.h
index af1fb8d..a039731 100644
--- a/src/os/FileStore.h
+++ b/src/os/FileStore.h
@@ -449,6 +449,8 @@ public:
   int statfs(struct statfs *buf);
+  void _coalesce(map<string, bufferlist> &target, map<string, bufferlist> &source);
+
   int _do_transactions(
     list<Transaction*> &tls, uint64_t op_seq,
     ThreadPool::TPHandle *handle);
Coalescing OMAP_SETKEYS operations in a write transaction
---------------------------------------------------------
Description
-----------
At the level of FileStore, every write request is embedded in a transaction
which consists of:
  - 6 key-value pair settings in 3 OMAP_SETKEYS operations,
  - the actual OP_WRITE, and
  - 2 settings in the extended file system attributes.
The modified FileStore::_do_transaction() coalesces the 6 key-value
pairs into a single OMAP_SETKEYS operation. As a side effect, the number
of key-value pairs actually written drops to 5: one key appears twice,
and only its last value is set.
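The coalescing step can be modelled outside of FileStore. The sketch below is a minimal model, not the posted patch: `Op`, `Applied` and `replay_coalesced()` are hypothetical names, keys and values are plain strings, and the (cid, oid) pair is folded into a single object name. Like `_coalesce()`, it lets a later value for a repeated key overwrite the earlier one. As posted, the patch flushes the collected keys only when an op of another type follows, and it applies the whole batch to the cid/oid of the last OMAP_SETKEYS seen; the sketch additionally flushes when the target object changes and at the end of the transaction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the FileStore types (assumption): keys and
// values are plain strings, and (cid, oid) is folded into one object name.
using KeyValues = std::map<std::string, std::string>;

struct Op {
  bool is_omap_setkeys;  // true for OMAP_SETKEYS, false for any other op
  std::string oid;       // target object
  KeyValues keys;        // keys to set (OMAP_SETKEYS only)
  std::string name;      // label for other ops, e.g. "write" or "setattr"
};

struct Applied {
  std::string what;  // "setkeys" or the other op's label
  std::string oid;   // target object
  KeyValues keys;    // merged keys for a "setkeys" entry
};

// Replays 'ops', turning each run of consecutive OMAP_SETKEYS ops on the
// same object into one "setkeys" entry. A later value for a repeated key
// replaces the earlier one (last write wins). The pending batch is flushed
// before any other op, when the target object changes, and at the end.
std::vector<Applied> replay_coalesced(const std::vector<Op>& ops) {
  std::vector<Applied> applied;
  KeyValues pending;
  std::string pending_oid;

  auto flush = [&]() {
    if (!pending.empty()) {
      applied.push_back(Applied{"setkeys", pending_oid, pending});
      pending.clear();
    }
  };

  for (const Op& op : ops) {
    if (op.is_omap_setkeys) {
      if (op.oid != pending_oid) flush();  // never merge across objects
      pending_oid = op.oid;
      for (const auto& kv : op.keys) pending[kv.first] = kv.second;
      continue;
    }
    flush();  // apply pending OMAP updates before the next op
    applied.push_back(Applied{op.name, op.oid, KeyValues{}});
  }
  flush();  // a trailing OMAP_SETKEYS run must not be dropped
  return applied;
}
```

For a transaction shaped like the one described above (three OMAP_SETKEYS ops carrying six pairs with one key repeated, followed by the write and the attribute updates), `replay_coalesced()` yields one "setkeys" entry with five keys, followed by the other ops in their original order.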
Performance improvement
-----------------------
Cluster with 3 storage nodes, 4 OSDs (SAS disk, SSD journal) per node,
and a separate client node using the kernel rbd client;
test load generated by fio: random write, 4K block size, iodepth 16.
client improvement: approx. 3.7 % (12890 IOPS vs. 13369 IOPS)
storage node improvement: reduction in CPU consumption of the ceph-osd
daemon by approx. 9 % (see the following table, derived from /proc/<pid>/schedstat):
ceph-osd process and thread classes | v0.91 unmodified      | v0.91 with coalescing
                                    | CPU-seconds    share  | CPU-seconds    share
-------------------------------------+-----------------------+----------------------
total CPU usage                     |       43.17           |       39.33
                                    |                       |
ThreadPool::WorkThread::entry()     |       15.56   36.04%  |       12.45   31.66%
ShardedThreadPool::workers          |        8.07   18.70%  |        7.94   20.18%
Pipe::Reader::                      |        5.81   13.45%  |        5.92   15.04%
Pipe::Writer::entry()               |        4.59   10.63%  |        4.73   12.02%
FileJournal::Writer::               |        2.41    5.57%  |        2.45    6.22%
Finisher::finisher_thread           |        2.86    6.63%  |        1.03    2.61%
                                    |                       |
WBThrottle::entry                   |         n/a      n/a  |        0.81    2.06%
Interestingly, with coalescing active, the WBThrottle thread shows up in
the CPU usage; in the unmodified version it was almost invisible.
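The percentages above can be re-derived from the raw numbers. On Linux, /proc/<pid>/schedstat (and the per-thread /proc/<pid>/task/<tid>/schedstat) starts with the time spent on the CPU in nanoseconds; the counter is cumulative, so the usage during a test run is the difference between two readings. The README does not say how threads were grouped into the classes shown in the table or over which interval they were sampled, so this sketch only covers parsing and the percentage arithmetic; all function names are hypothetical. With the numbers above, 13369 vs. 12890 IOPS is a gain of about 3.7 %, and 39.33 vs. 43.17 CPU-seconds is a reduction of about 8.9 %.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Parses the contents of a schedstat file (/proc/<pid>/schedstat or
// /proc/<pid>/task/<tid>/schedstat). The first field is the time spent on
// the CPU in nanoseconds. Returns that time in seconds, or -1.0 if the
// first field cannot be read as a number.
double cpu_seconds_from_schedstat(const std::string& line) {
  std::istringstream in(line);
  unsigned long long run_ns = 0;
  if (!(in >> run_ns)) return -1.0;
  return static_cast<double>(run_ns) / 1e9;
}

// Reads one schedstat file; returns -1.0 if it cannot be read or parsed.
// The counter is cumulative: take one reading before and one after a test
// run and subtract them to get the CPU time used during the run.
double cpu_seconds_from_file(const std::string& path) {
  std::ifstream f(path);
  std::string line;
  if (!std::getline(f, line)) return -1.0;
  return cpu_seconds_from_schedstat(line);
}

// Relative change from 'before' to 'after', in percent (negative = reduction).
double percent_change(double before, double after) {
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}

// Share of 'part' in 'total', in percent, as in the "share" columns.
double share_percent(double part, double total) {
  assert(total > 0.0);
  return part / total * 100.0;
}
```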
Source/Patch
------------
https://www.github.com/andreas-bluemle/ceph
commit f33c48358f762cbeb5d30724efacf78ff5438e9e
patches:
  relative to the pull request at https://www.github.com/andreas-bluemle/ceph:
    ceph-andreas-bluemle.file-store-omap_setkeys-colaescing.patch
  relative to ceph master at https://www.github.com
  (commit a7a70cabe25fdfe3322c784f6797231d14e112c2):
    ceph-master.file-store-omap_setkeys-colaescing.patch