Judging from debug output, the problem is in journal recovery, when it
tries to delete object with huge (several milion keys - it is radosgw
index* for bucket with over 50mln objects) amount of keys, using
leveldb's rmkeys_by_prefix() method.

Looking at the source code, rmkeys_by_prefix() batches all operations
into one list and then submit_transaction() executes them all atomically.

I'd love to write a patch for this issue, but it seems unfixable (or is
it?) with current API and method behaviour. Could you offer any advice
on how to proceed?

Answering myself, could anyone verify if attached patch looks ok? Should reduce memory footprint a bit.

When I first read this code, I assumed that data pointed by leveldb::Slice have to be reachable until db->Write is called.

However, looking into leveldb and into its source code, there is no such requirement - leveldb makes its own copy of key, so we're effectivly doubling memory footprint for no reason.

--
mg
--- a/src/os/LevelDBStore.cc
+++ b/src/os/LevelDBStore.cc
@@ -156,9 +156,8 @@ void LevelDBStore::LevelDBTransactionImpl::set(
   buffers.push_back(to_set_bl);
   bufferlist &bl = *(buffers.rbegin());
   string key = combine_strings(prefix, k);
-  keys.push_back(key);
-  bat.Delete(leveldb::Slice(*(keys.rbegin())));
-  bat.Put(leveldb::Slice(*(keys.rbegin())),
+  bat.Delete(leveldb::Slice(key));
+  bat.Put(leveldb::Slice(key),
          leveldb::Slice(bl.c_str(), bl.length()));
 }

@@ -166,8 +165,7 @@ void LevelDBStore::LevelDBTransactionImpl::rmkey(const 
string &prefix,
                                                 const string &k)
 {
   string key = combine_strings(prefix, k);
-  keys.push_back(key);
-  bat.Delete(leveldb::Slice(*(keys.rbegin())));
+  bat.Delete(leveldb::Slice(key));
 }

 void LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(const string 
&prefix)
@@ -177,8 +175,7 @@ void 
LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(const string &prefix
        it->valid();
        it->next()) {
     string key = combine_strings(prefix, it->key());
-    keys.push_back(key);
-    bat.Delete(*(keys.rbegin()));
+    bat.Delete(key);
   }
 }

diff --git a/src/os/LevelDBStore.h b/src/os/LevelDBStore.h
index 4617c5c..dd248dd 100644
--- a/src/os/LevelDBStore.h
+++ b/src/os/LevelDBStore.h
@@ -175,7 +175,6 @@ public:
   public:
     leveldb::WriteBatch bat;
     list<bufferlist> buffers;
-    list<string> keys;
     LevelDBStore *db;

     LevelDBTransactionImpl(LevelDBStore *db) : db(db) {}

Reply via email to