Hi folks,

I was doing some testing earlier this week and Enis's keen eye caught something rather interesting.

When using YCSB to ingest data into a table with a secondary index, with 8 threads and a batch size of 1000 rows, the number of ExecService coprocessor calls actually exceeded the number of Multi calls writing the data (roughly 21k ExecService calls to 18k Multi calls).

I dug into this some more and noticed that it's because each thread creates its own ServerCache to store the serialized IndexMetadata before shipping the data table updates. So, when we have 8 threads all writing mutations for the same data and index table, we create ~8x as many ServerCache entries as we would with just one thread.
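
To make the pattern concrete, here's roughly the shape of what each writer thread does today (the names here are stand-ins I made up to illustrate it, not the actual ServerCacheClient API):

import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the server-side cache; the real one lives on the
// region servers and is populated via an ExecService coprocessor call.
final class PerThreadCacheSketch {
    static final ConcurrentHashMap<UUID, byte[]> serverSideCache = new ConcurrentHashMap<>();

    // The equivalent of what each writer thread does before flushing its batch:
    static UUID shipIndexMetadata(byte[] serializedIndexMetadata) {
        UUID cacheId = UUID.randomUUID();                       // fresh id per thread, per flush
        serverSideCache.put(cacheId, serializedIndexMetadata);  // identical payloads pile up
        return cacheId;
    }
}

With 8 threads, that's 8 copies of the same serialized IndexMetadata being shipped and cached, which lines up with the ExecService call counts above.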

Looking at the code, I completely understand why they're local to the thread and not shared on the Connection (very tricky), but I'm curious if anyone had noticed this before, or if there are reasons not to try to share these ServerCache(s) across threads. Looking at the data being put into the ServerCache, it appears to be exactly the same for each of the threads sending mutations. I'm thinking we could do this safely by tracking when we are loading (or have loaded) the data into the ServerCache and doing some reference counting to determine when it's actually safe to delete the ServerCache.
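
Something like the following is what I have in mind: a connection-scoped registry that loads a given cache at most once and only deletes it when the last user releases it. This is just a sketch; SharedServerCacheRegistry, acquire, and release are hypothetical names, and the real thing would hang off the Connection and issue the actual add/remove-cache RPCs:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical connection-scoped registry; not Phoenix's actual API.
final class SharedServerCacheRegistry<K, V> {

    private static final class Entry<V> {
        final V cache;
        final AtomicInteger refCount = new AtomicInteger(1);
        Entry(V cache) { this.cache = cache; }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();

    // Returns the shared cache for this key, loading (and shipping) it at most
    // once no matter how many threads are flushing mutations concurrently.
    V acquire(K key, Function<K, V> loader) {
        return entries.compute(key, (k, existing) -> {
            if (existing != null) {
                existing.refCount.incrementAndGet(); // another thread is now a user
                return existing;
            }
            return new Entry<>(loader.apply(k));     // first user loads it, refCount = 1
        }).cache;
    }

    // Drops one reference; only the last thread out actually deletes the
    // server-side cache (e.g. by issuing the remove-cache RPC).
    void release(K key, Consumer<V> remover) {
        entries.compute(key, (k, existing) -> {
            if (existing == null) {
                return null;                         // already removed
            }
            if (existing.refCount.decrementAndGet() > 0) {
                return existing;                     // still in use elsewhere
            }
            remover.accept(existing.cache);          // safe: no one else holds a ref
            return null;                             // unmap the entry
        });
    }
}

The trick is doing the ref-count updates inside ConcurrentHashMap.compute() so that an increment on one thread can't race with another thread removing the entry.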

I hope to find/make some time to get a patch up, but thought I'd take a moment to write it up in case anyone has opinions or feedback.

Thanks!

- Josh
