FYI, you can retrieve an IndexMaintainer from the index PTable given the data PTable and it will be lazily cached on the index PTable.
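To make that concrete, here's a rough sketch of the lookup. The PhoenixRuntime#getTable and PTable#getIndexMaintainer signatures here are from memory and the table names are placeholders, so treat it as an approximation rather than gospel:

    // Rough sketch only -- double-check the exact signatures in your Phoenix version.
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.apache.phoenix.index.IndexMaintainer;
    import org.apache.phoenix.jdbc.PhoenixConnection;
    import org.apache.phoenix.schema.PTable;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class IndexMaintainerLookup {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);

                // Resolve the data table and one of its indexes from the client-side
                // metadata cache ("MY_TABLE"/"MY_TABLE_IDX" are placeholder names).
                PTable dataTable = PhoenixRuntime.getTable(conn, "MY_TABLE");
                PTable indexTable = PhoenixRuntime.getTable(conn, "MY_TABLE_IDX");

                // Built on first access and cached on the index PTable, so repeated
                // calls for the same PTable instance don't rebuild anything.
                IndexMaintainer maintainer = indexTable.getIndexMaintainer(dataTable, pconn);
                System.out.println("Got maintainer for " + indexTable.getName().getString());
            }
        }
    }
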
On Wed, Nov 23, 2016 at 7:34 AM Josh Elser <josh.el...@gmail.com> wrote:
> Hrm, that sounds like it'd be cleaner to me. Just thinking about this
> problem again made me shudder at the client-side complexity :)
>
> I'll have to find some time to revisit this one.
>
> Thanks for the suggestion, Ankit!
>
> Ankit Singhal wrote:
> > How about not sending the IndexMaintainers from the client, and instead
> > preparing them at the server itself and caching/refreshing them per table
> > like we do currently for PTable?
> >
> > On Mon, Oct 24, 2016 at 9:32 AM, Josh Elser <josh.el...@gmail.com> wrote:
> >
> >> If anyone is interested, I did hack on this some more over the weekend.
> >>
> >> https://github.com/joshelser/phoenix/tree/reduced-server-cache-rpc
> >>
> >> Very much in a state of "well, it compiles". Will try to find some more
> >> time to poke at it and measure whether or not it actually makes a positive
> >> impact (with serialized IndexMaintainers only being about 20 bytes for one
> >> index table, the server-side memory impact certainly isn't that crazy, but
> >> the extra RPCs likely add up).
> >>
> >> Feedback welcome from the brave :)
> >>
> >>
> >> Josh Elser wrote:
> >>
> >>> Hi folks,
> >>>
> >>> I was doing some testing earlier this week and Enis's keen eye caught
> >>> something rather interesting.
> >>>
> >>> When using YCSB to ingest data into a table with a secondary index using
> >>> 8 threads and a batch size of 1000 rows, the number of ExecService
> >>> coprocessor calls actually exceeded the number of Multi calls to write
> >>> the data (something like 21k ExecService calls to 18k Multi calls).
> >>>
> >>> I dug into this some more and noticed that it's because each thread is
> >>> creating its own ServerCache to store the serialized IndexMetadata
> >>> before shipping the data table updates. So, when we have 8 threads all
> >>> writing mutations for the same data and index table, we have ~8x as many
> >>> ServerCache entries being created as if we had just one thread.
> >>>
> >>> Looking at the code, I completely understand why they're local to the
> >>> thread and not shared on the Connection (very tricky), but I'm curious
> >>> if anyone had noticed this before or if there are reasons to not try to
> >>> share these ServerCache(s) across threads. Looking at the data being put
> >>> into the ServerCache, it appears to be exactly the same for each of the
> >>> threads sending mutations. I'm thinking that we could do this safely by
> >>> tracking when we are loading (or have loaded) the data into the
> >>> ServerCache and doing some reference counting to determine when it's
> >>> actually safe to delete the ServerCache.
> >>>
> >>> I hope to find/make some time to get a patch up, but thought I'd take a
> >>> moment to write it up if anyone has opinions/feedback.
> >>>
> >>> Thanks!
> >>>
> >>> - Josh
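
As a tangent on the reference-counting idea at the bottom of the quoted thread, the client-side bookkeeping could look roughly like the toy class below. None of these names are real Phoenix classes, and it glosses over making other threads wait while the first one is still loading the cache (the "loading (or have loaded)" distinction Josh mentions):

    import java.util.HashMap;
    import java.util.Map;

    // Toy sketch of per-key reference counting for a shared ServerCache entry.
    // Hypothetical class, not actual Phoenix code.
    public class SharedCacheTracker {

        // cacheKey -> number of threads currently relying on that ServerCache entry
        private final Map<String, Integer> refCounts = new HashMap<>();

        /** Returns true if the caller is the first user and should load the ServerCache. */
        public synchronized boolean acquire(String cacheKey) {
            int previous = refCounts.merge(cacheKey, 1, Integer::sum) - 1;
            return previous == 0;
        }

        /** Returns true if the caller was the last user and should delete the ServerCache. */
        public synchronized boolean release(String cacheKey) {
            Integer remaining = refCounts.computeIfPresent(cacheKey, (k, v) -> v - 1);
            if (remaining != null && remaining == 0) {
                refCounts.remove(cacheKey);
                return true;
            }
            return false;
        }
    }

The idea being that each writer thread would acquire() before sending its batch (and only actually add the ServerCache when it gets true back) and release() afterward (and only issue the cache-removal RPC when it gets true back).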