FYI, you can retrieve an IndexMaintainer from the index PTable given the
data PTable and it will be lazily cached on the index PTable.
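
e.g. something roughly like this (the exact method signature is from memory,
so treat it as a sketch and double-check against your branch):

    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.apache.phoenix.index.IndexMaintainer;
    import org.apache.phoenix.jdbc.PhoenixConnection;
    import org.apache.phoenix.schema.PTable;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class IndexMaintainerLookup {
        public static void main(String[] args) throws Exception {
            // "MY_DATA_TABLE" and the JDBC URL are just placeholders.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
                PTable dataTable = PhoenixRuntime.getTable(conn, "MY_DATA_TABLE");
                for (PTable index : dataTable.getIndexes()) {
                    // Built lazily on first access and cached on the index PTable
                    // afterwards, so repeated calls are cheap.
                    IndexMaintainer maintainer = index.getIndexMaintainer(dataTable, pconn);
                    System.out.println(index.getName() + " -> " + maintainer);
                }
            }
        }
    }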

On Wed, Nov 23, 2016 at 7:34 AM Josh Elser <josh.el...@gmail.com> wrote:

> Hrm, that sounds like it'd be cleaner to me. Just thinking about this
> problem again made me shudder at the client-side complexity :)
>
> I'll have to find some time to revisit this one.
>
> Thanks for the suggestion, Ankit!
>
> Ankit Singhal wrote:
> > How about not sending the IndexMaintainers from the client, and instead
> > preparing them at the server itself and caching/refreshing them per table,
> > like we do currently for the PTable?
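> >
> > Roughly what I'm picturing, purely as a strawman (none of these class or
> > method names exist today, it just shows a per-table, timestamp-invalidated
> > cache built on the server side instead of client-shipped IndexMetadata):
> >
> >     import java.util.HashMap;
> >     import java.util.List;
> >     import java.util.Map;
> >
> >     // Hypothetical region-server-side holder: data table name -> serialized
> >     // IndexMaintainers, rebuilt whenever the cached entry is older than the
> >     // data table's last DDL timestamp (same idea as the server-side PTable
> >     // cache/refresh).
> >     public class ServerIndexMaintainerCache {
> >
> >         public interface Loader {
> >             // e.g. resolve the table from SYSTEM.CATALOG and serialize its
> >             // IndexMaintainers on the server
> >             List<byte[]> load(String dataTableName) throws Exception;
> >         }
> >
> >         private static final class Entry {
> >             final long tableTimestamp;
> >             final List<byte[]> maintainers;
> >             Entry(long ts, List<byte[]> m) { tableTimestamp = ts; maintainers = m; }
> >         }
> >
> >         private final Map<String, Entry> cache = new HashMap<>();
> >
> >         public synchronized List<byte[]> get(String dataTableName,
> >                 long tableTimestamp, Loader loader) throws Exception {
> >             Entry e = cache.get(dataTableName);
> >             if (e == null || e.tableTimestamp < tableTimestamp) {
> >                 e = new Entry(tableTimestamp, loader.load(dataTableName));
> >                 cache.put(dataTableName, e);
> >             }
> >             return e.maintainers;
> >         }
> >     }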
> >
> > On Mon, Oct 24, 2016 at 9:32 AM, Josh Elser <josh.el...@gmail.com>
> > wrote:
> >
> >> If anyone is interested, I did hack on this some more over the weekend.
> >>
> >> https://github.com/joshelser/phoenix/tree/reduced-server-cache-rpc
> >>
> >> Very much in a state of "well, it compiles". Will try to find some more
> >> time to poke at it and measure whether or not it actually makes a
> >> positive impact (with serialized IndexMaintainers only being about 20
> >> bytes for one index table, the server-side memory impact certainly isn't
> >> that crazy, but the extra RPCs likely add up).
> >>
> >> Feedback welcome from the brave :)
> >>
> >>
> >> Josh Elser wrote:
> >>
> >>> Hi folks,
> >>>
> >>> I was doing some testing earlier this week and Enis's keen eye caught
> >>> something rather interesting.
> >>>
> >>> When using YCSB to ingest data into a table with a secondary index,
> >>> using 8 threads and a batch size of 1000 rows, the number of ExecService
> >>> coprocessor calls actually exceeded the number of Multi calls to write
> >>> the data (something like 21k ExecService calls to 18k Multi calls).
> >>>
> >>> I dug into this some more and noticed that it's because each thread is
> >>> creating its own ServerCache to store the serialized IndexMetadata
> >>> before shipping the data table updates. So, when we have 8 threads all
> >>> writing mutations for the same data and index table, we have ~8x as many
> >>> ServerCache entries being created as if we had just one thread.
> >>>
> >>> Looking at the code, I completely understand why they're local to the
> >>> thread and not shared on the Connection (very tricky), but I'm curious
> >>> whether anyone has noticed this before or whether there are reasons not
> >>> to try to share these ServerCache(s) across threads. Looking at the data
> >>> being put into the ServerCache, it appears to be exactly the same for
> >>> each of the threads sending mutations. I'm thinking that we could do
> >>> this safely by tracking when we are loading (or have loaded) the data
> >>> into the ServerCache and doing some reference counting to determine when
> >>> it's actually safe to delete the ServerCache.
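> >>>
> >>> As a hand-wavy sketch of the bookkeeping I have in mind (all names below
> >>> are made up, nothing that exists in Phoenix today): the first thread to
> >>> ask for a given cache id does the addServerCache RPCs and then flips a
> >>> latch, later threads just wait on the latch, and only the last release
> >>> actually triggers the removeServerCache RPC.
> >>>
> >>>     import java.util.HashMap;
> >>>     import java.util.Map;
> >>>     import java.util.concurrent.CountDownLatch;
> >>>
> >>>     // Hypothetical helper, shared by all threads on a PhoenixConnection.
> >>>     public class SharedServerCacheTracker {
> >>>
> >>>         private static final class Entry {
> >>>             int refCount;               // guarded by the tracker's monitor
> >>>             final CountDownLatch loaded = new CountDownLatch(1);
> >>>         }
> >>>
> >>>         private final Map<String, Entry> entries = new HashMap<>();
> >>>
> >>>         // Returns null if the caller is the first user and must populate the
> >>>         // ServerCache (and then call markLoaded); otherwise returns a latch
> >>>         // to await before sending mutations that reference the cache id.
> >>>         public synchronized CountDownLatch retain(String cacheId) {
> >>>             Entry e = entries.get(cacheId);
> >>>             if (e == null) {
> >>>                 e = new Entry();
> >>>                 entries.put(cacheId, e);
> >>>                 e.refCount++;
> >>>                 return null;
> >>>             }
> >>>             e.refCount++;
> >>>             return e.loaded;
> >>>         }
> >>>
> >>>         public synchronized void markLoaded(String cacheId) {
> >>>             Entry e = entries.get(cacheId);
> >>>             if (e != null) {
> >>>                 e.loaded.countDown();
> >>>             }
> >>>         }
> >>>
> >>>         // Returns true only for the last reference, i.e. when it's actually
> >>>         // safe to send the removeServerCache RPC.
> >>>         public synchronized boolean release(String cacheId) {
> >>>             Entry e = entries.get(cacheId);
> >>>             if (e != null && --e.refCount == 0) {
> >>>                 entries.remove(cacheId);
> >>>                 return true;
> >>>             }
> >>>             return false;
> >>>         }
> >>>     }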
> >>>
> >>> I hope to find/make some time to get a patch up, but thought I'd take a
> >>> moment to write it up in case anyone has opinions or feedback.
> >>>
> >>> Thanks!
> >>>
> >>> - Josh
> >>>
> >
>
