I have not implemented it for recommendations, but a layered cache/sieve structure could be useful.
That is, between batch refreshes you can keep tacking on new updates in a cascading order, so values that have been updated exist in the newest layer; otherwise a lookup falls through to the most recent layer that holds them. You can put a fractional multiplier on older layers for aging, but again I've not implemented it. A rough sketch of the lookup is at the bottom of this mail, below the quoted thread.

On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> Yes. Also add the fact that the nano batches are bounded tightly in size,
> both max and mean. And mostly filtered away anyway.
>
> Aging is an open question. I have never seen any effect of alternative
> sampling, so I would just assume "keep oldest", which just tosses more
> samples. Then occasionally rebuild from batch if you really want aging to
> go right.
>
> Search updates any more are true realtime also, so that works very well.
>
> Sent from my iPhone
>
> > On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> > Thanks.
> >
> > This idea is based on a micro-batch of interactions per update, not
> > individual ones, unless I missed something. That matches the typical
> > input flow. Most interactions are filtered away by the frequency and
> > number-of-interactions cuts.
> >
> > A couple of practical issues:
> >
> > In practice won't this require aging of interactions too? So wouldn't
> > the update require some removal of old interactions? I suppose this
> > might just take the form of added null interactions representing the
> > geriatric ones? I haven't gone through the math in enough detail to see
> > if you've already accounted for this.
> >
> > To use actual math (self-join, etc.) we still need to alter the
> > geometry of the interactions to have the same row rank as the adjusted
> > total. In other words, the number of rows in all resulting interactions
> > must be the same. Over time this means completely removing rows and
> > columns, or allowing empty rows in potentially all input matrices.
> >
> > It might not be too bad to accumulate gaps in rows and columns. Not
> > sure if it would have a practical impact (up to some large limit) as
> > long as it was done to keep the real size more or less fixed.
> >
> > As to realtime, that would be under search-engine control through
> > incremental indexing, and there are a couple of ways to do that; not a
> > problem afaik. As you point out, the query always works and is real
> > time. The index update must be frequent and not impact the engine's
> > availability for queries.
> >
> > On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > When I think of real-time adaptation of indicators, I think of this:
> > http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
> >
> >> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >> I've been thinking about streaming (continuous input) and incremental
> >> cooccurrence.
> >>
> >> As interactions stream in from the user it is fairly simple to use
> >> something like Spark Streaming to maintain a moving time window for
> >> all input, and an update frequency that recalcs all input currently in
> >> the time window. I've done this with the current cooccurrence code,
> >> but though streaming, this is not incremental.
> >>
> >> The current data flow goes from interaction input, to geometry and
> >> user-dictionary reconciliation, to A'A, A'B, etc. After the multiply,
> >> the resulting cooccurrence matrices are LLR weighted/filtered/down-sampled.
> >>
> >> Incremental can mean all sorts of things and may imply different
> >> trade-offs. Did you have anything specific in mind?
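P.S. To make the layered cache/sieve idea at the top concrete, here is a rough, untested sketch in Scala of the lookup I have in mind. The names (SieveCache, addLayer, lookup, rebuild) are made up for illustration; nothing like this exists in Mahout today, and aging is just a per-layer fractional multiplier as described above.

  // Untested sketch; names are illustrative only, not an existing Mahout API.
  object SieveSketch extends App {

    // Indicator scores live in layers, newest first. A lookup returns the value
    // from the newest layer that contains the key, decayed by a fractional
    // multiplier per layer of age (decay = 1.0 means no aging).
    class SieveCache[K](decay: Double = 1.0) {
      // head of the list is the newest layer; each nano-batch becomes a new layer
      private var layers: List[Map[K, Double]] = List(Map.empty[K, Double])

      // push a fresh layer of updated values on top
      def addLayer(updates: Map[K, Double]): Unit =
        layers = updates :: layers

      // search from the newest layer down, aging the value by how far back the hit was
      def lookup(key: K): Option[Double] =
        layers.zipWithIndex.collectFirst {
          case (layer, age) if layer.contains(key) => layer(key) * math.pow(decay, age)
        }

      // a full batch refresh collapses everything back to a single layer
      def rebuild(full: Map[K, Double]): Unit =
        layers = List(full)
    }

    val cache = new SieveCache[String](decay = 0.9)
    cache.rebuild(Map("itemA" -> 3.2, "itemB" -> 1.1)) // periodic batch recompute
    cache.addLayer(Map("itemB" -> 1.4))                // incremental nano-batch update
    println(cache.lookup("itemB")) // Some(1.4), newest layer wins
    println(cache.lookup("itemA")) // roughly Some(2.88), 3.2 aged by one layer (x 0.9)
  }

The occasional rebuild from the full batch job is what keeps the layer stack from growing without bound, which lines up with the "occasionally rebuild from batch" point in the quoted thread.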