Cool

On Saturday, April 18, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Andrew,
>
> Take a look at the slides I posted. In them I showed that the update does
> not grow beyond a very reasonable bound.
>
> Sent from my iPhone
>
> On Apr 18, 2015, at 9:15, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
>> Yes, that's what I mean; if the number of updates gets too big it would
>> probably be unmanageable, though. This approach worked well with daily
>> updates, but I never tried it with anything "real time."
>>
>> On Saturday, April 18, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> I think you are saying that instead of val newHashMap = lastHashMap ++
>>> updateHashMap, layered updates might be useful, since new and last are
>>> potentially large. Some limit on the number of updates might trigger a
>>> refresh. This might work if the update works with incremental index
>>> updates in the search engine. Given practical considerations, the
>>> updates will be numerous and nearly empty.
>>>
>>> On Apr 17, 2015, at 7:58 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>>
>>>> I have not implemented it for recommendations, but a layered
>>>> cache/sieve structure could be useful.
>>>>
>>>> That is, between batch refreshes you can keep tacking on new updates in
>>>> cascading order, so values that are updated exist in the newest layer,
>>>> and otherwise the lookup falls through to the most recent layer that
>>>> has them.
>>>>
>>>> You can put a fractional multiplier on older layers for aging, but
>>>> again I've not implemented it.
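A minimal Scala sketch of the layered cache/sieve Andrew describes above,
with a layer limit standing in for the refresh trigger Pat mentions. This is
not existing Mahout code; the object, key types, and the maxLayers bound are
assumptions for illustration:

  object LayeredCache {
    // Indicator map: item -> (co-occurring item -> LLR weight).
    type Indicators = Map[String, Map[String, Double]]

    // Layers are ordered newest-first; a lookup takes the first layer that
    // contains the key, so newer values shadow older ones.
    case class LayeredIndicators(layers: List[Indicators], maxLayers: Int = 100) {

      // Tack a micro-batch of updates on as a new top layer; collapse all
      // layers once the count hits the limit (the "refresh").
      def update(batch: Indicators): LayeredIndicators =
        if (layers.size + 1 > maxLayers) refresh(batch)
        else copy(layers = batch :: layers)

      // Walk layers newest-to-oldest and return the first hit.
      def lookup(item: String): Option[Map[String, Double]] =
        layers.collectFirst { case layer if layer.contains(item) => layer(item) }

      // One big merge, oldest first so newer entries win -- equivalent to a
      // single lastHashMap ++ updateHashMap, paid occasionally instead of on
      // every update. Andrew's fractional aging multiplier could be applied
      // to each older layer's weights here.
      private def refresh(batch: Indicators): LayeredIndicators = {
        val merged = layers.reverse.foldLeft(Map.empty[String, Map[String, Double]])(_ ++ _)
        copy(layers = List(merged ++ batch))
      }
    }
  }

Lookup cost grows with the number of layers, so the occasional collapse back
into one map is what keeps reads cheap between batch refreshes.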
>>>> On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> Yes. Also add the fact that the nano-batches are bounded tightly in
>>>>> size, both max and mean. And mostly filtered away anyway.
>>>>>
>>>>> Aging is an open question. I have never seen any effect of alternative
>>>>> sampling, so I would just assume "keep oldest," which just tosses more
>>>>> samples. Then occasionally rebuild from batch if you really want aging
>>>>> to go right.
>>>>>
>>>>> Search updates anymore are true realtime as well, so that works very
>>>>> well.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> This idea is based on a micro-batch of interactions per update, not
>>>>>> individual ones, unless I missed something. That matches the typical
>>>>>> input flow. Most interactions are filtered away by frequency and
>>>>>> number-of-interactions cuts.
>>>>>>
>>>>>> A couple of practical issues:
>>>>>>
>>>>>> In practice, won't this require aging of interactions too? So
>>>>>> wouldn't the update require some old-interaction removal? I suppose
>>>>>> this might just take the form of added null interactions representing
>>>>>> the geriatric ones? I haven't gone through the math in enough detail
>>>>>> to see if you've already accounted for this.
>>>>>>
>>>>>> To use actual math (self-join, etc.) we still need to alter the
>>>>>> geometry of the interactions to have the same row rank as the
>>>>>> adjusted total. In other words, the number of rows in all resulting
>>>>>> interactions must be the same. Over time this means completely
>>>>>> removing rows and columns, or allowing empty rows in potentially all
>>>>>> input matrices.
>>>>>>
>>>>>> It might not be too bad to accumulate gaps in rows and columns. I'm
>>>>>> not sure it would have a practical impact (up to some large limit) as
>>>>>> long as it was done, to keep the real size more or less fixed.
>>>>>>
>>>>>> As to realtime, that would be under search-engine control through
>>>>>> incremental indexing, and there are a couple of ways to do that; not
>>>>>> a problem afaik. As you point out, the query always works and is real
>>>>>> time. The index update must be frequent and must not impact the
>>>>>> engine's availability for queries.
>>>>>>
>>>>>> On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>
>>>>>>> When I think of real-time adaptation of indicators, I think of this:
>>>>>>> http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
>>>>>>>
>>>>>>> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>>>
>>>>>>>> I've been thinking about streaming (continuous input) and
>>>>>>>> incremental cooccurrence.
>>>>>>>>
>>>>>>>> As interactions stream in from the user, it is fairly simple to use
>>>>>>>> something like Spark Streaming to maintain a moving time window
>>>>>>>> over all input, and an update frequency that recalcs all input
>>>>>>>> currently in the time window. I've done this with the current
>>>>>>>> cooccurrence code, but though streaming, it is not incremental.
>>>>>>>>
>>>>>>>> The current data flow goes from interaction input, to geometry and
>>>>>>>> user-dictionary reconciliation, to A'A, A'B, etc. After the
>>>>>>>> multiply, the resulting cooccurrence matrices are LLR
>>>>>>>> weighted/filtered/down-sampled.
>>>>>>>>
>>>>>>>> Incremental can mean all sorts of things and may imply different
>>>>>>>> trade-offs. Did you have anything specific in mind?
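Pat's moving-time-window setup can be sketched roughly like this with Spark
Streaming in Scala. It is a toy illustration, not the actual Mahout
cooccurrence pipeline: the socket source, batch/window intervals, and all
names are assumed, and the LLR weighting/filtering stage is left as a
comment:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

  object WindowedCooccurrence {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("windowed-cooccurrence")
      val ssc  = new StreamingContext(conf, Seconds(30)) // micro-batch interval
      ssc.checkpoint("/tmp/cooc-checkpoint")             // required by window ops

      // "userId,itemId" interaction lines from a hypothetical source;
      // malformed lines are ignored for brevity in this sketch.
      val interactions = ssc.socketTextStream("localhost", 9999)
        .map { line => val Array(u, i) = line.split(","); (u, i) }

      // Moving time window over all input; every slide recomputes from
      // scratch over the whole window -- streaming, but not incremental.
      interactions.window(Minutes(60), Minutes(5)).foreachRDD { rdd =>
        // Group each user's items, then self-join per user to count
        // co-occurring item pairs (the A'A part, before LLR weighting).
        val pairCounts = rdd.groupByKey().flatMap { case (_, items) =>
          val distinct = items.toSeq.distinct
          for (a <- distinct; b <- distinct if a != b) yield ((a, b), 1L)
        }.reduceByKey(_ + _)
        // Downstream: LLR weight/filter/down-sample, then update the
        // search-engine index incrementally.
        pairCounts.take(10).foreach(println)
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }

Every slide pays the full self-join over the window, which is exactly the
non-incremental cost the rest of the thread is trying to bound.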