I have not implemented it for recommendations, but a layered cache/sieve structure could be useful.
That is, between batch refreshes you can keep tacking on new updates in a cascading order, so values that have been updated exist in the newest layer; otherwise a lookup falls through to the most recent layer that holds them. You can put a fractional multiplier on older layers for aging, but again I've not implemented it. A rough sketch of the lookup is at the bottom of this mail, below the quoted thread.

On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> Yes. Also add the fact that the nano batches are bounded tightly in size,
> both max and mean. And mostly filtered away anyway.
>
> Aging is an open question. I have never seen any effect of alternative
> sampling, so I would just assume "keep oldest", which just tosses more
> samples. Then occasionally rebuild from batch if you really want aging to
> go right.
>
> Search updates any more are true realtime also, so that works very well.
>
> Sent from my iPhone
>
> > On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> > Thanks.
> >
> > This idea is based on a micro-batch of interactions per update, not
> > individual ones, unless I missed something. That matches the typical
> > input flow. Most interactions are filtered away by the frequency and
> > number-of-interactions cuts.
> >
> > A couple of practical issues:
> >
> > In practice won't this require aging of interactions too? So wouldn't
> > the update require some removal of old interactions? I suppose this
> > might just take the form of added null interactions representing the
> > geriatric ones? I haven't gone through the math in enough detail to see
> > if you've already accounted for this.
> >
> > To use actual math (self-join, etc.) we still need to alter the
> > geometry of the interactions to have the same row rank as the adjusted
> > total. In other words, the number of rows in all resulting interactions
> > must be the same. Over time this means completely removing rows and
> > columns, or allowing empty rows in potentially all input matrices.
> >
> > It might not be too bad to accumulate gaps in rows and columns. Not
> > sure if it would have a practical impact (up to some large limit) as
> > long as it was done to keep the real size more or less fixed.
> >
> > As to realtime, that would be under search-engine control through
> > incremental indexing, and there are a couple of ways to do that; not a
> > problem afaik. As you point out, the query always works and is real
> > time. The index update must be frequent and not impact the engine's
> > availability for queries.
> >
> > On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > When I think of real-time adaptation of indicators, I think of this:
> > http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
> >
> >> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >> I've been thinking about streaming (continuous input) and incremental
> >> cooccurrence.
> >>
> >> As interactions stream in from the user it is fairly simple to use
> >> something like Spark Streaming to maintain a moving time window for
> >> all input, and an update frequency that recalcs all input currently in
> >> the time window. I've done this with the current cooccurrence code,
> >> but though streaming, this is not incremental.
> >>
> >> The current data flow goes from interaction input, to geometry and
> >> user-dictionary reconciliation, to A'A, A'B, etc. After the multiply,
> >> the resulting cooccurrence matrices are LLR weighted/filtered/down-sampled.
> >>
> >> Incremental can mean all sorts of things and may imply different
> >> trade-offs. Did you have anything specific in mind?
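P.S. To make the layered cache/sieve idea at the top concrete, here is a rough, untested sketch in Scala of the lookup I have in mind. The names (SieveCache, addLayer, lookup, rebuild) are made up for illustration; nothing like this exists in Mahout today, and aging is just a per-layer fractional multiplier as described above.

  // Untested sketch; names are illustrative only, not an existing Mahout API.
  object SieveSketch extends App {

    // Indicator scores live in layers, newest first. A lookup returns the value
    // from the newest layer that contains the key, decayed by a fractional
    // multiplier per layer of age (decay = 1.0 means no aging).
    class SieveCache[K](decay: Double = 1.0) {
      // head of the list is the newest layer; each nano-batch becomes a new layer
      private var layers: List[Map[K, Double]] = List(Map.empty[K, Double])

      // push a fresh layer of updated values on top
      def addLayer(updates: Map[K, Double]): Unit =
        layers = updates :: layers

      // search from the newest layer down, aging the value by how far back the hit was
      def lookup(key: K): Option[Double] =
        layers.zipWithIndex.collectFirst {
          case (layer, age) if layer.contains(key) => layer(key) * math.pow(decay, age)
        }

      // a full batch refresh collapses everything back to a single layer
      def rebuild(full: Map[K, Double]): Unit =
        layers = List(full)
    }

    val cache = new SieveCache[String](decay = 0.9)
    cache.rebuild(Map("itemA" -> 3.2, "itemB" -> 1.1)) // periodic batch recompute
    cache.addLayer(Map("itemB" -> 1.4))                // incremental nano-batch update
    println(cache.lookup("itemB")) // Some(1.4), newest layer wins
    println(cache.lookup("itemA")) // roughly Some(2.88), 3.2 aged by one layer (x 0.9)
  }

The occasional rebuild from the full batch job is what keeps the layer stack from growing without bound, which lines up with the "occasionally rebuild from batch" point in the quoted thread.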