Cool

On Saturday, April 18, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Andrew,
>
> Take a look at the slides I posted. In them I showed that the update does
> not grow beyond a very reasonable bound.
>
> Sent from my iPhone
>
> On Apr 18, 2015, at 9:15, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
>> Yes, that's what I mean; if the number of updates gets too big it would
>> probably be unmanageable, though. This approach worked well with daily
>> updates, but I never tried it with anything "real time."
>>
>> On Saturday, April 18, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> I think you are saying that instead of val newHashMap = lastHashMap ++
>>> updateHashMap, layered updates might be useful, since new and last are
>>> potentially large. Some limit on the number of updates might trigger a
>>> refresh. This might work if the update works with incremental index
>>> updates in the search engine. Given practical considerations, the
>>> updates will be numerous and nearly empty.
>>>
>>> On Apr 17, 2015, at 7:58 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>>
>>>> I have not implemented it for recommendations, but a layered
>>>> cache/sieve structure could be useful.
>>>>
>>>> That is, between batch refreshes you can keep tacking on new updates in
>>>> cascading order, so values that are updated exist in the newest layer,
>>>> and otherwise the lookup falls through to the most recent layer that
>>>> has them.
>>>>
>>>> You can put a fractional multiplier on older layers for aging, but
>>>> again I've not implemented it.
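A minimal Scala sketch of the layered cache/sieve Andrew describes above,
with a layer limit standing in for the refresh trigger Pat mentions. This is
not existing Mahout code; the object, key types, and the maxLayers bound are
assumptions for illustration:

  object LayeredCache {
    // Indicator map: item -> (co-occurring item -> LLR weight).
    type Indicators = Map[String, Map[String, Double]]

    // Layers are ordered newest-first; a lookup takes the first layer that
    // contains the key, so newer values shadow older ones.
    case class LayeredIndicators(layers: List[Indicators], maxLayers: Int = 100) {

      // Tack a micro-batch of updates on as a new top layer; collapse all
      // layers once the count hits the limit (the "refresh").
      def update(batch: Indicators): LayeredIndicators =
        if (layers.size + 1 > maxLayers) refresh(batch)
        else copy(layers = batch :: layers)

      // Walk layers newest-to-oldest and return the first hit.
      def lookup(item: String): Option[Map[String, Double]] =
        layers.collectFirst { case layer if layer.contains(item) => layer(item) }

      // One big merge, oldest first so newer entries win -- equivalent to a
      // single lastHashMap ++ updateHashMap, paid occasionally instead of on
      // every update. Andrew's fractional aging multiplier could be applied
      // to each older layer's weights here.
      private def refresh(batch: Indicators): LayeredIndicators = {
        val merged = layers.reverse.foldLeft(Map.empty[String, Map[String, Double]])(_ ++ _)
        copy(layers = List(merged ++ batch))
      }
    }
  }

Lookup cost grows with the number of layers, so the occasional collapse back
into one map is what keeps reads cheap between batch refreshes.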
>>>> On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> Yes. Also add the fact that the nano-batches are bounded tightly in
>>>>> size, both max and mean. And mostly filtered away anyway.
>>>>>
>>>>> Aging is an open question. I have never seen any effect of alternative
>>>>> sampling, so I would just assume "keep oldest," which just tosses more
>>>>> samples. Then occasionally rebuild from batch if you really want aging
>>>>> to go right.
>>>>>
>>>>> Search updates anymore are true realtime as well, so that works very
>>>>> well.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> This idea is based on a micro-batch of interactions per update, not
>>>>>> individual ones, unless I missed something. That matches the typical
>>>>>> input flow. Most interactions are filtered away by frequency and
>>>>>> number-of-interactions cuts.
>>>>>>
>>>>>> A couple of practical issues:
>>>>>>
>>>>>> In practice, won't this require aging of interactions too? So
>>>>>> wouldn't the update require some old-interaction removal? I suppose
>>>>>> this might just take the form of added null interactions representing
>>>>>> the geriatric ones? I haven't gone through the math in enough detail
>>>>>> to see if you've already accounted for this.
>>>>>>
>>>>>> To use actual math (self-join, etc.) we still need to alter the
>>>>>> geometry of the interactions to have the same row rank as the
>>>>>> adjusted total. In other words, the number of rows in all resulting
>>>>>> interactions must be the same. Over time this means completely
>>>>>> removing rows and columns, or allowing empty rows in potentially all
>>>>>> input matrices.
>>>>>>
>>>>>> It might not be too bad to accumulate gaps in rows and columns. I'm
>>>>>> not sure it would have a practical impact (up to some large limit) as
>>>>>> long as it was done, to keep the real size more or less fixed.
>>>>>>
>>>>>> As to realtime, that would be under search-engine control through
>>>>>> incremental indexing, and there are a couple of ways to do that; not
>>>>>> a problem afaik. As you point out, the query always works and is real
>>>>>> time. The index update must be frequent and must not impact the
>>>>>> engine's availability for queries.
>>>>>>
>>>>>> On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>
>>>>>>> When I think of real-time adaptation of indicators, I think of this:
>>>>>>> http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
>>>>>>>
>>>>>>> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>>>
>>>>>>>> I've been thinking about streaming (continuous input) and
>>>>>>>> incremental cooccurrence.
>>>>>>>>
>>>>>>>> As interactions stream in from the user, it is fairly simple to use
>>>>>>>> something like Spark Streaming to maintain a moving time window
>>>>>>>> over all input, and an update frequency that recalcs all input
>>>>>>>> currently in the time window. I've done this with the current
>>>>>>>> cooccurrence code, but though streaming, it is not incremental.
>>>>>>>>
>>>>>>>> The current data flow goes from interaction input, to geometry and
>>>>>>>> user-dictionary reconciliation, to A'A, A'B, etc. After the
>>>>>>>> multiply, the resulting cooccurrence matrices are LLR
>>>>>>>> weighted/filtered/down-sampled.
>>>>>>>>
>>>>>>>> Incremental can mean all sorts of things and may imply different
>>>>>>>> trade-offs. Did you have anything specific in mind?
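Pat's moving-time-window setup can be sketched roughly like this with Spark
Streaming in Scala. It is a toy illustration, not the actual Mahout
cooccurrence pipeline: the socket source, batch/window intervals, and all
names are assumed, and the LLR weighting/filtering stage is left as a
comment:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

  object WindowedCooccurrence {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("windowed-cooccurrence")
      val ssc  = new StreamingContext(conf, Seconds(30)) // micro-batch interval
      ssc.checkpoint("/tmp/cooc-checkpoint")             // required by window ops

      // "userId,itemId" interaction lines from a hypothetical source;
      // malformed lines are ignored for brevity in this sketch.
      val interactions = ssc.socketTextStream("localhost", 9999)
        .map { line => val Array(u, i) = line.split(","); (u, i) }

      // Moving time window over all input; every slide recomputes from
      // scratch over the whole window -- streaming, but not incremental.
      interactions.window(Minutes(60), Minutes(5)).foreachRDD { rdd =>
        // Group each user's items, then self-join per user to count
        // co-occurring item pairs (the A'A part, before LLR weighting).
        val pairCounts = rdd.groupByKey().flatMap { case (_, items) =>
          val distinct = items.toSeq.distinct
          for (a <- distinct; b <- distinct if a != b) yield ((a, b), 1L)
        }.reduceByKey(_ + _)
        // Downstream: LLR weight/filter/down-sample, then update the
        // search-engine index incrementally.
        pairCounts.take(10).foreach(println)
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }

Every slide pays the full self-join over the window, which is exactly the
non-incremental cost the rest of the thread is trying to bound.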