Andrew, take a look at the slides I posted. In them I showed that the update does not grow beyond a very reasonable bound.
> On Apr 18, 2015, at 9:15, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
> Yes, that's what I mean; if the number of updates gets too big it would probably become unmanageable, though. This approach worked well with daily updates, but I never tried it with anything "real time."
>
>> On Saturday, April 18, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> I think you are saying that instead of val newHashMap = lastHashMap ++ updateHashMap, layered updates might be useful, since new and last are potentially large. Some limit on the number of updates might trigger a refresh. This might work if the update mechanism cooperates with incremental index updates in the search engine. Given practical considerations the updates will be numerous and nearly empty.
>>
>> On Apr 17, 2015, at 7:58 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>
>> I have not implemented it for recommendations, but a layered cache/sieve structure could be useful.
>>
>> That is, between batch refreshes you can keep tacking on new updates in cascading order, so values that have been updated exist in the newest layer; otherwise the lookup falls through to the most recent layer that holds the value.
>>
>> You can put a fractional multiplier on older layers for aging, but again I've not implemented it.
>>
>> On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> Yes. Also add the fact that the nano-batches are tightly bounded in size, both max and mean. And mostly filtered away anyway.
>>>
>>> Aging is an open question. I have never seen any effect from alternative sampling, so I would just assume "keep oldest," which simply tosses more samples. Then occasionally rebuild from batch if you really want aging to go right.
>>>
>>> Search updates these days are true realtime as well, so that works very well.
>>>
>>>> On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>
>>>> Thanks.
>>>>
>>>> This idea is based on a micro-batch of interactions per update, not individual ones, unless I missed something. That matches the typical input flow. Most interactions are filtered away by the frequency and number-of-interaction cuts.
>>>>
>>>> A couple of practical issues:
>>>>
>>>> In practice won't this require aging of interactions too? So wouldn't the update require some old-interaction removal? I suppose this might just take the form of added null interactions representing the geriatric ones? I haven't gone through the math in enough detail to see if you've already accounted for this.
>>>>
>>>> To use actual math (self-join, etc.) we still need to alter the geometry of the interactions to have the same row rank as the adjusted total. In other words, the number of rows in all resulting interactions must be the same. Over time this means completely removing rows and columns, or allowing empty rows in potentially all input matrices.
>>>>
>>>> It might not be too bad to accumulate gaps in rows and columns. Not sure it would have a practical impact (up to some large limit) as long as it was done, to keep the real size more or less fixed.
>>>>
>>>> As to realtime, that would be under search engine control through incremental indexing, and there are a couple of ways to do that; not a problem afaik. As you point out, the query always works and is real time. The index update must be frequent and not impact the engine's availability for queries.
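A minimal Scala sketch of the layered cache/sieve structure Andrew describes above, with the refresh limit Pat mentions; all names and the layer limit are illustrative assumptions, not an existing implementation:

class LayeredCache[K, V](maxLayers: Int = 10) {
  private var base: Map[K, V] = Map.empty      // result of the last full batch refresh
  private var layers: List[Map[K, V]] = Nil    // newest update layer at the head

  // Tack on a micro-batch of updates as a new layer; past the limit,
  // collapse everything into the base map (the periodic "refresh").
  def update(batch: Map[K, V]): Unit = {
    layers = batch :: layers
    if (layers.size > maxLayers) {
      base = layers.reverse.foldLeft(base)(_ ++ _) // fold oldest first, so newest wins
      layers = Nil
    }
  }

  // Newest layer wins; otherwise fall through to older layers, then the base.
  // A fractional aging multiplier per layer depth could be applied here.
  def get(key: K): Option[V] =
    layers.view.flatMap(_.get(key)).headOption.orElse(base.get(key))
}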
>>>> On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>> When I think of real-time adaptation of indicators, I think of this:
>>>> http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
>>>>
>>>>> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>
>>>>> I've been thinking about Streaming (continuous input) and incremental cooccurrence.
>>>>>
>>>>> As interactions stream in from the user it is fairly simple to use something like Spark Streaming to maintain a moving time window over all input, with an update frequency that recalcs everything currently in the time window. I've done this with the current cooccurrence code, but though streaming, this is not incremental.
>>>>>
>>>>> The current data flow goes from interaction input, to geometry and user-dictionary reconciliation, to A'A, A'B, etc. After the multiply, the resulting cooccurrence matrices are LLR weighted/filtered/down-sampled.
>>>>>
>>>>> Incremental can mean all sorts of things and may imply different trade-offs. Did you have anything specific in mind?
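A sketch of the windowed (streaming but non-incremental) recalc Pat describes, using the Spark Streaming window API; the socket source, tab-separated input format, and the recompute body are placeholder assumptions, not the actual Mahout cooccurrence code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowedCooccurrence {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("windowed-cooccurrence"), Seconds(10))

    // (user, item) interactions from some stream; a text socket as a stand-in.
    val interactions = ssc.socketTextStream("localhost", 9999)
      .map(_.split("\t"))
      .filter(_.length == 2)
      .map(f => (f(0), f(1)))

    // Moving one-hour window, recalculated every five minutes. Everything in
    // the window is recomputed from scratch: streaming, but not incremental.
    interactions.window(Minutes(60), Minutes(5)).foreachRDD { rdd =>
      // Placeholder: reconcile dictionaries, form A'A / A'B, LLR filter,
      // then push indicators to the search engine's incremental index.
      println(s"recomputing cooccurrence over ${rdd.count()} interactions")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}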