Andrew, take a look at the slides I posted. In them I showed that the update does not grow beyond a very reasonable bound.
> On Apr 18, 2015, at 9:15, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
> Yes, that's what I mean; if the number of updates gets too big it would probably become unmanageable, though. This approach worked well with daily updates, but I never tried it with anything "real time."
>
>> On Saturday, April 18, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> I think you are saying that instead of val newHashMap = lastHashMap ++ updateHashMap, layered updates might be useful, since new and last are potentially large. Some limit on the number of updates might trigger a refresh. This might work if the update mechanism cooperates with incremental index updates in the search engine. Given practical considerations the updates will be numerous and nearly empty.
>>
>> On Apr 17, 2015, at 7:58 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>
>> I have not implemented it for recommendations, but a layered cache/sieve structure could be useful.
>>
>> That is, between batch refreshes you can keep tacking on new updates in cascading order, so values that have been updated exist in the newest layer; otherwise the lookup falls through to the most recent layer that holds the value.
>>
>> You can put a fractional multiplier on older layers for aging, but again I've not implemented it.
>>
>> On Friday, April 17, 2015, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> Yes. Also add the fact that the nano-batches are tightly bounded in size, both max and mean. And mostly filtered away anyway.
>>>
>>> Aging is an open question. I have never seen any effect from alternative sampling, so I would just assume "keep oldest," which simply tosses more samples. Then occasionally rebuild from batch if you really want aging to go right.
>>>
>>> Search updates these days are true realtime as well, so that works very well.
>>>
>>>> On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>
>>>> Thanks.
>>>>
>>>> This idea is based on a micro-batch of interactions per update, not individual ones, unless I missed something. That matches the typical input flow. Most interactions are filtered away by the frequency and number-of-interaction cuts.
>>>>
>>>> A couple of practical issues:
>>>>
>>>> In practice won't this require aging of interactions too? So wouldn't the update require some old-interaction removal? I suppose this might just take the form of added null interactions representing the geriatric ones? I haven't gone through the math in enough detail to see if you've already accounted for this.
>>>>
>>>> To use actual math (self-join, etc.) we still need to alter the geometry of the interactions to have the same row rank as the adjusted total. In other words, the number of rows in all resulting interactions must be the same. Over time this means completely removing rows and columns, or allowing empty rows in potentially all input matrices.
>>>>
>>>> It might not be too bad to accumulate gaps in rows and columns. Not sure it would have a practical impact (up to some large limit) as long as it was done, to keep the real size more or less fixed.
>>>>
>>>> As to realtime, that would be under search engine control through incremental indexing, and there are a couple of ways to do that; not a problem afaik. As you point out, the query always works and is real time. The index update must be frequent and not impact the engine's availability for queries.
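A minimal Scala sketch of the layered cache/sieve structure Andrew describes above, with the refresh limit Pat mentions; all names and the layer limit are illustrative assumptions, not an existing implementation:

class LayeredCache[K, V](maxLayers: Int = 10) {
  private var base: Map[K, V] = Map.empty      // result of the last full batch refresh
  private var layers: List[Map[K, V]] = Nil    // newest update layer at the head

  // Tack on a micro-batch of updates as a new layer; past the limit,
  // collapse everything into the base map (the periodic "refresh").
  def update(batch: Map[K, V]): Unit = {
    layers = batch :: layers
    if (layers.size > maxLayers) {
      base = layers.reverse.foldLeft(base)(_ ++ _) // fold oldest first, so newest wins
      layers = Nil
    }
  }

  // Newest layer wins; otherwise fall through to older layers, then the base.
  // A fractional aging multiplier per layer depth could be applied here.
  def get(key: K): Option[V] =
    layers.view.flatMap(_.get(key)).headOption.orElse(base.get(key))
}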
>>>> On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>> When I think of real-time adaptation of indicators, I think of this:
>>>> http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
>>>>
>>>>> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>>
>>>>> I've been thinking about Streaming (continuous input) and incremental cooccurrence.
>>>>>
>>>>> As interactions stream in from the user it is fairly simple to use something like Spark Streaming to maintain a moving time window over all input, with an update frequency that recalcs everything currently in the time window. I've done this with the current cooccurrence code, but though streaming, this is not incremental.
>>>>>
>>>>> The current data flow goes from interaction input, to geometry and user-dictionary reconciliation, to A'A, A'B, etc. After the multiply, the resulting cooccurrence matrices are LLR weighted/filtered/down-sampled.
>>>>>
>>>>> Incremental can mean all sorts of things and may imply different trade-offs. Did you have anything specific in mind?
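A sketch of the windowed (streaming but non-incremental) recalc Pat describes, using the Spark Streaming window API; the socket source, tab-separated input format, and the recompute body are placeholder assumptions, not the actual Mahout cooccurrence code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowedCooccurrence {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("windowed-cooccurrence"), Seconds(10))

    // (user, item) interactions from some stream; a text socket as a stand-in.
    val interactions = ssc.socketTextStream("localhost", 9999)
      .map(_.split("\t"))
      .filter(_.length == 2)
      .map(f => (f(0), f(1)))

    // Moving one-hour window, recalculated every five minutes. Everything in
    // the window is recomputed from scratch: streaming, but not incremental.
    interactions.window(Minutes(60), Minutes(5)).foreachRDD { rdd =>
      // Placeholder: reconcile dictionaries, form A'A / A'B, LLR filter,
      // then push indicators to the search engine's incremental index.
      println(s"recomputing cooccurrence over ${rdd.count()} interactions")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}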