Thanks. 

This idea is based on a micro-batch of interactions per update, not individual 
ones unless I missed something. That matches the typical input flow. Most 
interactions are filtered away by frequency and number of interaction cuts.

A couple of practical issues:

In practice won’t this require aging of interactions too? So wouldn’t the 
update require some old interaction removal? I suppose this might just take the 
form of added null interactions representing the geriatric ones? Haven’t gone 
through the math with enough detail to see if you’ve already accounted for this.
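
To make the aging question concrete, here is a minimal sketch of adding and removing interactions against a running cooccurrence count. It is not the Mahout code; `WindowedCooccurrence`, `add`, and `remove` are made-up names, and it assumes binary interactions where each (user, item) pair counts once.

```python
from collections import Counter, defaultdict


class WindowedCooccurrence:
    """Hypothetical running count of item-item cooccurrence (A'A)."""

    def __init__(self):
        self.history = defaultdict(set)  # user -> items currently in the window
        self.cooc = Counter()            # (item, item) sorted pair -> number of users with both

    def add(self, user, item):
        items = self.history[user]
        if item in items:
            return  # binary interactions: a repeat changes nothing
        for other in items:
            self.cooc[tuple(sorted((item, other)))] += 1
        items.add(item)

    def remove(self, user, item):
        """Age out one interaction: the exact inverse of add()."""
        items = self.history.get(user)
        if not items or item not in items:
            return
        items.discard(item)
        for other in items:
            key = tuple(sorted((item, other)))
            self.cooc[key] -= 1
            if self.cooc[key] == 0:
                del self.cooc[key]  # keep the count table sparse
        if not items:
            del self.history[user]  # user has no interactions left in the window
```

In this sketch removal is just a negative update, so old interactions could go into the same micro-batch as new ones. A real version would also need timestamps or counts, so that a repeated interaction is not aged out while a newer copy is still in the window.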

To use actual math (self-join, etc.) we still need to alter the geometry of the 
interactions to have the same row rank as the adjusted total. In other words 
the number of rows in all resulting interactions must be the same. Over time 
this means completely removing rows and columns or allowing empty rows in 
potentially all input matrices.
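
A rough sketch of what I mean by matching geometry, assuming one shared user dictionary across all interaction types. `build_aligned` and the sparse row-of-sets layout are illustrative only, not how the current code stores things.

```python
def build_aligned(interactions_by_type):
    """interactions_by_type: dict of name -> list of (user, item) pairs.

    Returns a shared user -> row index dictionary and, per interaction
    type, a list of rows (sets of items) with one row per known user.
    """
    user_index = {}
    for pairs in interactions_by_type.values():
        for user, _ in pairs:
            user_index.setdefault(user, len(user_index))

    matrices = {}
    for name, pairs in interactions_by_type.items():
        # Every matrix gets a row for every user, even if that row is empty.
        rows = [set() for _ in range(len(user_index))]
        for user, item in pairs:
            rows[user_index[user]].add(item)
        matrices[name] = rows
    return user_index, matrices
```

A user seen only in one interaction type still gets an (empty) row in the others, which is the "allow empty rows" option. Completely removing aged-out users would mean rebuilding `user_index` and renumbering rows in every matrix.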

It might not be too bad to accumulate gaps in rows and columns. I'm not sure it 
would have a practical impact (up to some large limit), as long as it was done 
in a way that keeps the real size more or less fixed.

As to realtime, that would be under search engine control through incremental 
indexing and there are a couple ways to do that, not a problem afaik. As you 
point out the query always works and is real time. The index update must be 
frequent and not impact the engine's availability for queries.

On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:


When I think of real-time adaptation of indicators, I think of this:

http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime


On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
I’ve been thinking about Streaming (continuous input) and incremental 
cooccurrence.

As interactions stream in from the user it is fairly simple to use something 
like Spark streaming to maintain a moving time window for all input, and an 
update frequency that recalcs all input currently in the time window. I’ve done 
this with the current cooccurrence code but, though it is streaming, it is not 
incremental.
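
As a plain-Python illustration of the moving window (not Spark streaming code), something like the sketch below captures the idea. `TimeWindow` and its methods are hypothetical names, and it assumes events arrive in timestamp order.

```python
from collections import deque


class TimeWindow:
    """Hypothetical moving time window over (timestamp, user, item) events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # oldest event first; assumes in-order arrival

    def add(self, timestamp, user, item):
        self.events.append((timestamp, user, item))

    def evict(self, now):
        """Drop and return the events that have fallen out of the window."""
        cutoff = now - self.window_seconds
        expired = []
        while self.events and self.events[0][0] < cutoff:
            expired.append(self.events.popleft())
        return expired

    def snapshot(self, now):
        """Everything still in the window, as input for a full recalc."""
        self.evict(now)
        return [(user, item) for _, user, item in self.events]
```

Each update calls `snapshot` and recomputes everything from it, which is why this is streaming but not incremental. An incremental version would instead act only on the newly added events and on whatever `evict` returns.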

The current data flow goes from interaction input to geometry and user 
dictionary reconciliation to A’A, A’B etc. After the multiply the resulting 
cooccurrence matrices are LLR weighted/filtered/down-sampled.
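
For reference, a small self-contained sketch of the multiply plus LLR step, assuming A and B are already reconciled to the same users and stored as one set of items per user row. The function names are made up. The LLR is the usual entropy-based form for a 2x2 contingency table, and down-sampling is left out.

```python
import math
from collections import Counter


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp round-off below zero


def cross_occurrence_llr(rows_a, rows_b, threshold=0.0):
    """LLR-weighted A'B for two row-aligned binary interaction matrices."""
    n_users = len(rows_a)
    count_a, count_b, both = Counter(), Counter(), Counter()
    for items_a, items_b in zip(rows_a, rows_b):
        count_a.update(items_a)
        count_b.update(items_b)
        for a in items_a:
            for b in items_b:
                both[(a, b)] += 1  # one entry of A'B

    scores = {}
    for (a, b), k11 in both.items():
        k12 = count_a[a] - k11            # users with a but not b
        k21 = count_b[b] - k11            # users with b but not a
        k22 = n_users - k11 - k12 - k21   # users with neither
        score = llr(k11, k12, k21, k22)
        if score > threshold:             # filter weak indicators
            scores[(a, b)] = score
    return scores
```

Passing the same rows for both arguments gives A’A, though this sketch does not drop the diagonal (an item paired with itself), which a real job would.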

Incremental can mean all sorts of things and may imply different trade-offs. 
Did you have anything specific in mind?

