Re: Streaming and incremental cooccurrence

Pat Ferrel Fri, 17 Apr 2015 17:22:19 -0700

Thanks. 

This idea is based on a micro-batch of interactions per update, not individual
ones, unless I missed something. That matches the typical input flow. Most
interactions are filtered away by frequency and number-of-interactions cuts.

A couple of practical issues:

In practice won’t this require aging of interactions too? So wouldn’t the
update require some old interaction removal? I suppose this might just take the
form of added null interactions representing the geriatric ones? Haven’t gone
through the math with enough detail to see if you’ve already accounted for this.
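
A minimal sketch of one way aging could fit an incremental update, assuming a
binary user-by-item matrix A and treating an expired interaction as the same
update as an added one with the sign flipped. The class and method names are
hypothetical and are not taken from the slides or from the existing
cooccurrence code.

```python
from collections import Counter, defaultdict


class IncrementalCooccurrence:
    """Pair counts for a binary user-by-item matrix A, kept equal to A'A."""

    def __init__(self):
        self.items_by_user = defaultdict(set)  # current row of A per user
        self.pair_counts = Counter()           # (i, j) with i <= j -> (A'A)[i, j]

    def _apply(self, user, item, sign):
        # Changing one cell of A changes (A'A)[item, other] for every other
        # item in the same user's row, plus the diagonal entry for item.
        for other in self.items_by_user[user]:
            key = tuple(sorted((item, other)))
            self.pair_counts[key] += sign
        self.pair_counts[(item, item)] += sign

    def add(self, user, item):
        if item not in self.items_by_user[user]:
            self._apply(user, item, +1)
            self.items_by_user[user].add(item)

    def expire(self, user, item):
        # Aging out an interaction is the same update with the opposite sign.
        if item in self.items_by_user[user]:
            self.items_by_user[user].remove(item)
            self._apply(user, item, -1)
```

Adding and then expiring the same interaction leaves the counts where they
started, so only the raw counts need to be maintained; LLR weighting would
still be recomputed for the affected pairs.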

To use actual math (self-join, etc.) we still need to alter the geometry of the
interactions to have the same row rank as the adjusted total. In other words
the number of rows in all resulting interactions must be the same. Over time
this means completely removing rows and columns or allowing empty rows in
potentially all input matrices.
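
A small sketch of the geometry point, assuming two interaction types given as
(user, item) pairs. A shared user dictionary gives both matrices the same
number of rows, with empty rows where a user has no interactions of that type.
All function names and the sample data are made up for illustration.

```python
def build_user_index(*interaction_lists):
    """Assign one row index per user seen in any interaction type."""
    index = {}
    for interactions in interaction_lists:
        for user, _item in interactions:
            index.setdefault(user, len(index))
    return index


def to_rows(interactions, user_index):
    """One item set per row; users with no interactions get an empty row."""
    rows = [set() for _ in user_index]
    for user, item in interactions:
        rows[user_index[user]].add(item)
    return rows


def cross_cooccurrence(rows_a, rows_b):
    """Entries of A'B as {(item_a, item_b): count}; needs equal row counts."""
    if len(rows_a) != len(rows_b):
        raise ValueError("A and B must have the same number of rows")
    counts = {}
    for row_a, row_b in zip(rows_a, rows_b):
        for i in row_a:
            for j in row_b:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts


purchases = [("u1", "phone"), ("u2", "phone"), ("u3", "case")]
views = [("u1", "case"), ("u2", "case"), ("u4", "phone")]

users = build_user_index(purchases, views)
A = to_rows(purchases, users)   # u4 becomes an empty row
B = to_rows(views, users)       # u3 becomes an empty row
AtB = cross_cooccurrence(A, B)
```

In this sketch a user who ages out completely would simply keep an empty row
until the dictionary is rebuilt, which is the "gaps" trade-off described here.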

Might not be too bad to accumulate gaps in rows and columns. Not sure if it
would have a practical impact (up to some large limit) as long as the removal
was done, to keep the real size more or less fixed.

As to realtime, that would be under search engine control through incremental
indexing and there are a couple ways to do that, not a problem afaik. As you
point out the query always works and is real time. The index update must be
frequent and not impact the engine's availability for queries.

On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

When I think of real-time adaptation of indicators, I think of this:

http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime


On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
I’ve been thinking about streaming (continuous input) and incremental
cooccurrence.

As interactions stream in from the user it is fairly simple to use something
like Spark streaming to maintain a moving time window for all input, and an
update frequency that recalcs all input currently in the time window. I’ve done
this with the current cooccurrence code but, though streaming, this is not
incremental.
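
A rough sketch of that windowed full recalc, written in plain Python rather
than Spark Streaming and assuming events arrive in timestamp order as
(timestamp, user, item) tuples. The class name and parameters are hypothetical.
It is streaming but not incremental: every update rebuilds the counts from
everything still in the window.

```python
from collections import Counter, deque


class WindowedRecalc:
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, user, item), oldest first

    def add_batch(self, batch):
        # Assumes events arrive in timestamp order.
        self.events.extend(batch)

    def recalc(self, now):
        # Drop everything older than the window...
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        # ...then rebuild the counts from scratch over what is left.
        items_by_user = {}
        for _ts, user, item in self.events:
            items_by_user.setdefault(user, set()).add(item)
        counts = Counter()
        for items in items_by_user.values():
            for i in items:
                for j in items:
                    if i != j:
                        counts[(i, j)] += 1
        return counts
```

The cost of each recalc grows with the window size rather than with the size
of the new micro-batch, which is the motivation for looking at an incremental
form.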

The current data flow goes from interaction input to geometry and user
dictionary reconciliation to A’A, A’B, etc. After the multiply, the resulting
cooccurrence matrices are LLR weighted/filtered/down-sampled.
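
As an illustration of the multiply and LLR steps, here is a small sketch
assuming a binary matrix A stored as one item set per user. It uses the
entropy form of the log-likelihood ratio on a 2x2 contingency table; the
function names and the threshold parameter are hypothetical, and
down-sampling is left out.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_indicators(rows, threshold):
    """rows: one set of items per user (the binary matrix A)."""
    n = len(rows)
    item_count, pair_count = {}, {}
    for items in rows:
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1       # diagonal of A'A
            for j in items:
                if i != j:
                    pair_count[(i, j)] = pair_count.get((i, j), 0) + 1
    scored = {}
    for (i, j), k11 in pair_count.items():
        k12 = item_count[i] - k11       # users with i but not j
        k21 = item_count[j] - k11       # users with j but not i
        k22 = n - k11 - k12 - k21       # users with neither
        score = llr(k11, k12, k21, k22)
        if score >= threshold:
            scored[(i, j)] = score
    return scored
```

Each score depends only on the pair count, the two item counts, and the total
number of users, so those are the quantities an incremental version would have
to keep current.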

Incremental can mean all sorts of things and may imply different trade-offs.
Did you have anything specific in mind?
