I am trying to model a music charts system to get familiar with Samza.
The charts are the top N entries, by count, of a map from unique track
ID (basically a song) to a counter (the number of plays of that track)
over a sliding time window.
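To make the "top N of a map" part concrete, here is a minimal sketch in
plain Java. It does not use any Samza API, and the class and method
names (TopN, topN) are made up for illustration. It assumes the counts
for the current window are already available as a map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Samza.
final class TopN {
    /** Returns the n track IDs with the highest play counts, highest first. */
    static List<String> topN(Map<String, Long> counts, int n) {
        // Min-heap ordered by count: the root is the weakest current candidate.
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so at most n entries remain
            }
        }
        // The heap yields entries in ascending order, so reverse at the end.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

This keeps only N entries in the heap at any time, so selecting the
chart costs O(M log N) for a map of M tracks. The order of tracks with
equal counts is unspecified.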

The problem I see is the ever-growing size of this map, as the ID
space of tracks can be quite large (let's say 2E6). Not all of these
IDs will be played (and thus need to be counted) within a given time
window (let's say 1 hour), but it's not obvious to me when to prune the
map as the window slides.
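One approach I can imagine for the pruning, sketched below in plain
Java, is to split the window into fixed-size buckets (say 60 buckets of
one minute each) and subtract a whole bucket from the totals once it
falls out of the window. This is only a sketch under my own
assumptions: it uses no Samza API, the names (SlidingWindowCounter,
record, advanceTo) are made up, timestamps are assumed to be
non-negative and to arrive roughly in order, and a late event is simply
counted in the newest bucket.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of Samza.
final class SlidingWindowCounter {
    private static final class Bucket {
        final long index;                              // timestamp / bucketMillis
        final Map<String, Long> counts = new HashMap<>();
        Bucket(long index) { this.index = index; }
    }

    private final long bucketMillis;
    private final int bucketsPerWindow;
    private final Deque<Bucket> buckets = new ArrayDeque<>();  // oldest first
    private final Map<String, Long> totals = new HashMap<>();  // counts over the window

    SlidingWindowCounter(long bucketMillis, int bucketsPerWindow) {
        this.bucketMillis = bucketMillis;
        this.bucketsPerWindow = bucketsPerWindow;
    }

    /** Counts one play of trackId at the given event time. */
    void record(String trackId, long timestampMillis) {
        advanceTo(timestampMillis);
        long index = timestampMillis / bucketMillis;
        Bucket newest = buckets.peekLast();
        if (newest == null || newest.index < index) {
            newest = new Bucket(index);
            buckets.addLast(newest);
        }
        // Assumption: an out-of-order (late) event lands in the newest bucket.
        newest.counts.merge(trackId, 1L, Long::sum);
        totals.merge(trackId, 1L, Long::sum);
    }

    /** Drops buckets that are outside the window ending at nowMillis. */
    void advanceTo(long nowMillis) {
        long oldestKept = nowMillis / bucketMillis - bucketsPerWindow + 1;
        while (!buckets.isEmpty() && buckets.peekFirst().index < oldestKept) {
            Bucket expired = buckets.pollFirst();
            for (Map.Entry<String, Long> e : expired.counts.entrySet()) {
                // Returning null removes the key, so unplayed tracks are pruned.
                totals.computeIfPresent(e.getKey(), (id, total) -> {
                    long remaining = total - e.getValue();
                    return remaining > 0 ? Long.valueOf(remaining) : null;
                });
            }
        }
    }

    /** Read-only view of the per-track counts in the current window. */
    Map<String, Long> totals() {
        return Collections.unmodifiableMap(totals);
    }
}
```

With this layout the totals map only ever holds tracks that were played
at least once within the last hour, and the pruning happens whenever
the window advances past a bucket boundary. The price is that the
window slides in one-minute steps rather than continuously.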

I assume sliding time windows are a common case in stream processing,
so Samza probably provides some useful API for them. Is there an
example or tutorial for this kind of sliding time-window counting?
