I am trying to model a music charts system to get familiar with Samza. The charts are the top N entries, by count, of a map from unique track ID (basically a song) to a counter (the number of plays of that track) over a sliding time window.
The problem I see is that this map keeps growing, because the ID space of tracks can be quite large (let's pick 2E6). Not all of these IDs will be played (and thus need to be counted) within a given time window (let's pick 1 hour), but it is not obvious to me when to prune the map as the window slides. I assume that dealing with sliding time windows is a common case in stream processing, so I would expect Samza to provide some useful API for it. Does an example or tutorial for this kind of sliding time-window counting exist?
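To make the question concrete, here is a rough sketch of the behaviour I have in mind, written in plain Python rather than against Samza. It is not a Samza API: the class name `SlidingWindowCharts`, its method names, and the 60-second bucket size are all made up for illustration, and it assumes timestamps arrive as non-decreasing integer seconds. Plays are grouped into small time buckets, and a track is dropped from the map as soon as its last bucket falls out of the window.

```python
from collections import Counter, deque
import heapq


class SlidingWindowCharts:
    """Hypothetical sketch: per-track play counts over a sliding window.

    Plays are grouped into fixed-size time buckets. When a bucket falls
    completely out of the window, its counts are subtracted from the totals
    and tracks whose count reaches zero are removed, so the map only holds
    tracks that were actually played inside the window.
    """

    def __init__(self, window_seconds=3600, bucket_seconds=60):
        self.window_seconds = window_seconds
        self.bucket_seconds = bucket_seconds
        self.buckets = deque()    # (bucket_start, Counter), oldest first
        self.totals = Counter()   # track_id -> plays inside the window

    def _expire(self, now):
        # A bucket covers [start, start + bucket_seconds). Drop it once
        # every play it could contain is older than the window cutoff.
        cutoff = now - self.window_seconds
        while self.buckets and self.buckets[0][0] + self.bucket_seconds <= cutoff:
            _, old_counts = self.buckets.popleft()
            for track_id, plays in old_counts.items():
                self.totals[track_id] -= plays
                if self.totals[track_id] <= 0:
                    del self.totals[track_id]   # this is the pruning step

    def record_play(self, track_id, now):
        # Assumes `now` never goes backwards (no out-of-order events).
        self._expire(now)
        start = now - (now % self.bucket_seconds)
        if not self.buckets or self.buckets[-1][0] != start:
            self.buckets.append((start, Counter()))
        self.buckets[-1][1][track_id] += 1
        self.totals[track_id] += 1

    def top(self, n, now):
        # Return up to n (track_id, plays) pairs, highest count first.
        self._expire(now)
        return heapq.nlargest(n, self.totals.items(), key=lambda kv: kv[1])
```

With this approach the window is only accurate to one bucket width, and memory is bounded by the number of distinct tracks played within the window rather than by the full ID space. What I am looking for is the idiomatic way to get the same effect with Samza.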