[ https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vasia Kalavri resolved FLINK-2452.
----------------------------------
Resolution: Fixed
> Add a playcount threshold to the MusicProfiles example
> ------------------------------------------------------
>
> Key: FLINK-2452
> URL: https://issues.apache.org/jira/browse/FLINK-2452
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Affects Versions: 0.10
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
> Priority: Minor
> Fix For: 0.10
>
>
> In the MusicProfiles example, when creating the user-user similarity graph,
> an edge is created between any two users who have listened to the same song
> (even if only once). Depending on the input data, this might produce a
> projection graph with many more edges than the original user-song graph.
> To make this computation more efficient, this issue proposes adding a
> user-defined parameter that filters out songs that a user has listened to
> only a few times. Essentially, it is a threshold for playcount, above which a
> user is considered to like a song.
> For reference, with a threshold value of 30, the whole Last.fm dataset is
> analyzed on my laptop in a few minutes, while no threshold results in a
> runtime of several hours.
> There are many solutions to this problem, but since this is just an example
> (not a library method), I think that keeping it simple is important.
> Thanks to [~andralungu] for spotting the inefficiency!
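
The effect of the proposed threshold can be sketched in plain Java, without
Flink or Gelly. This is only an illustration under stated assumptions, not the
code of the MusicProfiles example: the class name, the Listen type, its field
names, and both helper methods are hypothetical, and the sample data is made
up. The sketch drops (user, song, playcount) records at or below the threshold
and then builds the user-user projection from what remains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: plain Java collections stand in for the Gelly graph.
final class PlaycountThresholdSketch {

    /** One (user, song, playcount) record; names are assumptions. */
    static final class Listen {
        final String user;
        final String song;
        final int playcount;

        Listen(String user, String song, int playcount) {
            this.user = user;
            this.song = song;
            this.playcount = playcount;
        }
    }

    /** Keeps only records whose playcount is strictly above the threshold. */
    static List<Listen> filterByPlaycount(List<Listen> listens, int threshold) {
        List<Listen> kept = new ArrayList<>();
        for (Listen l : listens) {
            if (l.playcount > threshold) {
                kept.add(l);
            }
        }
        return kept;
    }

    /** Builds user-user edges: one edge per pair of distinct users sharing a song. */
    static Set<String> similarUserPairs(List<Listen> listens) {
        // Group users by song; TreeSet deduplicates and keeps users sorted.
        Map<String, TreeSet<String>> usersBySong = new HashMap<>();
        for (Listen l : listens) {
            usersBySong.computeIfAbsent(l.song, s -> new TreeSet<>()).add(l.user);
        }
        // Emit each unordered pair once, encoded as "smaller-larger".
        Set<String> edges = new TreeSet<>();
        for (TreeSet<String> users : usersBySong.values()) {
            List<String> sorted = new ArrayList<>(users);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    edges.add(sorted.get(i) + "-" + sorted.get(j));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Listen> data = List.of(
                new Listen("u1", "s1", 50),
                new Listen("u2", "s1", 40),
                new Listen("u3", "s1", 2),
                new Listen("u3", "s2", 35),
                new Listen("u1", "s2", 5));

        // Threshold 0 keeps every record with at least one play.
        System.out.println(similarUserPairs(filterByPlaycount(data, 0)));
        // Threshold 30 keeps only songs a user has played more than 30 times.
        System.out.println(similarUserPairs(filterByPlaycount(data, 30)));
    }
}
```

A song with n listeners contributes up to n(n-1)/2 edges to the projection, so
removing low-playcount records before grouping by song shrinks the projection
much faster than it shrinks the input.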