Hello there! I'm following the samples in Mahout in Action. I created the sample based on the MovieLens 10M dataset, using FileDataModel and GenericItemBasedRecommender.
I then added a new file named ratings.1.dat, containing a new user, to the same directory and called refresh(null) as instructed in the book. On the machine I'm testing on, the initial file loading and user processing takes less than 10 seconds. The refresh took quite a long time: the new file had only 3 entries, yet the whole process took around 25 seconds, twice as long as before.

I know this is one huge file :) but the book says that only new entries are reprocessed. According to this log, though, that is not what happens:

[04/02/10 12:18:21:021 BRST] INFO common.RefreshHelper: Added refreshable: FileDataModel[dataFile:/home/vinicius/Documents/logs/ratings.dat]
[04/02/10 12:18:24:024 BRST] DEBUG file.FileDataModel: File has changed; reloading...
[04/02/10 12:18:24:024 BRST] INFO file.FileDataModel: Reading file info...
[04/02/10 12:18:26:026 BRST] INFO file.FileDataModel: Processed 1000000 lines
[04/02/10 12:18:28:028 BRST] INFO file.FileDataModel: Processed 2000000 lines
[04/02/10 12:18:29:029 BRST] INFO file.FileDataModel: Processed 3000000 lines
[04/02/10 12:18:30:030 BRST] INFO file.FileDataModel: Processed 4000000 lines
[04/02/10 12:18:31:031 BRST] INFO file.FileDataModel: Processed 5000000 lines
[04/02/10 12:18:33:033 BRST] INFO file.FileDataModel: Processed 6000000 lines
[04/02/10 12:18:34:034 BRST] INFO file.FileDataModel: Processed 7000000 lines
[04/02/10 12:18:35:035 BRST] INFO file.FileDataModel: Processed 8000000 lines
[04/02/10 12:18:37:037 BRST] INFO file.FileDataModel: Processed 9000000 lines
[04/02/10 12:18:39:039 BRST] INFO file.FileDataModel: Processed 10000000 lines
[04/02/10 12:18:39:039 BRST] INFO file.FileDataModel: Read lines: 10000054
[04/02/10 12:18:39:039 BRST] INFO file.FileDataModel: Reading file info...
[04/02/10 12:18:39:039 BRST] INFO file.FileDataModel: Read lines: 3
[04/02/10 12:18:39:039 BRST] INFO model.GenericDataModel: Processed 10000 users
[04/02/10 12:18:40:040 BRST] INFO model.GenericDataModel: Processed 20000 users
[04/02/10 12:18:40:040 BRST] INFO model.GenericDataModel: Processed 30000 users
[04/02/10 12:18:41:041 BRST] INFO model.GenericDataModel: Processed 40000 users
[04/02/10 12:18:42:042 BRST] INFO model.GenericDataModel: Processed 50000 users
[04/02/10 12:18:45:045 BRST] INFO model.GenericDataModel: Processed 60000 users
[04/02/10 12:18:45:045 BRST] INFO model.GenericDataModel: Processed 69879 users
[04/02/10 12:18:46:046 BRST] INFO common.RefreshHelper: Refreshed: [FileDataModel[dataFile:/home/vinicius/Documents/logs/ratings.dat]]

It seems that the whole of ratings.dat is read again. Does GenericItemBasedRecommender need to reload the entire file on refresh? Is it possible to speed things up?

Regards

--
The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift.
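
P.S. For anyone reproducing this, here is a minimal sketch of how I understand the update-file naming to work, which is why I named the new file ratings.1.dat. It assumes the rule from the FileDataModel documentation: a file in the same directory counts as an update file if its name starts the same way as the main file's name, up to the first period. The class UpdateFileNames and its methods are hypothetical names of my own, not Mahout code, and the real lookup may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: mimics the assumed naming rule
// for update ("delta") files that sit next to a main data file.
final class UpdateFileNames {

    private UpdateFileNames() {}

    // Part of the main file's name before its first period,
    // e.g. "ratings.dat" -> "ratings". A name without a period is used whole.
    static String prefixOf(String mainFileName) {
        int period = mainFileName.indexOf('.');
        return period < 0 ? mainFileName : mainFileName.substring(0, period);
    }

    // Assumed rule: same prefix as the main file, but not the main file itself.
    static boolean isUpdateFile(String mainFileName, String candidateName) {
        return candidateName.startsWith(prefixOf(mainFileName))
                && !candidateName.equals(mainFileName);
    }

    // Filters a directory listing (file names only) down to the update files.
    static List<String> updateFiles(String mainFileName, List<String> directoryListing) {
        List<String> result = new ArrayList<>();
        for (String name : directoryListing) {
            if (isUpdateFile(mainFileName, name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Under that assumption, calling UpdateFileNames.updateFiles("ratings.dat", ...) on a listing that contains ratings.dat, ratings.1.dat and movies.dat keeps only ratings.1.dat, which matches the second "Reading file info..." entry in the log above.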
