I think removing the User and Item abstractions would be a good idea.
The User interface is a bit more complex with the getPreferences
methods, but I think it can be easily ported to the DataModel. There
will be some impact on already-written code, but I think the
benefits are interesting.
I don't know if removing the Preference abstraction will bring better
performance. The getPreferences methods are very useful for iterating
over the preferences of users and items, and I think it saves a lot of
lookups when the user/item association is held in a single object.
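André's point is that keeping all of a user's preferences together means iteration costs one lookup per user, not one per preference. Dropping the Preference objects does not have to give that up. Below is a minimal sketch, assuming a hypothetical ID-based model: the names `UserPreferences`, `IdPreferenceModel` and `getPreferencesFromUser` are illustrative, not Mahout's API. Each user's preferences live in one object holding parallel arrays of item IDs and values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: one user's preferences stored as parallel primitive arrays. */
final class UserPreferences {
  private final long userID;
  private final long[] itemIDs;
  private final float[] values;

  UserPreferences(long userID, long[] itemIDs, float[] values) {
    if (itemIDs.length != values.length) {
      throw new IllegalArgumentException("itemIDs and values must have the same length");
    }
    this.userID = userID;
    this.itemIDs = itemIDs.clone();
    this.values = values.clone();
  }

  long getUserID() { return userID; }
  int length() { return itemIDs.length; }
  long getItemID(int i) { return itemIDs[i]; }
  float getValue(int i) { return values[i]; }
}

/** Hypothetical ID-based data model: no User, Item or Preference objects. */
final class IdPreferenceModel {
  private final Map<Long, UserPreferences> byUser = new HashMap<>();

  void setPreferences(UserPreferences prefs) {
    byUser.put(prefs.getUserID(), prefs);
  }

  /** One map lookup returns everything needed to iterate a user's preferences. */
  UserPreferences getPreferencesFromUser(long userID) {
    UserPreferences prefs = byUser.get(userID);
    if (prefs == null) {
      throw new IllegalArgumentException("unknown user " + userID);
    }
    return prefs;
  }
}
```

A caller would fetch `getPreferencesFromUser(id)` once and loop from 0 to `length()`, reading `getItemID(i)` and `getValue(i)`. The association stays in a single object, but without one object per preference.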
André
Quoting Sean Owen <[email protected]>:
I am considering a somewhat large change to org.apache.mahout.cf.taste code
and would like to solicit feedback from users.
The change would be to remove the User, Item and Preference
interfaces/abstractions from the code. Everything would proceed in terms of
user and item IDs, and preference values instead.
The original reason for these interfaces was, well, that it seemed nice. It
also provided a way for implementors to substitute domain-specific
implementations with additional information.
But there are problems too.
- Do methods take a User, or user ID? The code is not consistent in this
regard. If User, the caller is forced to look up a User if it only has an
ID. (Conversely, if the caller already has a User, and the method needs a
User, then passing an ID only forces a redundant lookup. I think this is
rarer.)
- Factory method problem. There are many points in the code where it should
call factory methods to create a User/Item/Preference object, since the
domain may use specialized implementations instead of GenericUser, etc. At
the moment some methods just assume GenericUser, etc. Fixing this would be
somewhat hard, but more importantly I think it would hurt performance.
- Object overhead. Holding these extra objects has a cost in memory and
performance.
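The factory-method and overhead problems can be made concrete with a small sketch. It assumes a stripped-down `User` interface. `SimpleUser` is a hypothetical stand-in for a generic implementation such as GenericUser, and `CustomerUser` and `NeighborhoodBuilder` are invented for illustration. A method that builds User objects either hardcodes the generic class and silently drops domain subclasses, or it must take a factory that every caller threads through. With IDs only, there is nothing to construct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Minimal stand-in for the User abstraction under discussion. */
interface User {
  long getID();
}

/** Hypothetical stand-in for a generic implementation such as GenericUser. */
final class SimpleUser implements User {
  private final long id;
  SimpleUser(long id) { this.id = id; }
  @Override public long getID() { return id; }
}

/** Hypothetical domain-specific User carrying extra information. */
final class CustomerUser implements User {
  private final long id;
  private final String segment;
  CustomerUser(long id, String segment) { this.id = id; this.segment = segment; }
  @Override public long getID() { return id; }
  String getSegment() { return segment; }
}

final class NeighborhoodBuilder {
  /** Problem: the generic class is hardcoded, so domain subclasses are lost. */
  static List<User> neighborsHardcoded(long[] neighborIDs) {
    List<User> result = new ArrayList<>();
    for (long id : neighborIDs) {
      result.add(new SimpleUser(id)); // always SimpleUser, never CustomerUser
    }
    return result;
  }

  /** Object-based fix: every creation site must accept a factory, and callers must pass it. */
  static List<User> neighborsWithFactory(long[] neighborIDs, LongFunction<User> factory) {
    List<User> result = new ArrayList<>();
    for (long id : neighborIDs) {
      result.add(factory.apply(id));
    }
    return result;
  }

  /** ID-only alternative: nothing to construct and no per-user object overhead. */
  static long[] neighborsByID(long[] neighborIDs) {
    return neighborIDs.clone();
  }
}
```

The factory version works, but the extra parameter spreads to every method that creates objects. It also still allocates one object per user, which is the memory and speed cost in the last point.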
The code already effectively assumes there is nothing but user and item IDs and
a preference value. So why not make the core reflect this and gain some
simplicity and speed?
I think that domains that need to inject extra information can still do this
fine without needing custom User, Item implementations.
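One way a domain might still attach extra information is to keep it in a side table keyed by the same IDs. The sketch below assumes that approach; it is not a proposed Mahout API. `IdOnlyRatings` and `DomainInfo` are hypothetical names. The core stores only (userID, itemID, value), and an item-category map filters candidate items outside the core.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ID-only core: a rating is just (userID, itemID, value). */
final class IdOnlyRatings {
  private final Map<Long, Map<Long, Float>> prefs = new HashMap<>();

  void setPreference(long userID, long itemID, float value) {
    prefs.computeIfAbsent(userID, k -> new HashMap<>()).put(itemID, value);
  }

  /** Returns the stored value, or null if the user has not rated the item. */
  Float getPreferenceValue(long userID, long itemID) {
    Map<Long, Float> row = prefs.get(userID);
    return row == null ? null : row.get(itemID);
  }
}

/** Hypothetical domain data kept beside the core, keyed by the same item IDs. */
final class DomainInfo {
  private final Map<Long, String> itemCategory = new HashMap<>();

  void setCategory(long itemID, String category) {
    itemCategory.put(itemID, category);
  }

  /** Keeps only candidate items in the wanted category, e.g. to filter recommendations. */
  List<Long> filterByCategory(long[] candidateItemIDs, String wanted) {
    List<Long> kept = new ArrayList<>();
    for (long id : candidateItemIDs) {
      if (wanted.equals(itemCategory.get(id))) {
        kept.add(id);
      }
    }
    return kept;
  }
}
```

The core never needs to know the category exists. Domain logic joins on the ID only where it needs the extra data.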
It is just a thought now. Anybody have more?
Sean