I am considering a somewhat large change to the org.apache.mahout.cf.taste
code and would like to solicit feedback from users.

The change would be to remove the User, Item and Preference
interfaces/abstractions from the code. Everything would proceed in terms of
user and item IDs, and preference values instead.
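To make the proposal concrete, here is a minimal sketch of what an ID-only data model could look like. The names IdDataModel and InMemoryIdDataModel are hypothetical and are not existing Taste classes. It also assumes IDs are longs and preference values are floats. Every method takes plain IDs and values, so a caller never has to look up a User first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ID-based data model: users and items are plain long IDs and
// preferences are plain float values. There are no User, Item or Preference objects.
interface IdDataModel {
  // Returns the preference value, or null if the user has not rated the item.
  Float getPreferenceValue(long userID, long itemID);

  // Records (or overwrites) a preference value.
  void setPreference(long userID, long itemID, float value);

  // Number of items the given user has expressed a preference for.
  int getNumPreferencesForUser(long userID);
}

// Minimal in-memory implementation backed by nested hash maps.
// (Boxed maps are used for brevity; a tuned version would likely use
// primitive-keyed maps to avoid per-entry objects.)
final class InMemoryIdDataModel implements IdDataModel {
  private final Map<Long, Map<Long, Float>> prefsByUser = new HashMap<>();

  @Override
  public Float getPreferenceValue(long userID, long itemID) {
    Map<Long, Float> prefs = prefsByUser.get(userID);
    return prefs == null ? null : prefs.get(itemID);
  }

  @Override
  public void setPreference(long userID, long itemID, float value) {
    prefsByUser.computeIfAbsent(userID, id -> new HashMap<>()).put(itemID, value);
  }

  @Override
  public int getNumPreferencesForUser(long userID) {
    Map<Long, Float> prefs = prefsByUser.get(userID);
    return prefs == null ? 0 : prefs.size();
  }
}
```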

The original reason for these interfaces was, well, that it seemed nice.
They also gave implementors a way to substitute domain-specific
implementations carrying additional information.

But there are problems too.

- Do methods take a User, or a user ID? The code is not consistent in this
regard. If a method takes a User, a caller that has only an ID is forced to
look up the User first. (Conversely, if the caller already has a User but
the method takes only an ID and internally needs a User, passing the ID
forces a redundant lookup. I think this case is rarer.)

- Factory method problem. There are many points in the code where it should
call factory methods to create User/Item/Preference objects, since the
domain may use specialized implementations instead of GenericUser, etc. At
the moment some methods just assume GenericUser, etc. Fixing this would be a
bit hard, but more importantly, I think it would hurt performance.

- Object overhead. Holding these extra objects has a cost in memory and
performance.
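To illustrate the object-overhead point, here is a minimal sketch of a hypothetical class (not part of Taste) that stores one user's preferences as two parallel primitive arrays rather than as individual Preference objects. It assumes long item IDs and float values, and keeps item IDs sorted so that lookups can use Arrays.binarySearch.

```java
import java.util.Arrays;

// Hypothetical compact storage for one user's preferences: two parallel
// primitive arrays instead of an array of Preference objects. Each entry
// costs 8 bytes (long ID) + 4 bytes (float value), with no per-entry
// object header or reference.
final class CompactUserPrefs {
  private final long userID;
  private final long[] itemIDs; // strictly ascending, so lookups can binary-search
  private final float[] values; // values[i] is this user's preference for itemIDs[i]

  CompactUserPrefs(long userID, long[] itemIDs, float[] values) {
    if (itemIDs.length != values.length) {
      throw new IllegalArgumentException("itemIDs and values must have the same length");
    }
    for (int i = 1; i < itemIDs.length; i++) {
      if (itemIDs[i - 1] >= itemIDs[i]) {
        throw new IllegalArgumentException("itemIDs must be strictly ascending");
      }
    }
    this.userID = userID;
    // Defensive copies so callers cannot break the sorted invariant later.
    this.itemIDs = itemIDs.clone();
    this.values = values.clone();
  }

  long getUserID() {
    return userID;
  }

  int size() {
    return itemIDs.length;
  }

  // Returns the preference for itemID, or Float.NaN if there is none.
  // Using NaN as the "absent" marker avoids boxing the result; it assumes
  // NaN is never a legitimate preference value.
  float getValue(long itemID) {
    int i = Arrays.binarySearch(itemIDs, itemID);
    return i >= 0 ? values[i] : Float.NaN;
  }
}
```

The trade-off is that item IDs must be kept sorted, which makes inserts more expensive. In exchange, reads are cheap and no objects are allocated per preference.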

The code already effectively assumes there is nothing but user and item IDs
and a preference value. So why not make the core reflect this and gain some
simplicity and speed?

I think domains that need to inject extra information can still do so
without custom User or Item implementations.
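One way this could work, sketched in Java and assuming long user IDs: keep the extra domain fields in a separate store keyed by user ID, and join on the ID only where the application needs them. UserProfile and UserProfileStore are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain data that a custom User implementation might otherwise carry.
final class UserProfile {
  final String displayName;
  final String country;

  UserProfile(String displayName, String country) {
    this.displayName = displayName;
    this.country = country;
  }
}

// Side table keyed by the same user IDs the recommender core uses.
// The core never sees it; application code joins on the ID when needed.
final class UserProfileStore {
  private final Map<Long, UserProfile> profiles = new HashMap<>();

  void put(long userID, UserProfile profile) {
    profiles.put(userID, profile);
  }

  Optional<UserProfile> get(long userID) {
    return Optional.ofNullable(profiles.get(userID));
  }

  // Maps user IDs returned by the core (e.g. a user's nearest neighbors)
  // to display names, using "unknown" when no profile is stored.
  List<String> displayNames(long[] userIDs) {
    List<String> names = new ArrayList<>(userIDs.length);
    for (long id : userIDs) {
      names.add(get(id).map(p -> p.displayName).orElse("unknown"));
    }
    return names;
  }
}
```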

It is just a thought for now. Does anybody have other thoughts?

Sean
