You are completely right; the simple interface would only be usable for
read-only / batch-updatable recommenders. Online recommenders might need
something different. I tried to widen the discussion here to cover all
kinds of API changes in the recommenders that would be necessary in the
future.



2013/7/17 Peng Cheng <pc...@uowmail.edu.au>

> One thing that suddenly comes to my mind is that, for a simple interface
> like FactorizablePreferences, sequential READ in real time may be
> possible, but sequential WRITE in O(1) time is utopian: you need to
> flush out the old preference with the same user and item ID (in the
> worst case this requires an interpolation search), otherwise you are
> permitting a user to rate an item twice with different values.
> Considering how FileDataModel is supposed to work (new files flush old
> files), maybe using the simple interface has fewer advantages than we
> used to believe.
>
>
> On 13-07-17 04:58 PM, Sebastian Schelter wrote:
>
>> Hi Peng,
>>
>> I never wanted to discard the old interface, I just wanted to split it
>> up. I want to have a simple interface that only supports sequential
>> access (and allows for very memory-efficient implementations, e.g. by
>> the use of primitive arrays). DataModel should *extend* this interface
>> and provide sequential and random access (basically what it already
>> does).
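[Editor's sketch] A minimal sketch of that split, using illustrative names (these are assumptions, not the actual Mahout interfaces):

```java
import java.util.Iterator;

// Illustrative sketch only -- names and signatures are assumptions,
// not the actual Mahout API.
class Preference {
    final long userID;
    final long itemID;
    final float value;
    Preference(long userID, long itemID, float value) {
        this.userID = userID;
        this.itemID = itemID;
        this.value = value;
    }
}

// The simple, sequential-access-only contract; implementations can be
// backed by memory-efficient primitive arrays.
interface SequentialAccessPreferences {
    Iterator<Preference> getPreferences();
}

// A DataModel-like contract would *extend* the sequential one and add
// random access on top of it.
interface RandomAccessPreferences extends SequentialAccessPreferences {
    Float getPreferenceValue(long userID, long itemID);
}
```

A recommender that only needs one pass over the data would declare the narrower `SequentialAccessPreferences` dependency, so either kind of implementation can be passed in.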
>>
>> Then a recommender such as SGD could state that it only needs sequential
>> access to the preferences, and you could either feed it a DataModel (so
>> we don't break backwards compatibility) or a memory-efficient
>> sequential-access thingy.
>>
>> Does that make sense for you?
>>
>>
>> 2013/7/17 Peng Cheng <pc...@uowmail.edu.au>
>>
>>> I see, OK, so we shouldn't use the old implementation. But I mean, the
>>> old interface doesn't have to be discarded. The discrepancy between
>>> your FactorizablePreferences and DataModel is that your model supports
>>> getPreferences(), which returns all preferences as an iterator, while
>>> DataModel supports a few old functions that return preferences for an
>>> individual user or item.
>>>
>>> My point is that it is not hard for each of them to implement what the
>>> other has: the old DataModel can implement getPreferences() just by a
>>> loop in an abstract class, and your new FactorizablePreferences can
>>> implement those old functions by a binary search that takes O(log n)
>>> time, or an interpolation search that takes O(log log n) time on
>>> average. The same goes for the online update. It would just be a matter
>>> of different speed and space, not a different interface standard; we
>>> could use the old unit tests, old examples, old everything. And we
>>> would be more flexible in writing ensemble recommenders.
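[Editor's sketch] The binary-search idea can be sketched as follows, assuming a store kept in parallel primitive arrays sorted by user ID; the class layout and names below are illustrative, not Mahout's actual internals:

```java
// Hedged sketch: O(log n) random lookup layered on top of a sequential,
// array-backed preference store. Assumes userIDs is sorted ascending and
// the three arrays are parallel (same index = same preference).
public class ArrayBackedLookup {
    final long[] userIDs;   // sorted ascending
    final long[] itemIDs;
    final float[] values;

    ArrayBackedLookup(long[] userIDs, long[] itemIDs, float[] values) {
        this.userIDs = userIDs;
        this.itemIDs = itemIDs;
        this.values = values;
    }

    // Binary search for the first preference of a given user,
    // or -1 if the user has none.
    int firstIndexOfUser(long userID) {
        int lo = 0, hi = userIDs.length - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (userIDs[mid] < userID) {
                lo = mid + 1;
            } else {
                if (userIDs[mid] == userID) {
                    result = mid;  // remember the hit, keep looking left
                }
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

From that first index, a per-user method would scan forward while the user ID matches, which is exactly the "different speed, same contract" trade-off described above.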
>>>
>>> Just a few thoughts; I'll have to validate the idea first before
>>> creating a new JIRA ticket.
>>>
>>> Yours Peng
>>>
>>>
>>>
>>> On 13-07-16 02:51 PM, Sebastian Schelter wrote:
>>>
>>>> I completely agree. Netflix is less than one gigabyte in a smart
>>>> representation; 12x more memory is a no-go. The techniques used in
>>>> FactorizablePreferences allow a much more memory-efficient
>>>> representation, tested on the KDD Music dataset, which is approx 2.5
>>>> times the size of Netflix and fits into 3GB with that approach.
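[Editor's sketch] A rough back-of-envelope check of these numbers, assuming flat primitive arrays with one (int user, int item, float rating) triple per observation; the field widths are assumptions, not measurements of Mahout's implementation:

```java
// Back-of-envelope memory estimate for an array-backed store.
public class MemoryEstimate {
    static long bytesFor(long observations, int bytesPerEntry) {
        return observations * bytesPerEntry;
    }

    public static void main(String[] args) {
        // roughly Netflix-scale: ~100 million observations
        long total = bytesFor(100_000_000L, 4 + 4 + 4); // int + int + float
        System.out.printf("%.2f GiB%n", total / (1024.0 * 1024 * 1024));
        // about 1.1 GiB in flat primitive arrays; byte-sized ratings or
        // delta-encoded IDs would push this below 1 GiB, versus an order
        // of magnitude more with one boxed object per preference
    }
}
```

This is consistent with the figures in the thread: the overhead comes from per-entry objects and hash-map buckets, not from the raw data.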
>>>>
>>>>
>>>> 2013/7/16 Ted Dunning <ted.dunn...@gmail.com>
>>>>
>>>>> Netflix is a small dataset. 12G for that seems quite excessive.
>>>>>
>>>>> Note also that this is before you have done any work.
>>>>>
>>>>> Ideally, 100million observations should take << 1GB.
>>>>>
>>>>> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au>
>>>>> wrote:
>>>>>
>>>>>> The second idea is indeed splendid; we should separate the
>>>>>> time-complexity-first and space-complexity-first implementations.
>>>>>> What I'm not quite sure about is whether we really need to create
>>>>>> two interfaces instead of one. Personally, I think 12G of heap
>>>>>> space is not that high, right? Most new laptops can already handle
>>>>>> that (emphasis on laptop). And if we replace the hash map (the
>>>>>> culprit of the high memory consumption) with a list/linked list, it
>>>>>> would simply degrade a lookup to an O(n) linear search, which is
>>>>>> not too bad either. The current DataModel is the result of careful
>>>>>> thought and has undergone extensive testing; it is easier to expand
>>>>>> on top of it than to subvert it.
>>>>>>
>>>>>>
>>>
>
>
