[ 
https://issues.apache.org/jira/browse/MAHOUT-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930646#comment-13930646
 ] 

Adam Ilardi commented on MAHOUT-1447:
-------------------------------------

I have code that builds the ALS model and writes the Factorization to a binary
.out file. A separate class reads that file back and runs cross-validation, but
the data is large, so I don't want to keep a duplicate array of item IDs in
memory.
Scala code example (sorry):

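import java.io.File
import scala.collection.JavaConverters._
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIterator
import org.apache.mahout.cf.taste.impl.recommender.TopItems
import org.apache.mahout.cf.taste.impl.recommender.svd.FilePersistenceStrategy

// modelFile, userId and DotProductEstimator are defined elsewhere (not shown)
val model = new FilePersistenceStrategy(new File(modelFile))
val factorization = model.load()
val estimator: TopItems.Estimator[java.lang.Long] =
  new DotProductEstimator(userId, factorization)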

This way I can call:
TopItems.getTopItems(factorization.numItems, factorization.getItemIdIterator(),
  null, estimator)

instead of first copying every item ID into an array:

val allItems =
  factorization.getItemIDMappings().iterator().asScala.map(_.getKey().toLong).toArray
TopItems.getTopItems(factorization.numItems,
  new LongPrimitiveArrayIterator(allItems), null, estimator)
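To make the memory argument concrete, here is a minimal Scala sketch of top-N selection that streams over an iterator of item IDs. It is a hypothetical stand-in for TopItems.getTopItems, not Mahout's implementation: `StreamingTopN`, `topItems` and `dot` are illustrative names, item factors are assumed to live in a plain `Map[Long, Array[Double]]`, and scores are dot products of user and item vectors. Only the best `howMany` candidates are held in a bounded min-heap, so passing `keysIterator` never materializes an array of all item IDs.

```scala
import scala.collection.mutable

object StreamingTopN {

  // Dot product of a user vector and an item vector (same length assumed).
  def dot(u: Array[Double], v: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < u.length) { s += u(i) * v(i); i += 1 }
    s
  }

  // Keeps the `howMany` highest-scoring IDs seen so far; never copies the ID stream.
  def topItems(howMany: Int, itemIds: Iterator[Long], estimate: Long => Double): List[(Long, Double)] = {
    if (howMany <= 0) return Nil
    // "Less than" means a higher score, so the queue's head is the lowest score kept.
    val byScoreDesc = Ordering.fromLessThan[(Long, Double)]((a, b) => a._2 > b._2)
    val heap = mutable.PriorityQueue.empty[(Long, Double)](byScoreDesc)
    for (id <- itemIds) {
      val score = estimate(id)
      if (heap.size < howMany) heap.enqueue((id, score))
      else if (score > heap.head._2) { heap.dequeue(); heap.enqueue((id, score)) }
    }
    heap.toList.sortWith(_._2 > _._2) // best first
  }

  def main(args: Array[String]): Unit = {
    val user = Array(1.0, 0.5)
    val itemFactors = Map(10L -> Array(1.0, 0.0), 20L -> Array(0.0, 1.0), 30L -> Array(2.0, 2.0))
    // keysIterator walks the map's keys without building an array of them.
    val top = topItems(2, itemFactors.keysIterator, id => dot(user, itemFactors(id)))
    println(top) // List((30,3.0), (10,1.0))
  }
}
```

Heap size is bounded by `howMany`, so extra memory is O(howMany) regardless of how many item IDs the iterator yields.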

Alternatively, adding getters for these two Factorization fields would be best.
Are they kept private to prevent modification of the maps? If so, the getters
could return them wrapped in Collections.unmodifiableMap:

private final FastByIDMap<Integer> userIDMapping;
private final FastByIDMap<Integer> itemIDMapping;
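A minimal sketch of the read-only getter idea, assuming Scala and java.util maps. FastByIDMap is Mahout's own primitive-keyed map and may not implement java.util.Map, in which case Collections.unmodifiableMap could not wrap it directly, so `java.util.HashMap` stands in for it here; the class `IdMappings` and its getter names are hypothetical.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

// Hypothetical class: java.util.HashMap stands in for Mahout's FastByIDMap.
class IdMappings(userIds: Seq[Long], itemIds: Seq[Long]) {
  private val userIDMapping: JMap[java.lang.Long, Integer] = IdMappings.index(userIds)
  private val itemIDMapping: JMap[java.lang.Long, Integer] = IdMappings.index(itemIds)

  // Each getter returns a read-only view: no copy is made, but put/remove/clear
  // on the view throw UnsupportedOperationException.
  def getUserIDMapping: JMap[java.lang.Long, Integer] = Collections.unmodifiableMap(userIDMapping)
  def getItemIDMapping: JMap[java.lang.Long, Integer] = Collections.unmodifiableMap(itemIDMapping)
}

object IdMappings {
  // Maps each ID to its position in the input sequence.
  private def index(ids: Seq[Long]): JMap[java.lang.Long, Integer] = {
    val m = new JHashMap[java.lang.Long, Integer]()
    ids.zipWithIndex.foreach { case (id, i) => m.put(java.lang.Long.valueOf(id), Integer.valueOf(i)) }
    m
  }
}
```

Because the unmodifiable wrapper is a view backed by the original map rather than a copy, exposing it this way should not add memory proportional to the number of IDs.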

Please let me know if I wasn't clear.

> ImplicitFeedbackAlternatingLeastSquaresSolver tests and features
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-1447
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1447
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Adam Ilardi
>            Assignee: Sebastian Schelter
>            Priority: Minor
>              Labels: newbie, patch, performance
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1447.patch
>
>
> I added a test case for the YtY calculation code.
> I removed the indexes.quickSort() call in the YtY calculation because I don't
> think it's necessary, and the test cases passed without it. The order
> shouldn't matter since you're adding the scalars together. Correct me if I'm
> wrong.
> In Factorization.java I added methods to access the iterators of item IDs and
> user IDs directly. This saves memory when using classes like TopItems.java
> without also holding the DataModel in memory.
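The argument for dropping the sort can be checked numerically. YtY is the sum over items of the outer products y_i y_i^T, so every entry is a sum of scalars and is mathematically the same in any order; in floating point, reordering can change the last few bits, so comparisons should use a tolerance. The sketch below (hypothetical names `YtYOrder`, `ytY`, `maxAbsDiff`; random Gaussian factors instead of a real model) computes YtY over the same item vectors in two orders and prints the largest entry-wise difference.

```scala
import scala.util.Random

object YtYOrder {
  // YtY = sum over items of the outer product y * y^T, accumulated entry by entry.
  def ytY(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    require(rows.nonEmpty, "need at least one item vector")
    val k = rows.head.length
    val acc = Array.ofDim[Double](k, k)
    for (y <- rows; r <- 0 until k; c <- 0 until k) acc(r)(c) += y(r) * y(c)
    acc
  }

  // Largest absolute entry-wise difference between two k x k matrices.
  def maxAbsDiff(a: Array[Array[Double]], b: Array[Array[Double]]): Double = {
    var m = 0.0
    for (r <- a.indices; c <- a(r).indices) m = math.max(m, math.abs(a(r)(c) - b(r)(c)))
    m
  }

  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    val items = Vector.fill(1000)(Array.fill(5)(rnd.nextGaussian()))
    val inOrder = ytY(items)
    val shuffled = ytY(rnd.shuffle(items))
    // Expected to be tiny (rounding only), not necessarily exactly 0.0.
    println(maxAbsDiff(inOrder, shuffled))
  }
}
```

A difference on the order of rounding error, rather than exactly zero, would still support removing the sort; a unit test on YtY should therefore compare with a tolerance instead of exact equality.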



--
This message was sent by Atlassian JIRA
(v6.2#6252)
