[
https://issues.apache.org/jira/browse/MAHOUT-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930646#comment-13930646
]
Adam Ilardi commented on MAHOUT-1447:
-------------------------------------
I have code that builds the ALS model and writes the Factorization to a
binary .out file.
A separate class reads that output and runs cross-validation, but it's a ton
of data, so I don't want to keep a duplicate array of item IDs in memory.
Scala code example (sorry):
val model = new FilePersistenceStrategy(new File(modelFile))
val factorization = model.load()
val estimator: TopItems.Estimator[java.lang.Long] = new DotProductEstimator(userId, factorization)
This way I can do:
TopItems.getTopItems(factorization.numItems, factorization.getItemIdIterator(), null, estimator)
versus:
val allItems = factorization.getItemIDMappings().iterator().asScala.map(_.getKey().toLong).toArray
TopItems.getTopItems(factorization.numItems, new LongPrimitiveArrayIterator(allItems), null, estimator)
Alternatively, adding getters for these two Factorization class members would
be best. Are they hidden to prevent modification of the maps? If so, they could
be wrapped in Collections.unmodifiableMap:
private final FastByIDMap<Integer> userIDMapping;
private final FastByIDMap<Integer> itemIDMapping;
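
To make the read-only getter idea concrete, here is a minimal Java sketch. It is an illustration under assumptions, not Mahout code: FactorizationSketch is a hypothetical stand-in for Factorization, and plain java.util.Map replaces FastByIDMap<Integer>. If FastByIDMap does not implement java.util.Map, Collections.unmodifiableMap cannot wrap it directly and a small read-only wrapper would be needed instead.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Factorization; NOT Mahout's actual class.
final class FactorizationSketch {
  // Assumption: java.util.Map is used here in place of FastByIDMap<Integer>.
  private final Map<Long, Integer> userIDMapping;
  private final Map<Long, Integer> itemIDMapping;

  FactorizationSketch(Map<Long, Integer> userIDMapping, Map<Long, Integer> itemIDMapping) {
    // Copy the inputs so later changes made by the caller do not leak in.
    this.userIDMapping = new LinkedHashMap<>(userIDMapping);
    this.itemIDMapping = new LinkedHashMap<>(itemIDMapping);
  }

  // Read-only view: put() and remove() throw UnsupportedOperationException.
  Map<Long, Integer> getUserIDMapping() {
    return Collections.unmodifiableMap(userIDMapping);
  }

  Map<Long, Integer> getItemIDMapping() {
    return Collections.unmodifiableMap(itemIDMapping);
  }

  // Walks the item IDs straight from the map's key set; no array copy is made.
  Iterator<Long> itemIDIterator() {
    return getItemIDMapping().keySet().iterator();
  }
}
```

A caller can then iterate item IDs through itemIDIterator() without building a second array, while any attempt to modify the returned maps fails.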
Please let me know if I wasn't clear.
> ImplicitFeedbackAlternatingLeastSquaresSolver tests and features
> ----------------------------------------------------------------
>
> Key: MAHOUT-1447
> URL: https://issues.apache.org/jira/browse/MAHOUT-1447
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.9
> Reporter: Adam Ilardi
> Assignee: Sebastian Schelter
> Priority: Minor
> Labels: newbie, patch, performance
> Fix For: 1.0
>
> Attachments: MAHOUT-1447.patch
>
>
> I added a test case for the YtY calculation code.
> I removed the indexes.quickSort() call in the YtY calculation because I don't
> think it's necessary and the test cases passed without it. The order
> shouldn't matter since you're adding the scalars together. Correct me if I'm
> wrong.
> In Factorization.java I added methods to access the iterators over item IDs
> and user IDs directly. This saves memory when using classes like
> TopItems.java when you don't also have the DataModel class in memory.
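
The order-independence argument in the description can be illustrated with a small Java sketch. It is not the solver's code: YtYSketch and its ytY method are hypothetical names, and Y is a plain double[][] rather than Mahout's data structures. Each entry of YtY is a sum of per-row products, so visiting rows in a different order adds the same terms. For general floating-point input the two results may differ by rounding error, which is a separate question from correctness.

```java
// Hypothetical illustration; names and data layout are assumptions.
final class YtYSketch {
  // Computes Y^T Y as a sum of per-row outer products,
  // visiting the rows of Y in the order given by `order`.
  static double[][] ytY(double[][] rows, int[] order) {
    int k = rows[0].length;
    double[][] result = new double[k][k];
    for (int idx : order) {
      double[] y = rows[idx];
      for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
          // Scalar contribution of this row to entry (i, j).
          result[i][j] += y[i] * y[j];
        }
      }
    }
    return result;
  }
}
```

Calling ytY on the same rows with the orders {0, 1, 2} and {2, 0, 1} gives matching matrices for small integer-valued inputs; a test on real-valued factors would compare within a tolerance.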
--
This message was sent by Atlassian JIRA
(v6.2#6252)