[ https://issues.apache.org/jira/browse/MAHOUT-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-900:
-----------------------------
Description:
{code}
int currentSize = chosenTexts.size();
if (currentSize < k) {
  chosenTexts.add(newText);
  chosenClusters.add(newCluster);
} else if (random.nextInt(currentSize + 1) == 0) { // with chance 1/(currentSize+1) pick new element
  int indexToRemove = random.nextInt(currentSize); // evict one chosen randomly
  chosenTexts.remove(indexToRemove);
  chosenClusters.remove(indexToRemove);
  chosenTexts.add(newText);
  chosenClusters.add(newCluster);
}
{code}
The second "if" condition ought to be "!= 0", right? Only when the draw is 0
should we skip the body, which removes an existing element, because in that
case it is the new element itself that is evicted.
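For illustration, here is a minimal, self-contained sketch of the sampling step with the condition flipped to "!= 0" as suggested above. It is not the attached patch: the class name SeedSamplerSketch and the method offer are hypothetical, and String stands in for the real text and cluster types.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch; String stands in for the real text/cluster types.
final class SeedSamplerSketch {

  // Offers one (text, cluster) pair to a sample holding at most k pairs.
  static void offer(List<String> chosenTexts, List<String> chosenClusters,
                    String newText, String newCluster, int k, Random random) {
    int currentSize = chosenTexts.size();
    if (currentSize < k) {
      // Still filling up: always keep the new pair.
      chosenTexts.add(newText);
      chosenClusters.add(newCluster);
    } else if (random.nextInt(currentSize + 1) != 0) {
      // Nonzero draw (chance currentSize/(currentSize+1)):
      // evict one existing pair chosen at random and keep the new one.
      int indexToRemove = random.nextInt(currentSize);
      chosenTexts.remove(indexToRemove);
      chosenClusters.remove(indexToRemove);
      chosenTexts.add(newText);
      chosenClusters.add(newCluster);
    }
    // A draw of 0 (chance 1/(currentSize+1)) means the new pair is the one dropped.
  }
}
```

Both lists are always modified together, so the pair at index i in one list still matches index i in the other after any number of calls.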
Second, this code:
{code}
for (int i = 0; i < k; i++) {
  writer.append(chosenTexts.get(i), chosenClusters.get(i));
}
{code}
... assumes that the input contained at least k elements, and fails otherwise.
The loop bound probably needs to be capped at the number of elements actually
chosen.
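One way to cap the output loop is sketched below. This is only an assumption about what a fix could look like, not the attached patch: CappedOutputSketch and emit are hypothetical names, and appending to a list stands in for writer.append(...).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; a List<String> stands in for the real writer.
final class CappedOutputSketch {

  // Emits at most k pairs, and fewer when the input had fewer than k elements.
  static List<String> emit(List<String> chosenTexts, List<String> chosenClusters, int k) {
    List<String> written = new ArrayList<>();
    // Cap the bound so get(i) never runs past the end of the lists.
    int limit = Math.min(k, chosenTexts.size());
    for (int i = 0; i < limit; i++) {
      written.add(chosenTexts.get(i) + "=" + chosenClusters.get(i));
    }
    return written;
  }
}
```

With two chosen elements and k = 5, this writes two pairs rather than failing on the third get(i).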
Patch attached.
> RandomSeedGenerator samples / output k texts incorrectly
> --------------------------------------------------------
>
> Key: MAHOUT-900
> URL: https://issues.apache.org/jira/browse/MAHOUT-900
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.5
> Reporter: Sean Owen
> Assignee: Robin Anil
> Priority: Minor
> Fix For: 0.6
>
> Attachments: MAHOUT-900.patch