I made the change on the line in question, but it can't be the problem,
since it did not change the functionality. To see that, you have to
look at the rest of the change. It changes

if (! ) {
  A;
} else {
  B;
}

to

if (foo) {
  B

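To make the equivalence argument concrete, here is a minimal Java sketch. It assumes the first snippet's condition is `!foo` and that the truncated second snippet ends with `} else { A; }`. The class name `BranchSwapCheck` and the string stand-ins for `A` and `B` are hypothetical. The sketch also applies a condition-only negation, like the one-line `if (processIdf)` to `if (!processIdf)` patch quoted below, to the rewritten form with the branches left in place.

```java
// Hypothetical check: "A" and "B" stand in for the two branch bodies.
public class BranchSwapCheck {

    // Form before the change (assumed): negated condition, A in the "then" branch.
    static String original(boolean foo) {
        if (!foo) {
            return "A";
        } else {
            return "B";
        }
    }

    // Form after the change (assumed): condition un-negated AND branches swapped.
    static String refactored(boolean foo) {
        if (foo) {
            return "B";
        } else {
            return "A";
        }
    }

    // Refactored form with only the condition negated, as a one-line
    // "if (x)" -> "if (!x)" patch would leave it; branches stay in place.
    static String refactoredThenPatched(boolean foo) {
        if (!foo) {
            return "B";
        } else {
            return "A";
        }
    }

    public static void main(String[] args) {
        for (boolean foo : new boolean[] {false, true}) {
            System.out.printf("foo=%b original=%s refactored=%s patched=%s%n",
                    foo, original(foo), refactored(foo), refactoredThenPatched(foo));
        }
        // prints:
        // foo=false original=A refactored=A patched=B
        // foo=true original=B refactored=B patched=A
    }
}
```

Under these assumptions, `original` and `refactored` agree for both values of `foo`, while negating only the condition selects the other branch in both cases.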
On Wed, Jul 11, 2012 at 3:17 AM, Joshi, Shrinivas
<[email protected]> wrote:
> It looks like this regression is caused by incorrect processing of IDFs. The
> difference I noticed between the current trunk and the Mahout 0.6 release in
> this part of the code appears to be in the
> core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java
> file.
>
> I am not sure whether the changes in this class were part of a valid change
> or were accidental. The following patch seems to address the regression we
> are seeing: the iteration jobs now take less than 3 minutes on our test
> cluster. Without this change, they take as much as 1 hr 20 min.
>
> Let me know if I am missing something here.
>
> Index: core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java
> ===================================================================
> --- core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java (revision 1359759)
> +++ core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java (working copy)
> @@ -268,7 +268,7 @@
>            ? DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER+"-toprune"
>            : DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER;
>
> -      if (processIdf) {
> +      if (!processIdf) {
>          DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
>                                                          outputDir,
>                                                          tfDirName,
>
> Thanks,
> -Shrinivas
>
> -----Original Message-----
> From: Joshi, Shrinivas [mailto:[email protected]]
> Sent: Friday, July 06, 2012 5:18 PM
> To: [email protected]
> Subject: Potential regression in ASFEmail KMeans clustering
>
> Just wanted to find out whether this is known/expected behavior with Mahout
> trunk. We are noticing that the KMeans iteration jobs that are part of the
> ASFEmail sample take longer to execute than with the Mahout 0.6 release.
> Using the Mahout 0.6 release on our test cluster, these jobs/steps take no
> more than 6-7 minutes. However, with the trunk code that I checked out a few
> days ago, they take anywhere from 25 to 50 minutes. Has anybody else seen
> something similar?
>
> Thanks,
> -Shrinivas
>