Pat Ferrel created MAHOUT-1674:
----------------------------------

             Summary: A'A fails with an index out of range for a row 
vector
                 Key: MAHOUT-1674
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1674
             Project: Mahout
          Issue Type: Bug
          Components: s
    Affects Versions: 0.10.0
            Reporter: Pat Ferrel
            Assignee: Dmitriy Lyubimov
            Priority: Critical
             Fix For: 0.10.0


A'A and possibly A'B can fail with an index out of bounds on the row vector. 
This seems related to partitioning where some partitions may be empty.
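
As an illustration of why empty partitions should not change the result, here 
is a minimal sketch in plain Python. It is not Mahout's implementation; the 
names ata_partition and ata and the small matrix A are hypothetical. It 
computes A'A as the sum of per-partition blocks and takes the column count as 
an explicit argument, so an empty partition contributes an all-zero block 
rather than requiring a first row to inspect.

```python
def ata_partition(rows, ncol):
    """Return the ncol x ncol block (sum of row outer products) for one partition."""
    block = [[0.0] * ncol for _ in range(ncol)]
    for row in rows:  # an empty partition simply skips this loop
        for i, x in enumerate(row):
            if x == 0:
                continue  # zero entries add nothing to the block
            for j, y in enumerate(row):
                block[i][j] += x * y
    return block


def ata(partitions, ncol):
    """Sum the per-partition blocks into the full A'A matrix."""
    total = [[0.0] * ncol for _ in range(ncol)]
    for rows in partitions:
        block = ata_partition(rows, ncol)
        for i in range(ncol):
            for j in range(ncol):
                total[i][j] += block[i][j]
    return total


# Hypothetical 3 x 3 input, once as a single partition and once split
# across four partitions, two of which are empty.
A = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [4.0, 0.0, 5.0]]
single = ata([A], 3)
split = ata([A[:1], [], A[1:], []], 3)
```

In this sketch both layouts give the same matrix, which is the behavior 
expected from the single-file and part-file inputs described in this issue.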

This can be reproduced with the attached data as input to 
spark-itemsimilarity. This is only A data. Passing in the one large csv 
completes correctly, but passing in the directory of part files exhibits the 
error. The data is identical in both cases; only the number of files used to 
contain it differs.
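
For anyone without the attachment, a similar single-file versus part-file 
pair could be produced with a sketch like the following. The function name 
split_into_parts, the part-NNNNN file naming, and the sample rows are 
assumptions made for illustration and are not taken from the attached data. 
Asking for more parts than there are lines leaves the trailing files empty.

```python
import os
import tempfile


def split_into_parts(src_path, out_dir, n_parts):
    """Write the lines of src_path into n_parts files in contiguous chunks.

    File names use a part-NNNNN pattern (an assumption). When n_parts
    exceeds the number of lines, the trailing files are left empty.
    """
    with open(src_path) as src:
        lines = src.readlines()
    os.makedirs(out_dir, exist_ok=True)
    chunk = max(1, -(-len(lines) // n_parts))  # ceiling division, at least 1
    paths = []
    for p in range(n_parts):
        path = os.path.join(out_dir, "part-%05d" % p)
        with open(path, "w") as out:
            out.writelines(lines[p * chunk:(p + 1) * chunk])
        paths.append(path)
    return paths


# Small demonstration with hypothetical rows written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    with open(src, "w") as f:
        f.write("u1,i1\nu1,i2\nu2,i1\n")
    parts = split_into_parts(src, os.path.join(tmp, "parts"), 5)
    sizes = [os.path.getsize(p) for p in parts]
    rejoined = "".join(open(p).read() for p in parts)
```

Here three input lines are split across five files, so the last two files are 
empty while the concatenated content still matches the original file.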

The error occurs on the local raw filesystem with master = local and is 
reached quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
