If you want to compute the cosine between pairs of documents (each document a in A compared with each document b in B), then the dimension of each vector is 100, the size of each document. If you want to compare the whole indexes, the two vectors must have the same length (number of elements), so you will need to pad the shorter one with zeroes. There are computational shortcuts, but this is the principle.
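To make the padding idea concrete, here is a minimal sketch in plain Java. It assumes each document is available as a term-to-frequency map, which may not match how your vectors are actually built. The class and method names (CosineSketch, toVector, cosineOfTermMaps) are hypothetical and are not part of Lucene or Commons Math. Both documents are expanded over the union of their terms, so any term missing from one document becomes a zero in its vector and the two vectors always have the same length.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper class, for illustration only.
final class CosineSketch {

    // Expand a sparse term-frequency map into a dense vector over a shared,
    // sorted vocabulary. Terms absent from the document become 0.0.
    static double[] toVector(Map<String, Integer> termFreqs, SortedSet<String> vocabulary) {
        double[] v = new double[vocabulary.size()];
        int i = 0;
        for (String term : vocabulary) {
            Integer f = termFreqs.get(term);
            v[i++] = (f == null) ? 0.0 : f;
        }
        return v;
    }

    // Cosine of the angle between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(a.length + " != " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for an all-zero vector
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Build the shared vocabulary from both documents, then compare.
    static double cosineOfTermMaps(Map<String, Integer> docA, Map<String, Integer> docB) {
        SortedSet<String> vocabulary = new TreeSet<>(docA.keySet());
        vocabulary.addAll(docB.keySet());
        return cosine(toVector(docA, vocabulary), toVector(docB, vocabulary));
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();
        a.put("lucene", 2);
        a.put("index", 1);
        Map<String, Integer> b = new HashMap<>();
        b.put("lucene", 1);
        b.put("cosine", 1);
        System.out.println(cosineOfTermMaps(a, b));
    }
}
```

With 10000 x 20000 pairs you would probably build one vocabulary across both indexes once rather than per pair, and keep the vectors sparse. That is one of the shortcuts mentioned above, not something this sketch attempts.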
How are you representing the sentences as numerical values?

Herb

> On Mar 20, 2014, at 5:07 PM, "Stefy D." <tsuki_st...@yahoo.com> wrote:
>
> Dear all,
>
> I am trying to compute the cosine similarity between several documents. I
> have an indexed directory A made using 10000 files and another indexed
> directory B made using 20000 files. All the indexed documents from both
> directories have the same length (100 sentences). I want to get the cosine
> similarity between documents from directory A and documents from directory B.
> I have used the code from here but on the two indexed directories. So I use
> something like getCosineSimilarity(docs_A[i], docs_B[j]);
>
> I get the following error:
> Exception in thread "main" org.apache.commons.math3.exception.DimensionMismatchException: 44,375 != 596,263
>     at org.apache.commons.math3.linear.RealVector.checkVectorDimensions(RealVector.java:179)
>     at org.apache.commons.math3.linear.RealVector.checkVectorDimensions(RealVector.java:165)
>     at org.apache.commons.math3.linear.RealVector.dotProduct(RealVector.java:307)
>     at NewApp.testCosine.getCosineSimilarity(testCosine.java:57)
>
> Please help me. Thank you very much!