Thanks for the detailed reply and example! Using IntMap as a vector seems to be a good idea. In your example: 1) I would use:
dot = dot' * dot' dot' = sum . elems . intersectionWith (*) norm = sum . fmap (**2) . elems instead of: dot = sum . elems . intersectionWith (*) norm = (**0.5) . sum . fmap (**2) . elems 2) I don't understand the syntax: cosineSimilarity <$> lookup x space <*> lookup y space What are <$> and <*>? Thanks, Dmitri On Wed, Jul 27, 2011 at 1:46 AM, Alexander Solla <alex.so...@gmail.com>wrote: > > > On Tue, Jul 26, 2011 at 1:30 PM, dokondr <doko...@gmail.com> wrote: > >> Hi, >> Can't find on hackage any sparse vector library. Does such thing exist? >> I need efficient storage and dot product calculation for very sparse >> vectors with about 10 out of 40 000 non-zero components. >> One solution would be to represent Sparse Vector as Data.Map with >> (component_index, component_value) pairs to store non-zero components of the >> vector. >> > > I would make a different suggestion: > > Store a Set (or maybe an IntMap) of (IntMap scalar)s. > > In other words, let each vector be represented by an IntMap whose key > represents the n'th component of the vector, and whose value is the proper > scalar. > > >> In this case, for example, calculating cosine similarity ( >> http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10 >> 000 vectors, would not be very nice and efficient, I am afraid. >> > > Given two (IntMap Double)s a and b, I would compute the projection of a > along b as > > cosineSimilarity :: IntMap Double -> IntMap Double -> Double > cosineSimilarity a b = (dot a b) / ((norm a) * (norm b)) where > dot = sum . elems . intersectionWith (*) > norm = (**0.5) . sum . fmap (**2) . elems > > > The only part I find tricky is enumerating all 10000^2 pairs > > pairs :: Int -> [Int] > pairs dim = do > x <- [1..dim] > y <- [1..dim] > return (x,y) > > and computing the projection for each pair: > > projections :: Floating scalar => IntMap (IntMap scalar) -> Map (Int, Int) > scalar > projections space = let dimensions = undefined -- find max key in elements > of space? > m_projection space x y = cosineSimilarity <$> > lookup x space > <*> > lookup y space > in fromList . filter (isMaybe . snd) > . fmap (\(x,y) -> ((x,y), m_projection > space x y) > . pairs > $ dimensions > > >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe