[Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread dokondr
Hi,
Can't find on hackage any sparse vector library. Does such thing exist?
I need efficient storage and dot product calculation for very sparse
vectors with about 10 out of 40 000 non-zero components.
One solution would be to represent Sparse Vector as Data.Map with
(component_index, component_value) pairs to store non-zero components of the
vector.
In this case, for example, calculating cosine similarity (
http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10 000
vectors, would not be very nice and efficient, I am afraid.

Thanks!
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread Alexander Solla
On Tue, Jul 26, 2011 at 1:30 PM, dokondr doko...@gmail.com wrote:

 Hi,
 Can't find on hackage any sparse vector library. Does such thing exist?
 I need efficient storage and dot product calculation for very sparse
 vectors with about 10 out of 40 000 non-zero components.
 One solution would be to represent Sparse Vector as Data.Map with
 (component_index, component_value) pairs to store non-zero components of the
 vector.


I would make a different suggestion:

Store a Set (or maybe an IntMap) of (IntMap scalar)s.

In other words, let each vector be represented by an IntMap whose key
represents the n'th component of the vector, and whose value is the proper
scalar.


 In this case, for example, calculating cosine similarity (
 http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10
 000 vectors, would not be very nice and efficient, I am afraid.


Given two (IntMap Double)s a and b, I would compute the projection of a
along b as

cosineSimilarity :: IntMap Double - IntMap Double - Double
cosineSimilarity a b  = (dot a b) / ((norm a) * (norm b)) where
 dot  = sum . elems . intersectionWith (*)
 norm = (**0.5) . sum . fmap (**2) . elems


The only part I find tricky is enumerating all 1^2 pairs

pairs :: Int - [Int]
pairs dim = do
   x - [1..dim]
   y - [1..dim]
   return (x,y)

and computing the projection for each pair:

projections :: Floating scalar = IntMap (IntMap scalar) - Map (Int, Int)
scalar
projections space = let dimensions = undefined -- find max key in elements
of space?
m_projection space x y = cosineSimilarity $ lookup
x space
  * lookup
y space
 in fromList . filter (isMaybe . snd)
 . fmap (\(x,y) - ((x,y), m_projection
space x y)
 . pairs
 $ dimensions
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread dokondr
Thanks for the detailed reply and example!
Using IntMap as a vector seems to be a good idea.
In your example:
1)  I would use:

   dot = dot' * dot'
   dot'  = sum . elems . intersectionWith (*)
   norm = sum . fmap (**2) . elems

 instead of:

dot  = sum . elems . intersectionWith (*)
norm = (**0.5) . sum . fmap (**2) . elems

2)  I don't understand the syntax:
cosineSimilarity $ lookup x space
 * lookup y space

What are $ and *?

Thanks,
Dmitri

On Wed, Jul 27, 2011 at 1:46 AM, Alexander Solla alex.so...@gmail.comwrote:



 On Tue, Jul 26, 2011 at 1:30 PM, dokondr doko...@gmail.com wrote:

 Hi,
 Can't find on hackage any sparse vector library. Does such thing exist?
 I need efficient storage and dot product calculation for very sparse
 vectors with about 10 out of 40 000 non-zero components.
 One solution would be to represent Sparse Vector as Data.Map with
 (component_index, component_value) pairs to store non-zero components of the
 vector.


 I would make a different suggestion:

 Store a Set (or maybe an IntMap) of (IntMap scalar)s.

 In other words, let each vector be represented by an IntMap whose key
 represents the n'th component of the vector, and whose value is the proper
 scalar.


 In this case, for example, calculating cosine similarity (
 http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10
 000 vectors, would not be very nice and efficient, I am afraid.


 Given two (IntMap Double)s a and b, I would compute the projection of a
 along b as

 cosineSimilarity :: IntMap Double - IntMap Double - Double
 cosineSimilarity a b  = (dot a b) / ((norm a) * (norm b)) where
  dot  = sum . elems . intersectionWith (*)
  norm = (**0.5) . sum . fmap (**2) . elems


 The only part I find tricky is enumerating all 1^2 pairs

 pairs :: Int - [Int]
 pairs dim = do
x - [1..dim]
y - [1..dim]
return (x,y)

 and computing the projection for each pair:

 projections :: Floating scalar = IntMap (IntMap scalar) - Map (Int, Int)
 scalar
 projections space = let dimensions = undefined -- find max key in elements
 of space?
 m_projection space x y = cosineSimilarity $
 lookup x space
   *
 lookup y space
  in fromList . filter (isMaybe . snd)
  . fmap (\(x,y) - ((x,y), m_projection
 space x y)
  . pairs
  $ dimensions



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread Henning Thielemann


On Tue, 26 Jul 2011, Alexander Solla wrote:


Given two (IntMap Double)s a and b, I would compute the projection of a along b 
as

cosineSimilarity :: IntMap Double - IntMap Double - Double
cosineSimilarity a b  = (dot a b) / ((norm a) * (norm b)) where
                 dot  = sum . elems . intersectionWith (*)
                 norm = (**0.5) . sum . fmap (**2) . elems


Never write (**2) and (**0.5)! Use (^2) and sqrt!

http://www.haskell.org/haskellwiki/Power_function

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread Henning Thielemann


On Wed, 27 Jul 2011, dokondr wrote:


In your example:
1)  I would use:

   dot = dot' * dot'
   dot'  = sum . elems . intersectionWith (*)
   norm = sum . fmap (**2) . elems
 instead of:

dot  = sum . elems . intersectionWith (*)
norm = (**0.5) . sum . fmap (**2) . elems
2)  I don't understand the syntax:
cosineSimilarity $ lookup x space                  * lookup y space     

What are $ and *?


They are from Control.Applicative and are applied to a Maybe type here. It 
means that cosineSimilarity is applied to the result of looking up 'x' and 
'y' in space. If 'x' or 'y' is not in space, then

  cosineSimilarity $ lookup x space * lookup y space
 is Nothing.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Library for Sparse Vectors?

2011-07-26 Thread Alexander Solla
On Tue, Jul 26, 2011 at 3:27 PM, dokondr doko...@gmail.com wrote:

 Thanks for the detailed reply and example!
 Using IntMap as a vector seems to be a good idea.
 In your example:
 1)  I would use:

dot = dot' * dot'
dot'  = sum . elems . intersectionWith (*)
norm = sum . fmap (**2) . elems

  instead of:

 dot  = sum . elems . intersectionWith (*)
 norm = (**0.5) . sum . fmap (**2) . elems


Your dot' is a function, so (dot' * dot') wouldn't type check.


 2)  I don't understand the syntax:

 cosineSimilarity $ lookup x space
  * lookup y space

 What are $ and *?


(lookup x space) has type (Maybe something).  So does (lookup y space).  We
are using applicative functors to pull out values from (lookup x space)
and (lookup y space), apply cosineSimilarity to the values we pulled out,
and wrapping it all back up in a Maybe, depending on whether the lookups
found something or not.

$ is exactly the same thing as fmap.
* is harder to explain.  But it is just plumbing.


http://learnyouahaskell.com/functors-applicative-functors-and-monoids
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe