Hi,

Felix and I are currently implementing the FeatureHasher (Issue #1735), which
ultimately returns a SparseVector.

When using SparseVector.fromCOO, I'm seeing some behaviour that I didn't
expect.

Calling SparseVector.fromCOO(numFeatures, Map((0, 1.0), (1, 1.0), (1, -1.0)))
returns a SparseVector((0, 1.0), (1, 0.0)).
I would have expected that, after the values of duplicate indices are summed,
an index whose resulting value is 0.0 would be dropped when the SparseVector
is created.
Is this the expected behaviour or does this need to be fixed?
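
To make the expectation concrete, here is a minimal sketch in plain Scala with
no Flink dependency. sumAndDropZeros is a hypothetical helper, not part of the
existing API, and I pass the entries as a Seq because a Scala Map would already
collapse the duplicate index before any summation happens.

```scala
// Hypothetical helper, not part of Flink: sums the values of duplicate
// indices and drops every entry whose summed value is exactly 0.0.
def sumAndDropZeros(entries: Seq[(Int, Double)]): Seq[(Int, Double)] =
  entries
    .groupBy(_._1)                                   // group entries by index
    .map { case (index, group) => (index, group.map(_._2).sum) }
    .filter { case (_, value) => value != 0.0 }      // drop zero-valued sums
    .toSeq
    .sortBy(_._1)                                    // keep indices ordered

// Index 1 sums to 0.0 and is dropped, so only index 0 remains.
val compacted = sumAndDropZeros(Seq((0, 1.0), (1, 1.0), (1, -1.0)))
```

The resulting pairs could then be handed to SparseVector.fromCOO, assuming
it accepts a collection of (index, value) tuples.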

Furthermore, are there any plans to extend the SparseVector implementation with
a SparseVector.fromArray() method that takes an array such as Array(0.0, 1.0,
2.0, 0.0, 3.2) and creates a SparseVector((1, 1.0), (2, 2.0), (4, 3.2)) of size
array.length, keeping only the non-zero entries?
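
As a rough sketch of what I have in mind (again plain Scala; nonZeroEntries is
a hypothetical name, and the real method would build the SparseVector instead
of returning the size and the pairs):

```scala
// Hypothetical helper: returns the vector size together with the
// (index, value) pairs of all non-zero entries of a dense array.
def nonZeroEntries(values: Array[Double]): (Int, Seq[(Int, Double)]) = {
  val entries = values.zipWithIndex.collect {
    case (value, index) if value != 0.0 => (index, value)
  }
  (values.length, entries.toSeq)
}

// The size stays at array.length; only the non-zero entries are kept.
val (size, entries) = nonZeroEntries(Array(0.0, 1.0, 2.0, 0.0, 3.2))
```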

Best,
Christoph
