Hi, Felix and I are currently working on the implementation of the FeatureHasher (Issue #1735), which in the end returns a SparseVector.
When using "SparseVector.fromCOO" I'm seeing behaviour I didn't expect. If I create SparseVector.fromCOO(numFeatures, Seq((0, 1.0), (1, 1.0), (1, -1.0))), it returns SparseVector((0, 1.0), (1, 0.0)). I would have expected that, once the values of duplicate indices are summed, any index whose resulting value is 0.0 would be dropped when the SparseVector is created. Is this the intended behaviour, or does it need to be fixed?

Furthermore, are there any plans to extend the SparseVector implementation with a SparseVector.fromArray() that takes an array such as Array(0.0, 1.0, 2.0, 0.0, 3.2) and creates a SparseVector((1, 1.0), (2, 2.0), (4, 3.2)) of size array.length, keeping only the non-zero entries?

Best,
Christoph
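For illustration, the two behaviours asked about (summing duplicate indices and then dropping entries that sum to 0.0, and building a sparse representation from a dense array) can be sketched in plain Scala. This is not FlinkML code: `SparseSketch`, `fromCOODroppingZeros` and `fromArray` are hypothetical names, and the helpers return a (size, indices, values) triple instead of a real SparseVector so the sketch runs without Flink on the classpath.

```scala
// Hypothetical helpers, NOT part of FlinkML: they only illustrate the
// semantics discussed above using plain Scala collections.
object SparseSketch {

  /** Sums values that share an index, then drops entries whose sum is 0.0.
    * Returns (size, indices sorted ascending, values). */
  def fromCOODroppingZeros(
      size: Int,
      entries: Iterable[(Int, Double)]): (Int, Array[Int], Array[Double]) = {
    require(entries.forall { case (i, _) => i >= 0 && i < size }, "index out of range")
    val summed = entries
      .groupBy(_._1)                                        // index -> all pairs with that index
      .map { case (i, pairs) => (i, pairs.map(_._2).sum) }  // sum duplicate indices
      .filter { case (_, v) => v != 0.0 }                  // drop entries that cancelled out
      .toArray
      .sortBy(_._1)
    (size, summed.map(_._1), summed.map(_._2))
  }

  /** Keeps only the non-zero entries of a dense array; size is data.length. */
  def fromArray(data: Array[Double]): (Int, Array[Int], Array[Double]) = {
    val nonZero = data.zipWithIndex.collect { case (v, i) if v != 0.0 => (i, v) }
    (data.length, nonZero.map(_._1), nonZero.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val (n1, idx1, vals1) = fromCOODroppingZeros(4, Seq((0, 1.0), (1, 1.0), (1, -1.0)))
    println(s"$n1 ${idx1.mkString(",")} ${vals1.mkString(",")}")
    val (n2, idx2, vals2) = fromArray(Array(0.0, 1.0, 2.0, 0.0, 3.2))
    println(s"$n2 ${idx2.mkString(",")} ${vals2.mkString(",")}")
  }
}
```

Note that the example input is a Seq rather than a Map: a Scala Map keeps only the last value for a repeated key, so Map((1, 1.0), (1, -1.0)) already collapses to Map(1 -> -1.0) before any summing could happen.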