Glenn Wiebe created IGNITE-12849:
------------------------------------

             Summary: Add New BinaryObject Vectorizer for SparseVectors and 
Integer Coordinates
                 Key: IGNITE-12849
                 URL: https://issues.apache.org/jira/browse/IGNITE-12849
             Project: Ignite
          Issue Type: New Feature
          Components: ml
    Affects Versions: 2.8
            Reporter: Glenn Wiebe
             Fix For: 2.8


A. DenseVector-based BinaryObjectVectorizer
When an existing cache is used as the source of a Dataset, the
BinaryObjectVectorizer converts each cached value into a Vector.
The existing BinaryObjectVectorizer only supports the creation of a
SparseVector.
The LUDecomposition utility, which performs the LU (Gaussian) factorization
needed by models like GMM, has a "singularity indicator". Because a SparseVector
treats null (absent) values as zero/0.0, a matrix column computed from them
falls below the minimum check value (1e-11), and the matrix is flagged as
singular.

This null handling of the SparseVector restricts the use of some algorithms,
like Gaussian Mixture Models, where any Vector dimension that is null will
incorrectly signal that a matrix is singular.
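To illustrate the singularity check described above, here is a minimal pure-Java sketch of Gaussian elimination with partial pivoting that flags a matrix as singular when a pivot falls below 1e-11. It is not Ignite's LUDecomposition code, and the class and method names are hypothetical. It shows how a feature column that is all zeros (for example, because null fields were read as 0.0) makes a covariance-style matrix singular.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Ignite's LUDecomposition): Gaussian elimination
 * with partial pivoting that reports a matrix as singular when a pivot's
 * magnitude falls below the threshold.
 */
final class SingularityCheck {
    static final double THRESHOLD = 1e-11;

    static boolean isSingular(double[][] input) {
        int n = input.length;
        // Work on a copy so the caller's matrix is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            if (input[i].length != n)
                throw new IllegalArgumentException("matrix must be square");
            a[i] = Arrays.copyOf(input[i], n);
        }
        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest entry in this column.
            int max = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[max][col]))
                    max = row;
            if (Math.abs(a[max][col]) < THRESHOLD)
                return true; // pivot is effectively zero: matrix is singular
            double[] tmp = a[col];
            a[col] = a[max];
            a[max] = tmp;
            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }
        return false;
    }
}
```

For example, the matrix {{2, 1, 0}, {1, 3, 0}, {0, 0, 0}} is reported as singular, while replacing the last diagonal entry with 1 makes it non-singular.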

It would be great if we could:
- Have a BinaryObjectVectorizer that produces a DenseVector, eliminating this
singularity trigger and enabling use of the GMM trainer.
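A DenseVector-based vectorizer could look roughly like the following sketch. It is hypothetical: a java.util.Map stands in for a BinaryObject, and DenseFieldVectorizer, its field list and its fill value are illustrative names, not Ignite API. Integer-valued fields are accepted through java.lang.Number, which also covers the integer coordinates mentioned in the summary.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical dense vectorizer sketch: reads named fields from a record
 * (a Map stands in for a BinaryObject) into a double[] in a fixed order.
 * Every coordinate is stored explicitly; missing or null fields receive a
 * caller-supplied fill value instead of an implicit sparse zero.
 */
final class DenseFieldVectorizer {
    private final List<String> fields;
    private final double fillValue;

    DenseFieldVectorizer(List<String> fields, double fillValue) {
        this.fields = List.copyOf(fields);
        this.fillValue = fillValue;
    }

    double[] vectorize(Map<String, ?> record) {
        double[] out = new double[fields.size()];
        for (int i = 0; i < out.length; i++) {
            Object v = record.get(fields.get(i));
            // Integer, Long, Double, etc. are all handled through Number.
            out[i] = (v instanceof Number) ? ((Number) v).doubleValue() : fillValue;
        }
        return out;
    }
}
```

The fill strategy is a design choice: a constant fill for a field that is null in every row still produces a zero-variance column, so a real implementation would need to decide how missing values are imputed.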

B. CacheBasedDatasets not treated as Temporary Cache
When using a cache-based dataset, the close() method destroys the Ignite cache
that holds the dataset, so the data loaded into it cannot be reused.

It would be great if we could:
- Not destroy the Ignite cache holding the dataset when one step in an ML
processing flow closes it
- Allow subsequent steps to "attach" to this previously calculated dataset.
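The requested close()/attach behavior can be sketched in plain Java as below. This is hypothetical: a static ConcurrentHashMap stands in for a named Ignite cache, and ReusableDataset, attachOrBuild() and destroy() are illustrative names, not existing Ignite API. close() releases only the handle, while destroy() remains available for explicit cleanup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a dataset whose backing data outlives close().
 * A static ConcurrentHashMap stands in for a named Ignite cache.
 */
final class ReusableDataset<T> implements AutoCloseable {
    private static final Map<String, Object> STORE = new ConcurrentHashMap<>();

    private final String name;
    private final T data;

    private ReusableDataset(String name, T data) {
        this.name = name;
        this.data = data;
    }

    /** Attaches to existing data stored under name, or builds it once with the loader. */
    @SuppressWarnings("unchecked")
    static <T> ReusableDataset<T> attachOrBuild(String name, Supplier<T> loader) {
        T data = (T) STORE.computeIfAbsent(name, k -> loader.get());
        return new ReusableDataset<>(name, data);
    }

    T data() {
        return data;
    }

    /** Releases this handle only; the stored data stays available to later steps. */
    @Override
    public void close() {
        // Intentionally does not remove the STORE entry.
    }

    /** Explicit, opt-in destruction of the stored data. */
    static void destroy(String name) {
        STORE.remove(name);
    }
}
```

In a multi-step flow, the first step builds and closes the dataset, and a later step calling attachOrBuild() with the same name reuses the stored data without running its loader again.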

C. Vector Visibility
The contents of Vectors (unlike other value types, e.g. BinaryObjects) are not
visible through standard tools like the Ignite Web Console, because the Vector
toString() method does not present any information about the embedded values.

It would be great if we could:
- Have a Vector.toString() implementation that presents some information
about what is actually in the Vector.
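A minimal sketch of an informative toString() follows. PrintableVector is a hypothetical stand-in for an Ignite Vector implementation; it prints the size and the first few values (the limit of five is an arbitrary assumption) so large vectors stay readable in tools that display toString() output.

```java
/** Hypothetical dense vector with an informative toString() (sketch only). */
final class PrintableVector {
    private static final int MAX_SHOWN = 5; // arbitrary preview limit
    private final double[] values;

    PrintableVector(double... values) {
        this.values = values.clone();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("PrintableVector[size=")
            .append(values.length)
            .append(", values=[");
        int shown = Math.min(values.length, MAX_SHOWN);
        for (int i = 0; i < shown; i++) {
            if (i > 0)
                sb.append(", ");
            sb.append(values[i]);
        }
        // Signal that the preview was truncated.
        if (values.length > shown)
            sb.append(", ...");
        return sb.append("]]").toString();
    }
}
```

For example, new PrintableVector(1.0, 2.5) prints as PrintableVector[size=2, values=[1.0, 2.5]].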

I have implemented the above items and used them at a customer site where I
needed these capabilities (or, at the least, they dramatically reduced the cost
and increased the value of the solution).

It would be great if the community supported this expansion/improvement of the
Ignite ML library.

Thanks,
  Glenn




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
