[ https://issues.apache.org/jira/browse/IGNITE-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Wiebe updated IGNITE-12849:
---------------------------------
    Attachment: DenseStringBinaryObjectVectorizer.java
                DenseIntBinaryObjectVectorizer.java

> Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-12849
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12849
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>    Affects Versions: 2.8
>            Reporter: Glenn Wiebe
>            Assignee: Alexey Zinoviev
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: DenseIntBinaryObjectVectorizer.java, 
> DenseStringBinaryObjectVectorizer.java
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the 
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a 
> SparseVector.
> The LUDecomposition utility that supports Gaussian factorization for models 
> like GMM has a "singularity indicator". With a SparseVector, null handling 
> sets the affected matrix column calculation to zero (0.0), which is below 
> the minimum check value (1e-11), so the matrix is flagged as singular. 
> This null handling of the SparseMatrix restricts the use of some 
> algorithms, such as Gaussian Mixture Models, where any Vector dimension that 
> is null will incorrectly signal that a matrix is singular.
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this 
> singularity trigger and enable use of GMM Trainer.
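
To illustrate the singularity trigger described above, here is a minimal plain-Java sketch. It is not Ignite's LUDecomposition code: the class name, the method name, and the elimination routine are assumptions made for illustration, and only the 1e-11 threshold comes from the issue text. It shows how a dimension that is zero in every row produces a pivot below the threshold.

```java
public final class SingularityCheckSketch {
    /** Minimum pivot magnitude quoted in the issue text. */
    static final double THRESHOLD = 1e-11;

    /**
     * Runs Gaussian elimination with partial pivoting on a copy of a square
     * matrix and reports whether any pivot falls below THRESHOLD.
     * Hypothetical helper, not part of the Ignite API.
     */
    static boolean isSingular(double[][] m) {
        int n = m.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = m[i].clone();

        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col]))
                    pivot = row;

            // A column that is (near) zero everywhere trips the check.
            if (Math.abs(a[pivot][col]) < THRESHOLD)
                return true;

            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double f = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= f * a[col][k];
            }
        }
        return false;
    }
}
```

In this sketch a matrix whose second row and column are all zero, as happens when a null dimension is read as 0.0, is reported as singular, while the same matrix with real values in that dimension is not.
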
> B. CacheBasedDatasets not treated as Temporary Cache
> When using a cache-based dataset, the close() method destroys the Ignite 
> cache. This means that there is no ability to re-use the data loaded into 
> this dataset.
> It would be great if we could:
> - Not destroy the Ignite Cache holding the dataset on close (of one step in 
> an ML processing flow)
> - Allow for "attaching" to this prior, pre-calculated dataset in subsequent 
> use.
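
The two requests in B can be sketched in plain Java as follows. Everything here is hypothetical: the class, the keepOnClose flag, the attach method, and the in-memory map standing in for named Ignite caches are assumptions for illustration, not the existing cache-based dataset API or the attached implementation.

```java
import java.util.HashMap;
import java.util.Map;

public final class ReusableDatasetSketch implements AutoCloseable {
    /** Stand-in for the cluster's named caches (hypothetical). */
    static final Map<String, double[][]> CACHES = new HashMap<>();

    private final String cacheName;
    private final boolean keepOnClose;

    private ReusableDatasetSketch(String cacheName, boolean keepOnClose) {
        this.cacheName = cacheName;
        this.keepOnClose = keepOnClose;
    }

    /** Loads rows into a named cache and wraps it as a dataset. */
    static ReusableDatasetSketch create(String name, double[][] rows, boolean keepOnClose) {
        CACHES.put(name, rows);
        return new ReusableDatasetSketch(name, keepOnClose);
    }

    /** Attaches to a dataset cache left behind by an earlier step. */
    static ReusableDatasetSketch attach(String name) {
        if (!CACHES.containsKey(name))
            throw new IllegalStateException("No dataset cache named " + name);
        return new ReusableDatasetSketch(name, true);
    }

    double[][] rows() {
        return CACHES.get(cacheName);
    }

    /** Destroys the backing cache only when the dataset is temporary. */
    @Override public void close() {
        if (!keepOnClose)
            CACHES.remove(cacheName);
    }
}
```

A later step in the ML flow would call attach with the same name instead of reloading the data, while a dataset created with keepOnClose set to false keeps today's destroy-on-close behaviour.
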
> C. Vector Visibility
> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in 
> standard mechanisms, like the Ignite Web Console, where the toString() method 
> does not present any information about the embedded vector values.
> It would be great if we could:
> - Have a Vector.toString() implementation that presents some information 
> about what is actually in the Vector.
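
As a rough idea of what such a toString() could print, here is a self-contained sketch. PrintableVector is a hypothetical stand-in rather than Ignite's Vector interface, and the five-element preview limit and output layout are assumptions, not the format used in the attached code.

```java
import java.util.Arrays;

public final class PrintableVector {
    /** Maximum number of leading values shown (assumed limit). */
    private static final int MAX_SHOWN = 5;

    private final double[] values;

    public PrintableVector(double... values) {
        this.values = values.clone();
    }

    /** Prints the size and a bounded preview of the stored values. */
    @Override public String toString() {
        int shown = Math.min(values.length, MAX_SHOWN);
        String head = Arrays.toString(Arrays.copyOf(values, shown));

        // Mark truncation so long vectors stay readable in a console.
        if (shown < values.length)
            head = head.substring(0, head.length() - 1) + ", ...]";

        return "PrintableVector{size=" + values.length + ", values=" + head + "}";
    }
}
```

Bounding the preview keeps the output usable for high-dimensional vectors while still showing the size and the leading values.
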
> I have implemented the above items and have used them at a customer site 
> where I needed these capabilities (or at least, they dramatically reduced 
> the cost and increased the value of the solution).
> It would be great if the community were supportive of this 
> expansion/improvement of the Ignite ML library.
> Thanks,
>   Glenn



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
