[
https://issues.apache.org/jira/browse/IGNITE-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073048#comment-17073048
]
Aleksey Zinoviev commented on IGNITE-12849:
-------------------------------------------
This is a great starting point for collaboration. If possible, please share a test
project here in this thread; I need time to dig into it.
> Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates
> -------------------------------------------------------------------------
>
> Key: IGNITE-12849
> URL: https://issues.apache.org/jira/browse/IGNITE-12849
> Project: Ignite
> Issue Type: New Feature
> Components: ml
> Affects Versions: 2.8
> Reporter: Glenn Wiebe
> Assignee: Alexey Zinoviev
> Priority: Minor
> Fix For: 2.9
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a
> SparseVector.
> The LUDecomposition utility, which supports Gaussian factorization for models
> like GMM, has a "singularity indicator". With a SparseVector, its null
> handling sets a matrix column calculation to zero (0.0), which is below
> the minimum check value (1e-11) and thus flags the matrix as singular.
> This sparse null handling restricts the use of some algorithms, such as
> Gaussian Mixture Models, where any Vector dimension that is null will
> incorrectly signal that the matrix is singular.
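The singularity check described above can be made concrete with a minimal, self-contained sketch of LU elimination with partial pivoting. It is not Ignite's LUDecomposition code. The 1e-11 threshold comes from the description, and the class and method names (LuSingularityDemo, isSingular) are hypothetical. A feature that is null in every row is stored as 0.0, so it becomes an all-zero row and column in a covariance-like matrix, and elimination eventually meets a pivot below the threshold.

```java
import java.util.Arrays;

// Hypothetical illustration, not Ignite's LUDecomposition: LU elimination with
// partial pivoting that reports a matrix as singular when a pivot's magnitude
// falls below the 1e-11 threshold quoted in the issue.
final class LuSingularityDemo {
    static final double SINGULARITY_THRESHOLD = 1e-11;

    static boolean isSingular(double[][] input) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            if (input[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            a[i] = Arrays.copyOf(input[i], n); // work on a copy, leave input untouched
        }
        for (int col = 0; col < n; col++) {
            // Partial pivoting: choose the row with the largest |value| in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            if (Math.abs(a[pivot][col]) < SINGULARITY_THRESHOLD) {
                return true; // no usable pivot: treated as singular
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++) {
                    a[row][k] -= factor * a[col][k];
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Covariance-like matrix in which feature 2 was always null, stored as 0.0.
        double[][] withMissingFeature = {
            {2.0, 0.5, 0.0},
            {0.5, 1.0, 0.0},
            {0.0, 0.0, 0.0}
        };
        double[][] fullyPopulated = {
            {2.0, 0.5, 0.1},
            {0.5, 1.0, 0.2},
            {0.1, 0.2, 0.7}
        };
        System.out.println(isSingular(withMissingFeature)); // true
        System.out.println(isSingular(fullyPopulated));     // false
    }
}
```

The second matrix is diagonally dominant, so every pivot stays well above the threshold.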
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseVector to eliminate this
> singularity trigger and enable use of the GMM trainer.
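A DenseVector-based vectorizer gives every requested feature an explicit coordinate instead of leaving absent entries implicit. The sketch below assumes a Map of field names to values as a stand-in for a BinaryObject. The names DenseFieldVectorizer and missingValue are hypothetical and are not Ignite API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Ignite's BinaryObjectVectorizer: a Map stands in for a
// BinaryObject's fields, and the result is a plain dense double[] in which every
// requested feature gets an explicit slot.
final class DenseFieldVectorizer {
    private final List<String> featureFields; // field names, in coordinate order
    private final double missingValue;        // value written for null/absent fields

    DenseFieldVectorizer(List<String> featureFields, double missingValue) {
        this.featureFields = List.copyOf(featureFields);
        this.missingValue = missingValue;
    }

    double[] vectorize(Map<String, ?> fields) {
        double[] out = new double[featureFields.size()];
        for (int i = 0; i < out.length; i++) {
            Object value = fields.get(featureFields.get(i));
            // Numeric fields are widened to double; anything else counts as missing.
            out[i] = (value instanceof Number) ? ((Number) value).doubleValue() : missingValue;
        }
        return out;
    }

    public static void main(String[] args) {
        DenseFieldVectorizer vectorizer =
            new DenseFieldVectorizer(List.of("age", "income", "score"), Double.NaN);
        Map<String, Object> row = Map.of("age", 42, "score", 0.8); // "income" is absent
        System.out.println(Arrays.toString(vectorizer.vectorize(row))); // [42.0, NaN, 0.8]
    }
}
```

With this design the caller decides what a missing field becomes. Whether that removes the singularity in a particular GMM setup depends on the fill value and the data: a field that is null in every row still yields a constant column.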
> B. CacheBasedDatasets Should Not Be Treated as Temporary Caches
> When using a cache-based dataset, the close() method destroys the Ignite
> cache holding it, so the data loaded into this dataset cannot be reused.
> It would be great if we could:
> - Not destroy the Ignite cache holding the dataset when one step of an ML
> processing flow closes it.
> - Allow subsequent steps to "attach" to this previously calculated dataset.
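The requested lifecycle can be sketched without Ignite. A ConcurrentHashMap plays the role of the cluster's named caches. A destroyOnClose flag chooses between the current behaviour and a kept cache, and attach() reopens a dataset that an earlier step built. All names here (ReusableDatasets, create, attach, destroyOnClose) are hypothetical and are not Ignite API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the requested lifecycle, not Ignite API: a
// ConcurrentHashMap plays the role of the cluster's named caches.
final class ReusableDatasets {
    static final Map<String, Map<Integer, double[]>> CACHES = new ConcurrentHashMap<>();

    static final class Dataset implements AutoCloseable {
        final String cacheName;
        private final boolean destroyOnClose;

        Dataset(String cacheName, boolean destroyOnClose) {
            this.cacheName = cacheName;
            this.destroyOnClose = destroyOnClose;
        }

        Map<Integer, double[]> rows() {
            return CACHES.get(cacheName);
        }

        @Override
        public void close() {
            // destroyOnClose = true reproduces the behaviour described in the issue.
            if (destroyOnClose) {
                CACHES.remove(cacheName);
            }
        }
    }

    // Loads rows into a named cache; the caller chooses whether close() destroys it.
    static Dataset create(String cacheName, Map<Integer, double[]> rows, boolean destroyOnClose) {
        CACHES.put(cacheName, new ConcurrentHashMap<>(rows));
        return new Dataset(cacheName, destroyOnClose);
    }

    // Attaches to a previously built dataset without reloading it.
    static Dataset attach(String cacheName) {
        if (!CACHES.containsKey(cacheName)) {
            throw new IllegalStateException("no dataset cache named " + cacheName);
        }
        return new Dataset(cacheName, false);
    }

    public static void main(String[] args) {
        try (Dataset step1 = create("prepared", Map.of(1, new double[] {1.0, 2.0}), false)) {
            System.out.println(step1.rows().size()); // 1
        }
        try (Dataset step2 = attach("prepared")) { // data survived step 1's close()
            System.out.println(step2.rows().get(1)[1]); // 2.0
        }
    }
}
```

In this sketch, keeping destroy-on-close as an explicit choice leaves one-off datasets temporary. Multi-step flows can opt out of destruction and reattach by name.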
> C. Vector Visibility
> Unlike other value types (e.g., BinaryObjects), Vectors are not visible in
> standard tools such as the Ignite Web Console, because their toString() method
> does not present any information about the contained vector values.
> It would be great if we could:
> - Have a Vector.toString() implementation that presents some
> information about what is actually in the Vector.
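A toString() of the kind requested could report the size and the first few values, truncating long vectors so the output stays readable in a console. The sketch below operates on a plain double[]. VectorSummary, summarize and maxShown are hypothetical names, not Ignite's Vector API.

```java
import java.util.StringJoiner;

// Hypothetical helper, not Ignite's Vector.toString(): shows the size and the
// first few values so a vector is recognizable in a console or UI.
final class VectorSummary {
    static String summarize(double[] values, int maxShown) {
        if (maxShown < 0) {
            throw new IllegalArgumentException("maxShown must be non-negative");
        }
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        int shown = Math.min(values.length, maxShown);
        for (int i = 0; i < shown; i++) {
            joiner.add(Double.toString(values[i]));
        }
        if (values.length > shown) {
            joiner.add("... (" + (values.length - shown) + " more)");
        }
        return "Vector{size=" + values.length + ", values=" + joiner + "}";
    }

    public static void main(String[] args) {
        System.out.println(summarize(new double[] {1.0, 2.5, 0.0, 4.0}, 3));
        // Vector{size=4, values=[1.0, 2.5, 0.0, ... (1 more)]}
    }
}
```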
> I have implemented the above items and used them at a customer site where I
> needed these capabilities (or at least, they dramatically reduced the cost and
> increased the value of the solution).
> It would be great if the community were supportive of this
> expansion/improvement of the Ignite ML library.
> Thanks,
> Glenn
--
This message was sent by Atlassian Jira
(v8.3.4#803005)