Denis,

We have some metadata stored in an Ignite cache where each row describes a data series and each column is a property, which can be of any type (strings, doubles, etc.). You can think of it as a table describing our data series. This table can potentially be quite big, given a high number of series and properties.
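To make the layout concrete, here is a minimal sketch in plain Java of how one such row of mixed-type properties could be turned into a numeric vector. It does not use any Ignite API: the class name SeriesMetadataSketch, the method toFeatures, and the column names are all hypothetical, rows are assumed to be plain maps rather than BinaryObject instances, and the per-column ordinal encoding of strings is just one simple choice for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Ignite or its ML module.
public class SeriesMetadataSketch {
    // Per-column dictionaries: each distinct string gets the next free index.
    private final Map<String, Map<String, Integer>> dictionaries = new HashMap<>();

    /** Converts one metadata row into a numeric vector, one entry per requested column. */
    public double[] toFeatures(Map<String, Object> row, List<String> columns) {
        double[] features = new double[columns.size()];
        for (int i = 0; i < columns.size(); i++) {
            String column = columns.get(i);
            Object value = row.get(column);
            if (value instanceof Number) {
                // Integers, longs, doubles, etc. are used as-is.
                features[i] = ((Number) value).doubleValue();
            } else if (value instanceof Boolean) {
                features[i] = ((Boolean) value) ? 1.0 : 0.0;
            } else if (value instanceof String) {
                // Ordinal encoding: index of the string in first-seen order for this column.
                Map<String, Integer> dict =
                    dictionaries.computeIfAbsent(column, c -> new LinkedHashMap<>());
                Integer code = dict.get(value);
                if (code == null) {
                    code = dict.size();
                    dict.put((String) value, code);
                }
                features[i] = code;
            } else {
                // Missing or unsupported property.
                features[i] = Double.NaN;
            }
        }
        return features;
    }

    public static void main(String[] args) {
        SeriesMetadataSketch sketch = new SeriesMetadataSketch();
        List<String> columns = Arrays.asList("unit", "mean", "active");

        Map<String, Object> first = new HashMap<>();
        first.put("unit", "celsius");
        first.put("mean", 21.5);
        first.put("active", true);

        Map<String, Object> second = new HashMap<>();
        second.put("unit", "kelvin");
        second.put("mean", 294); // no "active" property on this row

        System.out.println(Arrays.toString(sketch.toFeatures(first, columns)));  // [0.0, 21.5, 1.0]
        System.out.println(Arrays.toString(sketch.toFeatures(second, columns))); // [1.0, 294.0, NaN]
    }
}
```

Ordinal codes impose an artificial ordering on string values, so for distance-based algorithms such as k-means a one-hot encoding would probably be a better fit; the sketch only shows the shape of the data.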
Based on this table we would like to cluster our data using different algorithms (e.g. k-means, decision tree). I started looking into it and I really liked the way you have built the pre-processing pipeline for feature selection, transformation, normalization and scaling. The only obstacle I ran into was the BinaryObject problem I mentioned. I did get it working as I described in my first post, but with a dirty solution, as I couldn't find a way to access the keepBinary property of the cache used as input. In any case, I will be glad to help find a clean solution to the problem if needed.

Best,
Oscar