Alexey, Andrei,

Here are some thoughts on what would be good to have in a Python-Ignite ML notebook:
- some way to pick an optional sample size (out of a very large cache) that is communicated to all partitions and set aside on each
- some way to count the number of unique values for a category (for example, to decide between one-hot and String encoding) - this might need the entire dataset if you want to do one-hot
- some way to do a quick assessment and simple listing (similar to the scikit-learn pretty bar chart) of the contribution of each feature to the label
- some way to allow the vector to choose its own indexes based on predictive weight and data type (for example, automatically encoding a category)
- some way to report simple cluster metrics in the Python notebook - but focused on ML stuff like raw cache, sample cache, vector info, preprocessing / training stats, etc.
- some way to input lists of things to do on a data set in parallel (a list of algorithms, for example) and then let Ignite ML run them all and report comparisons back
- some way to explain the steps that were run by ML in the background, with a report on all the steps

We might even just create some sort of demo Python wrapper that sits in front of the org.apache.ignite.examples.ml.tutorial code, but passes in a different cache handle instead of Titanic and also runs all of the Java classes (DT, impute, categorical encoding, scaling, etc.) in parallel instead of serially.

*Ken Cottrell*
*mobile: +1 (214) 546-5100*
*[email protected]*
*https://www.linkedin.com/in/kennethcottrell*

On Thu, Mar 5, 2020 at 8:53 AM Alexey Zinoviev <[email protected]> wrote:

> Agree with the simple case. I think we could start from a simple PoC for
> Python for ML in the next release.
>
> On Thu, Mar 5, 2020, 17:05, AG <[email protected]> wrote:
>
> > Thanks for the reply!
> >
> > It looks like a high-level API similar to Sklearn pipelines.
> > In my opinion, for the first steps it is easier to add simple access, to
> > gain the ability to run a simple model or simple preprocessor from Python.
> >
> > According to your example:
> > Here is a raw dataset, already inside this cluster cache "myName", with
> > Label column "MyLable".
> >
> > I want to run an imputer and kNN from the notebook UI using the Python
> > API, and export the results to file storage, as an example.
> >
> > In my opinion, the ability to create such a simple workflow should be our
> > first goal.
> >
> > Thank you!
> >
> > Best regards,
> > Andrei Gavrilov.
> >
> > Sent with ProtonMail Secure Email.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Wednesday, March 4, 2020 10:49 PM, kencottrell
> > <[email protected]> wrote:
> >
> > > Andrei,
> > >
> > > I am also working with Apache Ignite ML and am interested in providing
> > > wrappers for the Ignite ML API, but I am wondering: instead of simply
> > > recreating the low-level Java API for ML inside Python, how about
> > > creating some higher-level "Auto ML" workflow services? For example:
> > >
> > > 1. Here is a raw dataset, already inside this cluster cache "myName",
> > > with Label column "MyLable". Take N samples and tell me which columns
> > > appear to be numeric, unique id, and categorical values.
> > >
> > > 2. Based on N samples, please run some analysis and tell me the top 5
> > > feature columns in terms of predictive value using algorithm =
> > > RandomForest.
> > >
> > > 3. Do a batch run, sample size = N, using this list of preprocessing
> > > steps {impute, scale, etc.} and algorithms {knn, Decision Tree, etc.}
> > > and give me a report of the accuracies obtained with each.
> > >
> > > In other words, we have a simple sample in the Tutorial demo where
> > > these all run and then we compare outputs - why not automate these
> > > with a Python Notebook UI of some sort?
> > >
> > > --
> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
