Alexey, Andrei,

Here are some thoughts on what would be good to have in a Python-Ignite ML
notebook:

   - some way to pick an optional sample size (out of a very large cache)
   that gets communicated to, and set aside on, all partitions
   - some way to count the number of unique values in a categorical column
   (for example, to decide between one-hot and string encoding) - this might
   need the entire dataset if you want to do one-hot
   - some way to do a quick assessment and simple listing (similar to
   sklearn's feature-importance bar chart) of each feature's contribution to
   the label
   - some way to let the feature vector choose its own indexes based on
   predictive weight and data type (for example, automatically encoding
   categorical columns)
   - some way to report simple cluster metrics in the Python notebook - but
   focused on ML items like the raw cache, sample cache, vector info,
   preprocessing / training stats, etc.
   - some way to input a list of things to do on a dataset in parallel (a
   list of algorithms, for example), let Ignite ML run them all, and report
   comparisons back
   - some way to explain the steps that ML ran in the background and report
   on all of them (a rough sketch of such an API follows below)
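
To make this concrete, here is a very rough sketch of what such a
notebook-facing API could look like. Nothing below exists today -
IgniteMLSession, sample(), unique_counts(), feature_importance(),
run_parallel() and the rest are made-up names, only meant to illustrate the
shape of the workflow:

    # hypothetical notebook-facing wrapper - no such module exists yet
    from ignite_ml_notebook import IgniteMLSession

    session = IgniteMLSession(address="127.0.0.1:10800")

    # pull an optional sample out of a big cache, set aside on all partitions
    sample = session.cache("myName").sample(n=10_000, label_col="MyLable")

    # count unique values per column to decide one-hot vs. string encoding
    print(sample.unique_counts())

    # quick per-feature contribution listing, sklearn-style bar chart
    sample.feature_importance(top=5).plot()

    # run preprocessing/algorithm combinations in parallel and compare
    results = session.run_parallel(
        data=sample,
        preprocessing=["impute", "scale", "one_hot"],
        algorithms=["knn", "decision_tree"],
    )
    print(results.compare(metric="accuracy"))
    print(results.explain_steps())  # report on the steps run in the background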

We might even just create some sort of demo Python wrapper that sits in
front of the org.apache.ignite.examples.ml.tutorial code, but passes in a
different cache handle instead of Titanic, and also runs all of the Java
classes (decision tree, imputing, categorical encoding, scaling, etc.) in
parallel instead of serially.
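
As a strawman, and assuming the tutorial Step_* classes were extended to
accept a cache name as a program argument (today they hard-code the Titanic
data), a first cut of that wrapper could be as thin as something like this -
the class names below are only illustrative:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # illustrative Step_* main classes from the tutorial package; the real
    # list would be whichever steps we want to run and compare
    STEPS = [
        "org.apache.ignite.examples.ml.tutorial.Step_2_Imputing",
        "org.apache.ignite.examples.ml.tutorial.Step_3_Categorial",
        "org.apache.ignite.examples.ml.tutorial.Step_6_KNN",
    ]

    def run_step(main_class, cache_name):
        # each step runs in its own JVM against the shared Ignite cluster
        return subprocess.run(
            ["java", "-cp", "ignite-examples.jar", main_class, cache_name],
            capture_output=True, text=True,
        )

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_step, cls, "myName") for cls in STEPS]
        for f in futures:
            print(f.result().stdout)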




Ken Cottrell
mobile: +1 (214) 546-5100
[email protected]
https://www.linkedin.com/in/kennethcottrell



On Thu, Mar 5, 2020 at 8:53 AM Alexey Zinoviev <[email protected]>
wrote:

> Agree with the simple case; I think we could start with a simple PoC for
> Python for ML in the next release
>
> Thu, 5 Mar 2020, 17:05, AG <[email protected]>:
>
> >
> > Thanks for the reply!
> >
> > It looks like a high-level API similar to Sklearn pipelines.
> > In my opinion, for the first steps it is easier to add simple access to
> > gain the ability to run a simple model or simple preprocessor from Python.
> >
> > According to your example:
> > Here is a raw dataset, already inside the cluster cache "myName", with
> > label column "MyLable".
> >
> > I want to run an imputer and kNN from the notebook UI using the Python
> > API, and export the results to file storage, as an example.
> >
> > In my opinion, the ability to create such a simple workflow should be our
> > goal as a first step.
> >
> > Thank You!
> >
> > Best regards,
> > Andrei Gavrilov.
> >
> > Sent with ProtonMail Secure Email.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Wednesday, March 4, 2020 10:49 PM, kencottrell
> > <[email protected]> wrote:
> >
> > > Andrei,
> > >
> > > I am also working with Apache Ignite ML and am interested in providing
> > > wrappers for the Ignite ML API, but am wondering: instead of simply
> > > recreating the low-level Java ML API inside Python, how about creating
> > > some higher-level "Auto ML" workflow services? For example:
> > >
> > > 1.  here is a raw dataset, already inside this cluster cache "myName",
> > >     with label column "MyLable"; take N samples and tell me which
> > >     columns appear to be numeric, unique-id, and categorical
> > >
> > > 2.  based on N samples, please run some analysis and tell me the top 5
> > >     feature columns in terms of predictive value using algorithm =
> > >     RandomForest
> > >
> > > 3.  do a batch run, sample size = N, using this list of preprocessing
> > >     steps {impute, scale, etc.} and algorithms {knn, Decision Tree,
> > >     etc.}, and give me a report of the accuracies obtained with each.
> > >
> > >     In other words, we have a simple sample in the tutorial demo where
> > >     these all run and then we compare outputs - why not automate this
> > >     with a Python notebook UI of some sort?
> > >
> > >     --
> > >     Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >
> >
> >
> >
>
