[ 
https://issues.apache.org/jira/browse/IGNITE-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Dmitriev updated IGNITE-8666:
-----------------------------------
    Description: 
So far we use a straightforward strategy to feed data into a partition-based 
dataset. We retrieve all entries from an upstream cache partition, transform 
them somehow, and write them into the corresponding dataset partition (data and 
context). As a result, we can't choose which data is fed into the dataset and 
which is not. To implement IGNITE-8667 (Splitting of dataset to test and 
training sets) and IGNITE-8668 (K-fold cross validation of models) we need this 
ability.

The goal of this task is to add the ability to filter the data that is fed from 
the cache to the dataset. It will allow us to create different datasets 
(training, testing, k-fold, etc.) based on a single cache.
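
The idea can be illustrated with a minimal plain-Java sketch. It does not use 
the Ignite ML API: the upstream cache partition is modelled as a Map, and the 
names FilteredDatasetSketch, feed and inFold are hypothetical, introduced only 
for this illustration. The assumption is that the filter is a predicate over 
the upstream key and value, so that a test set and a training set can be built 
from the same data by using a filter and its negation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class FilteredDatasetSketch {
    /**
     * Copies only the entries accepted by the filter from an upstream
     * partition (modelled here as a Map) into a dataset partition.
     */
    static <K, V> List<V> feed(Map<K, V> upstreamPartition, BiPredicate<K, V> filter) {
        List<V> datasetPartition = new ArrayList<>();
        for (Map.Entry<K, V> e : upstreamPartition.entrySet()) {
            if (filter.test(e.getKey(), e.getValue()))
                datasetPartition.add(e.getValue());
        }
        return datasetPartition;
    }

    /** Hypothetical filter: keeps entries whose integer key falls into the given fold. */
    static <V> BiPredicate<Integer, V> inFold(int fold, int folds) {
        return (k, v) -> Math.floorMod(k, folds) == fold;
    }

    public static void main(String[] args) {
        // Stand-in for a single upstream cache partition.
        Map<Integer, String> upstream = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++)
            upstream.put(i, "row-" + i);

        // Fold 0 of 5 becomes the test set; everything else is the training set.
        BiPredicate<Integer, String> testFilter = inFold(0, 5);

        List<String> testSet = feed(upstream, testFilter);
        List<String> trainSet = feed(upstream, testFilter.negate());

        System.out.println("test:  " + testSet);
        System.out.println("train: " + trainSet);
    }
}
```

Because both sets are derived from one upstream map through different 
filters, no second copy of the cache is needed; k-fold cross validation would 
repeat the same call with each fold index in turn.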

  was:
So far we use a straightforward strategy to feed data into a partition-based 
dataset. We retrieve all entries from an upstream cache partition, transform 
them somehow, and write them into the corresponding dataset partition (data and 
context). As a result, we can't choose which data is fed into the dataset and 
which is not. To implement IGNITE-8667 (Splitting of dataset to test and 
training sets) and IGNITE-8668 (K-fold cross validation of models) we need this 
ability.

The goal of this task is to add the ability to filter the data that is fed from 
the cache to the dataset.


> Add ability of filtering data during datasets creation
> ------------------------------------------------------
>
>                 Key: IGNITE-8666
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8666
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>            Reporter: Yury Babak
>            Assignee: Anton Dmitriev
>            Priority: Major
>             Fix For: 2.6
>
>
> So far we use a straightforward strategy to feed data into a partition-based 
> dataset. We retrieve all entries from an upstream cache partition, transform 
> them somehow, and write them into the corresponding dataset partition (data 
> and context). As a result, we can't choose which data is fed into the dataset 
> and which is not. To implement IGNITE-8667 (Splitting of dataset to test and 
> training sets) and IGNITE-8668 (K-fold cross validation of models) we need 
> this ability.
> The goal of this task is to add the ability to filter the data that is fed 
> from the cache to the dataset. It will allow us to create different datasets 
> (training, testing, k-fold, etc.) based on a single cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
