Hi Felix,

this is currently not supported by FlinkML. The MultipleLinearRegression
algorithm expects a DataSet and not a GroupedDataSet as input. What you can
do is to extract each group from the original DataSet by using a filter
operation. Once you have done this, you can train the linear model on each
sub part of the DataSet.

Cheers,
Till
​

On Wed, Jul 8, 2015 at 10:37 AM, Felix Neutatz <neut...@googlemail.com>
wrote:

> Hi Felix,
>
> thanks for the idea. But doesn't this mean that I can only train one model
> per partition? The thing is, I have way more models than that :(
>
> Best regards,
> Felix
>
> 2015-07-07 22:37 GMT+02:00 Felix Schüler <fschue...@posteo.de>:
>
> > Hi Felix!
> >
> > We had a similar usecase and I trained multiple models on partitions of
> > my data with mapPartition and the model-parameters (weights) as
> > broadcast variable. If I understood broadcast variables in Flink
> > correctly, you should end up with one model on each TaskManager.
> >
> > Does that work?
> >
> > Felix
> >
> > Am 07.07.2015 um 17:32 schrieb Felix Neutatz:
> > > Hi,
> > >
> > > at the moment I have a dataset which looks like this:
> > >
> > > DataSet[model_ID, DataVector] data
> > >
> > > So what I want to do is group by the model_ID and build for each
> model_ID
> > > one regression model
> > >
> > > in pseudo code:
> > > data.groupBy(model_ID)
> > >         --> MultipleLinearRegression().fit(data_grouped)
> > >
> > > Is there anyway besides an iteration how to do this at the moment?
> > >
> > > Thanks for your help,
> > >
> > > Felix Neutatz
> > >
> >
>

Reply via email to