Hi Felix, this is currently not supported by FlinkML. The MultipleLinearRegression algorithm expects a DataSet and not a GroupedDataSet as input. What you can do is to extract each group from the original DataSet by using a filter operation. Once you have done this, you can train the linear model on each sub part of the DataSet.
Cheers, Till On Wed, Jul 8, 2015 at 10:37 AM, Felix Neutatz <neut...@googlemail.com> wrote: > Hi Felix, > > thanks for the idea. But doesn't this mean that I can only train one model > per partition? The thing is, I have way more models than that :( > > Best regards, > Felix > > 2015-07-07 22:37 GMT+02:00 Felix Schüler <fschue...@posteo.de>: > > > Hi Felix! > > > > We had a similar usecase and I trained multiple models on partitions of > > my data with mapPartition and the model-parameters (weights) as > > broadcast variable. If I understood broadcast variables in Flink > > correctly, you should end up with one model on each TaskManager. > > > > Does that work? > > > > Felix > > > > Am 07.07.2015 um 17:32 schrieb Felix Neutatz: > > > Hi, > > > > > > at the moment I have a dataset which looks like this: > > > > > > DataSet[model_ID, DataVector] data > > > > > > So what I want to do is group by the model_ID and build for each > model_ID > > > one regression model > > > > > > in pseudo code: > > > data.groupBy(model_ID) > > > --> MultipleLinearRegression().fit(data_grouped) > > > > > > Is there anyway besides an iteration how to do this at the moment? > > > > > > Thanks for your help, > > > > > > Felix Neutatz > > > > > >