If you only have 4 variables and 16k rows, why do you need anything even close to Hadoop? This is a problem which could be regressed on an iPhone, couldn't it?
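
For a sense of scale, here is a minimal sketch of an ordinary least-squares fit on 16,000 rows with four predictors, in plain single-threaded Python. Everything in it is an illustrative assumption: the data is synthetic, the coefficient values are made up, and `fit_ols`, `solve`, and `make_synthetic` are hypothetical helper names. It solves the normal equations directly, which is fine at this size, though a QR-based solver would be the more numerically careful choice.

```python
import random


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y.

    An intercept column of 1.0 is prepended to every row.
    Returns [intercept, coef_1, ..., coef_k].
    """
    k = len(rows[0]) + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for features, y in zip(rows, targets):
        x = [1.0] + list(features)
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)


def make_synthetic(n_rows=16000, seed=0):
    """Generate made-up data: 4 Gaussian predictors, a linear target, small noise."""
    rng = random.Random(seed)
    true_coefs = [0.5, 2.0, -1.0, 0.25, 3.0]  # intercept + 4 slopes (arbitrary values)
    rows, targets = [], []
    for _ in range(n_rows):
        features = [rng.gauss(0, 1) for _ in range(4)]
        signal = sum(c * v for c, v in zip(true_coefs[1:], features))
        rows.append(features)
        targets.append(true_coefs[0] + signal + rng.gauss(0, 0.1))
    return rows, targets, true_coefs


if __name__ == "__main__":
    rows, targets, true_coefs = make_synthetic()
    print("estimated:", fit_ols(rows, targets))
    print("true:     ", true_coefs)
```

The whole fit reduces to accumulating a 5x5 matrix over the rows and solving one 5x5 system, so the work grows linearly with the row count and nothing here needs to be distributed.
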

-jake

On Mon, Dec 7, 2009 at 3:29 PM, Rajat Banerjee <[email protected]> wrote:

> Dear Ted, Thanks for your prompt reply.
>
> There are 16,000 rows of data. There are only four significant
> variables in my hypothesis. The regression shouldn't be too nasty.
> I've looked at some non-distributed libraries and they seem capable,
> but would love to get it started in hadoop since that's my end goal.
>
> single-threaded :
> http://www.ee.ucl.ac.uk/~mflanaga/java/Regression.html#sumgl
>
> Thanks. Best,
> Rajat
>
>
> On Mon, Dec 7, 2009 at 6:21 PM, Ted Dunning <[email protected]> wrote:
> > We don't have these right now. We had a summer of code student start on
> > Logistic Regression, but she didn't complete the project.
> >
> > Can you say more about your problem? Are you saying that you have 16,000
> > predictor variables sampled in time and one prediction variable (presence of
> > short squeeze)? Or is it possible for short squeezes to be applied to
> > individual equities so that you have 16,000 time series each annotated with
> > whether a short squeeze occurred?
> >
> > If the former, then you have a much bigger problem than just doing the
> > regression. If the latter, then you might be able to use some on-line
> > learning software like Vowpal Wabbit to do your job.
> >
> > Can you say more?
> >
> > On Mon, Dec 7, 2009 at 3:04 PM, Rajat Banerjee <[email protected]> wrote:
> >
> >> Dear Apache Community,
> >> I am looking to perform a linear regression on a rather large amount
> >> of data in my hadoop cluster. It is part of my master's thesis at
> >> harvard university.
> >>
> >> After perusing the docs on the Mahout site, it seems like the
> >> following algorithms havent been implemented yet-
> >> Locally-Weighted Linear Regression
> >> Linear Regression
> >> Logistic Regression
> >>
> >> Basically, there is a stock market phenomenon which I'm trying to
> >> predict. It is called a short squeeze. I have about 16,000 data points
> >> - stocks and a point in time where the phenomenon has occurred. I'm
> >> trying to develop a predictive model in a hadoop cluster.
> >>
> >> The accuracy of the model doesn't matter much at this point, the goal
> >> and what would make my prof happy is to see the cluster grinding away,
> >> doing some relevant but perhaps not totally correct mathematical
> >> operations. Read: If its a linear regression i'll be happy, but if it
> >> isn't possible yet I dont mind.
> >>
> >> Can anyone suggest something I can use? I've downloaded Mahout 0.2 and
> >> searched through it, but nothing for performing regressions has jumped
> >> out at me.
> >> Thank you.
> >> Best,
> >> Rajat
> >>
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
