In my classification code, creating the model with MapReduce is easy. Doing the classification itself has become difficult with big datasets, though. For a dataset like Wikipedia, I can no longer load the data into memory: it takes only 600MB on disk, but memory use shoots past 2.5GB when I use a HashMap<String, HashMap<String, Float>> to store the weights. I wish there were a big matrix server out there, so that all I had to do to fetch a value was call fetch(row, column).
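To make the structure concrete, here is a rough sketch of the nested map behind the kind of fetch(row, column) call I have in mind. The class name WeightMatrix, the put method, and the choice to return 0 for a missing entry are all made up for illustration; this is not my actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the nested map; all names are illustrative only.
class WeightMatrix {
    // row label -> (column label -> weight)
    private final Map<String, Map<String, Float>> weights = new HashMap<>();

    // Store one weight, creating the inner map for the row on first use.
    void put(String row, String column, float weight) {
        weights.computeIfAbsent(row, r -> new HashMap<>()).put(column, weight);
    }

    // Assumption: a missing row or column is treated as weight 0.
    float fetch(String row, String column) {
        Map<String, Float> columns = weights.get(row);
        if (columns == null) {
            return 0.0f;
        }
        Float w = columns.get(column);
        return w == null ? 0.0f : w;
    }
}
```

Every stored weight here costs a boxed Float, a map entry, and String keys on top of the number itself, which is probably a large part of why the in-memory size is several times the on-disk size.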
I am trying to put the data on HBase. Please tell me if there are simpler solutions for this using Hadoop or any other package.

Robin
