Sounds like Pig. Or Cascading. Or Hive. Seriously, isn't this already available?
On Wed, Feb 16, 2011 at 7:06 AM, Guy Doulberg <[email protected]> wrote:
>
> Hey all,
> I want to consult with you Hadoopers about a Map/Reduce application I want
> to build.
>
> I want to build a map/reduce job that reads files from HDFS, performs some
> sort of transformation on the file lines, and stores them to several
> partitions depending on the source of the file or its data.
>
> I want this application to be as configurable as possible, so I designed
> interfaces to Parse, Decorate and Partition (on HDFS) the data.
>
> I want to be able to configure different data flows, with different
> parsers, decorators and partitioners, using a config file.
>
> Do you think you would use such an application? Does it fit an open-source
> project?
>
> Now, I have some technical questions:
> I was thinking of using reflection to load all the classes I would need
> according to the configuration during the setup process of the Mapper.
> Do you think it is a good idea?
>
> Is there a way to send the Mapper objects or interfaces from the Job
> declaration?
>
> Thanks,
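
On the reflection question: loading the configured classes by name is a common way to do this, and since the job configuration travels to the tasks as string key/value pairs, passing class names (rather than live objects) is the usual route from the job declaration to the Mapper. Below is a minimal sketch in plain Java, with no Hadoop dependency so it runs on its own. The `Parser` interface, the `TabParser` class, the `load` helper and the `flow.parser.class` key are all hypothetical names made up for illustration; in a real job the same lookup would probably sit in the Mapper's `setup()` and read from the job `Configuration` instead of a `Properties` object.

```java
import java.util.Properties;

public class ReflectiveFlow {

    /** Hypothetical stage interface, standing in for the "Parse" step in the post. */
    public interface Parser {
        String[] parse(String line);
    }

    /** Hypothetical implementation: splits a line on tab characters. */
    public static class TabParser implements Parser {
        @Override
        public String[] parse(String line) {
            return line.split("\t");
        }
    }

    /**
     * Looks up a class name under the given key, checks that it implements
     * the expected interface, and instantiates it via its no-arg constructor.
     */
    public static <T> T load(Properties conf, String key, Class<T> iface) {
        String className = conf.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        try {
            // asSubclass fails fast if the configured class has the wrong type.
            Class<? extends T> impl = Class.forName(className).asSubclass(iface);
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot load " + className + " as " + iface.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the config file: only the class name is stored.
        Properties conf = new Properties();
        conf.setProperty("flow.parser.class", TabParser.class.getName());

        Parser parser = load(conf, "flow.parser.class", Parser.class);
        System.out.println(parser.parse("a\tb\tc").length);
    }
}
```

Doing the lookup once per task (in setup) rather than once per record keeps the reflection cost negligible, and failing fast on a missing or mistyped class name makes configuration errors show up at task start instead of halfway through the input.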
