Dear all,

We have a task to run a MapReduce job multiple times for a machine-learning calculation. In each round, a mapper first updates the data iteratively, and then a reducer processes the mapper's output to update a global matrix. After that, we need to reuse the output of the previous mapper (as the data source) and of the previous reducer (as a set of parameters) to run MapReduce again for another round of learning.
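To illustrate, our driver loop looks roughly like the sketch below (a simplified fragment against the Hadoop 0.20 `mapreduce` API; `LearnMapper`, `LearnReducer`, and the paths are placeholders for our real classes and directories, not working code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path data = new Path("/ml/input"); // initial data source
        for (int round = 0; round < 10; round++) {
            Job job = new Job(conf, "learn-round-" + round);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(LearnMapper.class);   // updates the data
            job.setReducerClass(LearnReducer.class); // updates the global matrix
            FileInputFormat.addInputPath(job, data);
            Path out = new Path("/ml/round-" + round);
            FileOutputFormat.setOutputPath(job, out);
            if (!job.waitForCompletion(true)) break;
            // Problem: "out" holds only the reducer output. The mapper's
            // intermediate records are deleted after the job, but the next
            // round needs them as its data source (and the reducer output
            // as its parameters).
            data = out; // what we would actually like here is the mapper output
        }
    }
}
```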
I am wondering whether there is any setting or API I could use to make Hadoop keep both the mapper output and the reducer output. Currently, it looks like if a job contains a reducer, Hadoop deletes the intermediate results generated by the mapper.

Thanks.
Stanley Xu