The problem is that I start with a single URL. I get the inlinks to that URL, and then I need to access the content of every inlink URL that has been fetched. I was doing this through random access. Then I went back and re-read the Google MapReduce paper and saw that it was designed for sequential access, and that Hadoop is implemented the same way. So far, though, I haven't found a way to solve this kind of problem efficiently with sequential access.

If I were to do it in configure() and close(), wouldn't that still open a single reader per map call?

Dennis

Doug Cutting wrote:
Dennis Kubes wrote:
I am trying to read a MapFile inside mapper and reducer implementations. So far the only way I have found to do it is by opening a new reader for each map and reduce call. Is anybody doing something similar and if so is there a way to open a single reader and reuse it across multiple map or reduce calls?

Can't you open it in the configure() implementation? And close it in the close() implementation?
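
A minimal sketch of the configure()/close() lifecycle, assuming the usual behaviour where configure() runs once when the task creates the mapper, map() runs once per record, and close() runs once at the end. It uses plain Java stand-ins rather than the Hadoop classes so that it runs on its own: CountingReader, LookupMapper and ReaderLifecycleDemo are hypothetical names, and CountingReader only imitates a MapFile reader by counting how often it is opened.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a MapFile reader; counts how many times it is opened. */
class CountingReader {
    static int opens = 0;
    private final Map<String, String> data;
    private boolean closed = false;

    CountingReader(Map<String, String> data) {
        opens++;
        this.data = data;
    }

    String get(String key) {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        return data.get(key);
    }

    void close() {
        closed = true;
    }
}

/** Mapper-like class (not the Hadoop interface) with the same three lifecycle methods. */
class LookupMapper {
    private final Map<String, String> store;
    private CountingReader reader;

    LookupMapper(Map<String, String> store) {
        this.store = store;
    }

    /** Called once per task: open the reader here. */
    void configure() {
        reader = new CountingReader(store);
    }

    /** Called once per record: reuse the reader opened in configure(). */
    String map(String url) {
        return reader.get(url);
    }

    /** Called once per task: release the reader here. */
    void close() {
        reader.close();
    }
}

class ReaderLifecycleDemo {
    /** Drives the mapper the way a single task would; returns how many readers were opened. */
    static int runTask(List<String> urls) {
        Map<String, String> store = new HashMap<>();
        for (String url : urls) {
            store.put(url, "content of " + url);
        }
        CountingReader.opens = 0;
        LookupMapper mapper = new LookupMapper(store);
        mapper.configure();
        try {
            for (String url : urls) {
                mapper.map(url);
            }
        } finally {
            mapper.close();
        }
        return CountingReader.opens;
    }

    public static void main(String[] args) {
        // Three map() calls, one reader: prints "reader opens: 1"
        System.out.println("reader opens: " + runTask(Arrays.asList("a", "b", "c")));
    }
}
```

Under that assumption the reader is opened once per task, not once per map call. The lookups themselves are still random access, though.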

Are you randomly accessing a MapFile from a map() implementation? That's not going to scale very well. MapReduce is designed for sequential access.

Doug
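
One possible way to recast the inlink lookup as sequential access is a reduce-side join. This is a sketch under assumptions, not something proposed in the thread: both datasets are re-keyed by the inlink URL, sorted and grouped, and each group is then processed in a single pass. The code below simulates that in memory with plain Java collections; InlinkJoinSketch, join and the "target"/"content" tags are hypothetical names, and in a real job the final regrouping by target URL would probably be a second pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InlinkJoinSketch {
    /**
     * inlinks: target URL -> URLs that link to it
     * content: URL -> fetched content
     * returns: target URL -> content of each of its fetched inlink pages
     */
    static Map<String, List<String>> join(Map<String, List<String>> inlinks,
                                          Map<String, String> content) {
        // "Map" phase: key every record by the inlink (source) URL and tag its origin.
        // The TreeMap plays the role of the sort-and-group step between map and reduce.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : inlinks.entrySet()) {
            for (String source : e.getValue()) {
                grouped.computeIfAbsent(source, k -> new ArrayList<>())
                       .add(new String[] {"target", e.getKey()});
            }
        }
        for (Map.Entry<String, String> e : content.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new String[] {"content", e.getValue()});
        }

        // "Reduce" phase: one sequential pass over the groups, no random lookups.
        Map<String, List<String>> result = new TreeMap<>();
        for (List<String[]> group : grouped.values()) {
            String text = null;
            List<String> targets = new ArrayList<>();
            for (String[] record : group) {
                if (record[0].equals("content")) {
                    text = record[1];
                } else {
                    targets.add(record[1]);
                }
            }
            if (text == null) {
                continue; // this inlink URL has not been fetched
            }
            for (String target : targets) {
                result.computeIfAbsent(target, k -> new ArrayList<>()).add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inlinks = new HashMap<>();
        inlinks.put("x", Arrays.asList("a", "b", "c"));
        Map<String, String> content = new HashMap<>();
        content.put("a", "A");
        content.put("b", "B"); // "c" is deliberately left unfetched
        System.out.println(join(inlinks, content));
    }
}
```

The trade-off is an extra sort over both datasets in exchange for never seeking into a MapFile from inside map().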
