The problem is that I have a single URL. I get the inlinks to that URL
and then I need to access the content of every inlink URL that has
been fetched.
I was doing this through random access. But then I went back and
re-read the Google MapReduce paper and saw that it was designed for
sequential access, and that Hadoop is implemented the same way. So
far I haven't found a way to solve this kind of problem efficiently
with sequential access.
If I were to do it in configure() and close(), wouldn't that still open a
single reader per map call?
Dennis
Doug Cutting wrote:
Dennis Kubes wrote:
I am trying to read a MapFile inside mapper and reducer
implementations. So far the only way I have found to do it is by
opening a new reader for each map and reduce call. Is anybody doing
something similar and if so is there a way to open a single reader
and reuse it across multiple map or reduce calls?
Can't you open it in the configure() implementation? And close it in
the close() implementation?
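
As a rough illustration of that suggestion: in the classic org.apache.hadoop.mapred API, configure() and close() run once per task, while map() runs once per record, so a reader opened in configure() is shared by every map() call in that task. The sketch below only simulates that lifecycle in plain Java. CountingReader, LookupMapper and ReaderLifecycleDemo are hypothetical stand-ins invented for this example, not Hadoop classes; a real mapper would hold a MapFile.Reader instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a MapFile reader; counts how often one is opened.
class CountingReader {
    static int opens = 0;
    private final Map<String, String> data;
    private boolean closed = false;

    CountingReader(Map<String, String> data) {
        this.data = data;
        opens++;
    }

    String get(String key) {
        if (closed) throw new IllegalStateException("reader is closed");
        return data.get(key);
    }

    void close() {
        closed = true;
    }
}

// Hypothetical mapper that follows the configure()/map()/close() lifecycle.
class LookupMapper {
    private final Map<String, String> backing;
    private CountingReader reader;

    LookupMapper(Map<String, String> backing) {
        this.backing = backing;
    }

    // Called once per task: open the reader here.
    void configure() {
        reader = new CountingReader(backing);
    }

    // Called once per record: reuse the reader opened in configure().
    String map(String key) {
        return reader.get(key);
    }

    // Called once per task: release the reader here.
    void close() {
        reader.close();
    }
}

class ReaderLifecycleDemo {
    // Simulates one task and returns how many readers were opened.
    static int runTask(List<String> keys) {
        CountingReader.opens = 0;
        Map<String, String> backing = new HashMap<>();
        for (String k : keys) backing.put(k, "content of " + k);

        LookupMapper mapper = new LookupMapper(backing);
        mapper.configure();
        try {
            for (String k : keys) mapper.map(k);
        } finally {
            mapper.close();
        }
        return CountingReader.opens;
    }

    public static void main(String[] args) {
        System.out.println("readers opened: " + runTask(Arrays.asList("a", "b", "c")));
    }
}
```

However many records the simulated task processes, only one reader is opened. The lookups themselves are still random access, so this addresses the cost of opening readers but not the scaling concern.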
Are you randomly accessing a MapFile from a map() implementation?
That's not going to scale very well. MapReduce is designed for
sequential access.
Doug
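
One way the inlink lookup described in this thread might be recast for sequential access is a reduce-side join: key both the link records and the fetched-content records by the inlink URL, let the sort bring them together, and pair them up in a single pass. The sketch below simulates that in plain Java with a TreeMap standing in for the shuffle. InlinkJoinDemo, the record layouts and the "T"/"C" tags are assumptions made for illustration, not anything taken from Hadoop or Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InlinkJoinDemo {
    // links: {inlinkUrl, targetUrl} pairs; pages: {url, content} pairs.
    // Returns targetUrl -> contents of its fetched inlink pages.
    static Map<String, List<String>> join(List<String[]> links, List<String[]> pages) {
        // "Map" phase: tag every record and key it by the inlink URL.
        // The TreeMap plays the role of the sort/shuffle step.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] link : links) {
            grouped.computeIfAbsent(link[0], k -> new ArrayList<>()).add("T\t" + link[1]);
        }
        for (String[] page : pages) {
            grouped.computeIfAbsent(page[0], k -> new ArrayList<>()).add("C\t" + page[1]);
        }

        // "Reduce" phase: one sequential pass over the grouped keys.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            String content = null;
            List<String> targets = new ArrayList<>();
            for (String value : entry.getValue()) {
                if (value.startsWith("C\t")) {
                    content = value.substring(2);
                } else {
                    targets.add(value.substring(2));
                }
            }
            if (content == null) continue; // this inlink was never fetched
            for (String target : targets) {
                result.computeIfAbsent(target, k -> new ArrayList<>()).add(content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> links = Arrays.asList(
            new String[] {"a.example", "x.example"},
            new String[] {"b.example", "x.example"},
            new String[] {"c.example", "x.example"});
        List<String[]> pages = Arrays.asList(
            new String[] {"a.example", "page A"},
            new String[] {"b.example", "page B"});
        System.out.println(join(links, pages));
    }
}
```

In a real job the two record sets would be separate inputs to the same MapReduce pass, and a second pass keyed by target URL would do the final grouping. No reader is opened inside map() at all, which is the property the sequential design relies on.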