There is a special class for storing maps in HDFS. Look at MapFile.
Also, any mapper can contact an outside resource such as a database. This is, however, very bad practice, since the load on the outside resource can skyrocket as your cluster grows. If the cost of the request is small relative to the work of the map, this might be kind-of sort-of OK, but maps are often very cheap to do, which means that any dependence on a single external resource can be really bad for throughput.

On 7/16/07 6:19 AM, "novice user" <[EMAIL PROTECTED]> wrote:

> Hi All,
> I am new to Hadoop and learning how to use it. I have a problem which can
> be solved using the map-reduce technique. But in my map step, I need to
> consider some extra information which depends on the input key/value pair.
> Can someone please suggest a good way of getting this data? I am thinking
> of storing it in HDFS and having the map code load it whenever it is
> processing a particular key/value pair. If there are better approaches,
> please let me know.
>
> Thanks in advance
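The pattern being recommended here can be sketched in plain Java. This is only an illustration with no Hadoop dependency: the `SideDataJoin` class and its method names are made up for the example. In a real job, the load step would run once per task in the mapper's configure()/setup() method, reading a MapFile (or a file shipped to each node), so that each map() call is a cheap in-memory lookup instead of a request to an external resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "load side data once per task" pattern.
// Stands in for a Hadoop mapper; not real Hadoop API.
public class SideDataJoin {
    private final Map<String, String> sideData = new HashMap<>();

    // Stands in for configure()/setup(): runs once per task, not per record.
    // In real Hadoop this would read from a MapFile.Reader or a cached file.
    public void loadSideData(Map<String, String> source) {
        sideData.putAll(source);
    }

    // Stands in for map(): joins each record against the in-memory side data,
    // so no external resource is contacted per key/value pair.
    public String map(String key, String value) {
        String extra = sideData.getOrDefault(key, "UNKNOWN");
        return value + "\t" + extra;
    }

    public static void main(String[] args) {
        SideDataJoin join = new SideDataJoin();
        Map<String, String> side = new HashMap<>();
        side.put("user1", "premium");
        join.loadSideData(side);
        System.out.println(join.map("user1", "click"));
        System.out.println(join.map("user2", "view"));
    }
}
```

The key point of the thread is the shape of this code, not its details: the expensive load happens once per task, and the per-record work touches only local memory, so throughput no longer depends on a single shared resource as the cluster grows.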
