I'm just starting out with Hadoop.  I've looked through the Java examples
and have a rough idea of what's going on, but I don't really get it yet.

I'd like to write a program that takes a directory of files.  Each file
contains a URL to a website on its first line, and the text of that
website on its second line.

The mapper should emit a (word, URL) pair for each word in the text, so
every word found on the website would map to that URL.

The reducer would then collect all of the URLs that a given word maps to.

Each word -> URLs mapping is then written to a file.
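To check that I at least understand the logic before touching Hadoop, here
is my single-machine sketch of what I think the map and reduce steps should
do (the class name, example URLs, and the simple \W+ tokenizing are just my
own placeholder assumptions):

```java
import java.util.*;

public class InvertedIndexSketch {
    // "Map" step: emit one (word, url) pair per word in a document's text.
    static List<Map.Entry<String, String>> map(String url, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, url));
            }
        }
        return pairs;
    }

    // "Reduce" step: group the URLs by word, deduplicating repeats.
    static Map<String, Set<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>())
                 .add(pair.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(map("http://a.example", "hadoop map reduce"));
        pairs.addAll(map("http://b.example", "hadoop streaming"));
        // "hadoop" should now map to both URLs.
        System.out.println(reduce(pairs).get("hadoop"));
    }
}
```

Does this match what the Mapper and Reducer are supposed to do, conceptually?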

So it's "simple" as a program designed to run on a single system, but I
want to distribute the computation using Hadoop.

I'm extremely new to Hadoop; I'm not even sure how to ask all of the
questions I'd like answered.  I have zero experience with MapReduce and
only limited experience with functional programming in general.  Any
programming tips, or corrections if I have my "Mapper" or "Reducer"
defined incorrectly, would be greatly appreciated.

Questions:
How do I read (and write) files from HDFS?
Once I've read them, how do I distribute the files to be mapped?
I know I need a class to implement the mapper and one to implement the
reducer, but how does the class have a return type to output the map?
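On the return-type question, my current guess (please correct me if this is
wrong) is that a Hadoop Mapper doesn't return the map at all; it emits
key/value pairs through a Context object passed into map().  The real class
would be org.apache.hadoop.mapreduce.Mapper; the Context below is only my
own tiny stand-in stub to show the shape I mean:

```java
import java.util.*;

public class EmitSketch {
    // Stand-in stub for Hadoop's Context: just collects emitted pairs.
    static class Context {
        final List<String[]> emitted = new ArrayList<>();
        void write(String key, String value) {
            emitted.add(new String[] { key, value });
        }
    }

    // Shaped like a Mapper's map(key, value, context): void return,
    // with all output going through context.write(...).
    static void map(String url, String text, Context context) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                context.write(word, url);
            }
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        map("http://a.example", "two words", ctx);
        System.out.println(ctx.emitted.size());
    }
}
```

Is that roughly how output leaves a Mapper, or am I misreading the examples?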

Thanks a lot for your help.
-- 
View this message in context: 
http://old.nabble.com/Reverse-Indexing-Programming-Help-tp31292449p31292449.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.