Hello,

I'm starting to use Hadoop for something I'm working on.  I'm on a Windows
machine (XP) and I cannot consider changing to any other OS.  I'm developing
in Eclipse with the Hadoop plugin, I have Cygwin fully installed and
working, and I am using hadoop-0.20.0.  I tried to write a class that takes
one or more files in my input directory and reads them in using XStream.
The reader converts the object deserialized from each file back into an XML
string, which is stored in a Text object for processing.  The reason I am
doing this is that my data is stored as XML in .txt files.  I cannot use the
default reader classes: when XStream writes an object, the fields are
separated by newlines, and since the default reader reads one line of text
and submits it to the mapper, that won't work for me.
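To make the question concrete, here is a minimal sketch of the kind of reader I mean, written against the new (org.apache.hadoop.mapreduce) API in 0.20: an input format that refuses to split a file and a record reader that hands the whole file to the mapper as one Text value.  The class names (WholeFileInputFormat, WholeFileRecordReader) are just placeholders for illustration, not anything from my actual code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// One whole file per record: key is ignored, value is the file's full contents.
public class WholeFileInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // each XML document must stay in one piece
    }

    @Override
    public RecordReader<NullWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    public static class WholeFileRecordReader
            extends RecordReader<NullWritable, Text> {

        private FileSplit split;
        private Configuration conf;
        private final Text value = new Text();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false; // the single record was already emitted
            }
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FSDataInputStream in = null;
            try {
                in = file.getFileSystem(conf).open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length); // whole XML doc as Text
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}
```

The mapper would then feed the Text value to XStream's fromXML() itself, so newlines inside the serialized object no longer matter.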

After coding my own reader class, my problem is now that the map-reduce job
seems to do nothing.  It runs and reports that it completed, but it did not
process anything.  I've tried debugging it as a single process in Eclipse by
calling config.set("mapred.job.tracker", "local"); in my main method.
However, when I run the program that way from Eclipse (either normally or in
debug mode), I always get an out-of-memory exception, even though the
configuration pre-sets the child max heap size to 200 MB.  I've also tried
writing a class with a main method that runs IsolationRunner on a job.xml
file, but I get the same out-of-memory problem.  Can someone give me a
'dumbed down' way of using the Eclipse debugger with my map-reduce code?
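For reference, this is roughly the driver setup I mean (a sketch, assuming the new 0.20 Job API; the class name and argument handling are made up for illustration).  One thing I suspect but am not sure of: in local mode the map and reduce tasks run inside the same JVM as the driver, so the 200 MB child heap setting may not apply at all, and the heap would instead have to be raised via -Xmx in the Eclipse Run Configuration's VM arguments.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalDebugDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local"); // run tasks in this JVM,
                                                 // so Eclipse breakpoints hit
        conf.set("fs.default.name", "file:///"); // use the local filesystem

        Job job = new Job(conf, "local-debug");
        job.setJarByClass(LocalDebugDriver.class);
        // job.setInputFormatClass(...), job.setMapperClass(...), etc.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```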

Finally, if anyone could point me in the right direction on any resources
that explain how to write a custom input format class, that'd be great.  I
was referencing Yahoo's tutorial, but it uses the deprecated (old) API,
whereas I have been using the new one.

Thanks.

PS.  I am very very new at this, so please excuse me if my post was unclear
or missing key information.
-- 
View this message in context: 
http://www.nabble.com/Custom-input-help-debug-help-tp24400447p24400447.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.