Bhushan, have you considered simply raising the memory limit for Hadoop? 100-300 MB is not that much, and 2 GB is a very modest memory requirement for today's machines. For comparison, a small EC2 instance has 1.7 GB.
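To put rough numbers on that, here is a back-of-the-envelope sketch. It rests on stated assumptions, not measurements: the XML is mostly ASCII (1 byte per char in UTF-8), the JVM stores Strings as UTF-16 (2 bytes per char), and value.toString() decodes into a temporary char buffer before copying it into the String, so all three copies are briefly live. Object headers and GC headroom are ignored. The class and method names are hypothetical.

```java
public class ToStringHeapEstimate {

    private static final long MB = 1_000_000L;

    // Rough peak heap, in bytes, while converting an n-byte ASCII-only UTF-8 value to a String.
    // Assumes three copies are live at once: the UTF-8 bytes, a decoded UTF-16 char buffer,
    // and the String's own copy of those chars. Ignores object headers and GC headroom.
    static long peakBytes(long utf8Bytes) {
        long utf16Bytes = 2L * utf8Bytes; // 1 char per ASCII byte, 2 bytes per char
        return utf8Bytes + utf16Bytes + utf16Bytes;
    }

    public static void main(String[] args) {
        for (long n : new long[] {100 * MB, 300 * MB}) {
            System.out.println(n / MB + " MB value -> ~" + peakBytes(n) / MB + " MB peak");
        }
        // prints "100 MB value -> ~500 MB peak" and "300 MB value -> ~1500 MB peak"
    }
}
```

By this estimate a 300 MB value needs on the order of 1.5 GB of heap for the conversion alone, which fits in a 2 GB task heap but not in a heap of a few hundred megabytes. In Hadoop releases of this vintage the per-task heap is set through the mapred.child.java.opts property (for example -Xmx2048m).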

On Tue, Dec 22, 2009 at 9:10 AM, Jason Venner wrote:
> The Text class supports low-level access to the underlying byte array in
> the Text object.
>
> You can call getBytes() directly and then incrementally transcode the bytes
> into characters using the charset decoder tools, or call the charAt() method
> to get the characters one by one.
> The bytesToCodePoint() method provides a simpler interface for sequentially
> working through the data.
>
> On Thu, Oct 29, 2009 at 4:18 AM, bhushan_mahale wrote:
>
> > Hi,
> >
> > I am writing MapReduce code using the MapRunnable interface.
> > The input format is SequenceFileInputFormat.
> >
> > Each sequence record contains a key-value pair of type <Text key, Text
> > value> (Text: org.apache.hadoop.io.Text).
> >
> > The "key" Text object contains a small string, whereas the "value" Text
> > object contains a large XML string.
> > The "value" Text object can hold as much as 100 to 300 MB of data.
> >
> > I convert the "value" Text object to a String using value.toString().
> > It throws an OutOfMemoryError for large data in the "value" object.
> >
> > Is there any other way to convert a large Text object to a Java String
> > object?
> > Alternatively, can I limit the number of records the RecordReader passes
> > to the run method, so that total memory utilization stays bounded?
> >
> > Thanks,
> > - Bhushan
> >
> >
> > DISCLAIMER
> > ==========
> > This e-mail may contain privileged and confidential information which is
> > the property of Persistent Systems Ltd. It is intended only for the use of
> > the individual or entity to which it is addressed. If you are not the
> > intended recipient, you are not authorized to read, retain, copy, print,
> > distribute or use this message. If you have received this communication in
> > error, please notify the sender and delete all copies of this message.
> > Persistent Systems Ltd. does not accept any liability for virus infected
> > mails.
> >
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
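Jason's first suggestion (take the raw bytes and transcode them incrementally) might look roughly like the sketch below. It uses only the JDK's java.nio.charset classes; decoding bytes into chars is the job of a CharsetDecoder. It assumes the caller passes value.getBytes() together with value.getLength(), because getBytes() returns the Text's backing array, which can be longer than the valid data. The class name, method names and chunk size are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class IncrementalUtf8Decode {

    // Decodes utf8[0, length) as UTF-8 in chunks of at most chunkChars chars and hands
    // each chunk to sink. The same CharBuffer is reused for every chunk, so the sink must
    // finish with it before returning. Extra memory is one chunk, not one giant String.
    static void decode(byte[] utf8, int length, int chunkChars, Consumer<CharBuffer> sink)
            throws CharacterCodingException {
        if (chunkChars < 2) {
            // A supplementary character decodes to a surrogate pair (2 chars).
            throw new IllegalArgumentException("chunkChars must be at least 2");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(utf8, 0, length);
        CharBuffer out = CharBuffer.allocate(chunkChars);

        // All input is already in memory, so endOfInput is true on every call.
        // OVERFLOW only means "out is full": drain it and call again.
        CoderResult result;
        do {
            result = decoder.decode(in, out, true);
            if (result.isError()) {
                result.throwException(); // malformed or truncated UTF-8
            }
            drain(out, sink);
        } while (result.isOverflow());

        // Flush any decoder state (nothing for UTF-8, but the CharsetDecoder contract asks for it).
        do {
            result = decoder.flush(out);
            drain(out, sink);
        } while (result.isOverflow());
    }

    // Passes the chars written so far to the sink, then empties the buffer for reuse.
    private static void drain(CharBuffer out, Consumer<CharBuffer> sink) {
        out.flip();
        if (out.hasRemaining()) {
            sink.accept(out);
        }
        out.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Stand-in for value.getBytes() and value.getLength() of a large Text value.
        byte[] utf8 = "<doc><p>caf\u00e9</p></doc>".getBytes(StandardCharsets.UTF_8);
        int[] tags = {0};
        decode(utf8, utf8.length, 4, chunk -> {
            for (int i = 0; i < chunk.length(); i++) {
                if (chunk.charAt(i) == '<') {
                    tags[0]++;
                }
            }
        });
        System.out.println("'<' count: " + tags[0]); // prints "'<' count: 4"
    }
}
```

This only helps if the code consuming the characters can work chunk by chunk (for example, feeding a streaming XML parser or counting as above); if it ultimately needs the whole document as one String, the memory cost comes back.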
