Hi,

I have written a program that creates sequence files from given text files.
The program takes the following input parameters:

 1.  Local source directory - contains all the input text files
 2.  Destination HDFS URI - location on HDFS where the sequence file will be written

The key for a sequence-record is the file-name.
The value for a sequence-record is the content of the text file.
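For context, the relevant part of my code looks roughly like this (a minimal sketch rather than the exact code; the class name, argument handling, and loop structure here are reconstructions, not the real source):

```java
import java.io.File;
import java.net.URI;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileCreatorSketch {
    public static void main(String[] args) throws Exception {
        String localSrcDir = args[0];  // local source directory
        String hdfsDestUri = args[1];  // destination HDFS URI

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(hdfsDestUri), conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path(hdfsDestUri), Text.class, Text.class);
        try {
            for (File f : new File(localSrcDir).listFiles()) {
                // The whole file is read into a String, then copied again
                // inside Text.set() via String.toCharArray() - this is the
                // allocation that fails for files larger than ~100 MB.
                String content =
                        new String(Files.readAllBytes(f.toPath()), "UTF-8");
                writer.append(new Text(f.getName()), new Text(content));
            }
        } finally {
            writer.close();
        }
    }
}
```

(Needs the Hadoop client libraries on the classpath and a reachable HDFS to actually run.)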

The program runs fine for a large number of input text files. But if the size of
a single input text file is > 100 MB, it throws the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.lang.String.toCharArray(String.java:2726)
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.hadoop.io.Text.<init>(Text.java:81)
        at SequenceFileCreator.create(SequenceFileCreator.java:106)
        at SequenceFileCreator.processFile(SequenceFileCreator.java:168)

I am using "org.apache.hadoop.io.SequenceFile.Writer" to create the sequence 
file, with the Text class as both the key class and the value class.

I tried increasing the maximum heap size for the program (-Xmx), but it throws the same error.

Can you provide your suggestions?

Thanks,
- Bhushan


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.
