There is no "easy" API available to do this. You will have to manually
read the file, drop its head, write the rest to a new file, and then
swap the new file in for the original.
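
A minimal sketch of that read/behead/write/swap sequence, assuming the
"head" is the first line of a text file. It uses the local filesystem
via java.nio so that it runs without a cluster; the class and method
names (Behead, dropFirstLine) and the ".tmp" suffix are made up for
illustration. On HDFS the corresponding steps would go through
FileSystem.open, FileSystem.create and FileSystem.rename, and as far as
I know rename there will not overwrite an existing file, so the
original would have to be deleted or moved aside first.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class Behead {
    /** Rewrites 'file' without its first line (hypothetical helper). */
    public static void dropFirstLine(Path file) throws IOException {
        // Assumption: a sibling temp file named "<name>.tmp" is free to use.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            in.readLine(); // read and discard the head
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // copy the rest unchanged
            }
        }
        // Swap: replace the original with the beheaded copy.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling Behead.dropFirstLine(path) rewrites the whole file once, which
is the cost you pay for not having an in-place API.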

You could do this with MapReduce too, since the first split of any
file has its start offset set to 0 in the InputSplit (FileSplit)
received by the RecordReader, so the reader can tell when it is at the
head of the file (or you could work it out with a non-splittable
input format).
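
To illustrate the start-offset idea without pulling in Hadoop, here is
a plain-Java simulation, not Hadoop API code. The names
(SplitHeaderSkip, readSplit, afterNewline) are hypothetical, and the
line-ownership rule is my understanding of how a line-oriented record
reader behaves: a split that does not start at 0 skips its first
(possibly partial) line, and every split reads on until it passes its
end. The only addition is that the split starting at 0 drops its first
line as the header.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitHeaderSkip {
    /** Index just past the next '\n' at or after 'from', or data.length if none. */
    static int afterNewline(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == '\n') return i + 1;
        }
        return data.length;
    }

    /** Lines owned by the split [start, start + length), minus the file header. */
    public static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(data.length, start + length);
        int pos;
        if (start == 0) {
            // First split of the file: its first line is the header, drop it.
            pos = afterNewline(data, 0);
        } else {
            // Later split: the line straddling 'start' belongs to the previous split.
            pos = afterNewline(data, start);
        }
        List<String> lines = new ArrayList<>();
        // Read every line that begins at or before 'end'.
        while (pos <= end && pos < data.length) {
            int next = afterNewline(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }
}
```

With a non-splittable input format each file arrives as a single split
starting at 0, so only the first branch would ever be taken.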

This sort of cleanup should be done before or while loading the data
into HDFS, IMHO.

-- 
Harsh J
www.harshj.com
