I have a SequenceFile.Writer that I obtained via SequenceFile.createWriter
and that I write to using append(key, value). Because the write volume is low,
it's not uncommon for my appends to take over a day to finally be flushed to
HDFS (i.e. the new file will sit at 0 bytes for over a day).
Because I am running map/reduce tasks on this data multiple times a day, I
want to "flush" the sequence file so the mapred jobs can pick it up when
they run.
What's the right way to do this? I'm assuming it's a fairly common use
case. Also -- are writes to the sequence files atomic? (e.g. if I am
actively appending to a sequence file, is it always safe to read from that
same file in a mapred job?)
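For reference, my setup looks roughly like the sketch below (the path and the
LongWritable/Text key/value classes are just placeholders, not necessarily what
I actually use):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LowVolumeWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path and key/value types for illustration only.
        Path out = new Path("/data/events/current.seq");
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, LongWritable.class, Text.class);

        // Records trickle in slowly, so the file can sit at 0 bytes in HDFS
        // until enough data is buffered or the writer is closed.
        writer.append(new LongWritable(System.currentTimeMillis()), new Text("event"));

        writer.close();
    }
}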

To be clear, I want the flushing to be time-based (controlled explicitly by
the app), not size-based. Will this create waste in HDFS somehow?
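To make the question concrete, something like the following is what I had in
mind: a scheduler that flushes on a fixed interval. This assumes the Writer
exposes hflush()/hsync() (my understanding is that older releases had syncFs()
for a similar purpose, but please correct me if that's wrong):

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.SequenceFile;

public class PeriodicFlusher {
    // Flush the writer on a fixed schedule so readers can see recent appends,
    // regardless of how little data has accumulated.
    public static ScheduledExecutorService start(final SequenceFile.Writer writer) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    // My assumption: hflush() pushes buffered bytes out to HDFS so a
                    // reader opening the file can see them; hsync() would additionally
                    // force them to disk. Older Hadoop versions used syncFs() instead.
                    writer.hflush();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }, 10, 10, TimeUnit.MINUTES);
        return scheduler;
    }
}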

Thanks,
Brian
