Hi All,

I'm experiencing some memory retention while copying data into HDFS
when an IOException is thrown.

My use case is the following: I have multiple threads sharing a
FileSystem object, all uploading files. At some point the quota is
exceeded in one thread and I get a DSQuotaExceededException (a
subclass of IOException). In both the regular case and when such an
exception is thrown, I close the DFSOutputStream.
But for a DFSOutputStream that encountered an IOException, the last
Packet is kept in memory until the FileSystem is closed, which I
usually don't do very often.
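
For reference, the per-thread upload logic looks roughly like this (a
minimal sketch, not the exact code; uploadFile, path and data are just
illustrative names):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch of the per-thread upload pattern (illustrative only).
// fs is the FileSystem instance shared by all threads.
void uploadFile(FileSystem fs, Path path, byte[] data) throws IOException {
    FSDataOutputStream out = fs.create(path);
    try {
        out.write(data);
    } finally {
        // Closed in both the regular case and after a
        // DSQuotaExceededException; close() can rethrow the pending
        // exception, which the caller handles.
        out.close();
    }
}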

So my questions:

- Is this the expected behavior, and do I need to deal with it?
- Is there a way to properly close a DFSOutputStream (and free all
the retained memory) without closing the FileSystem?
- Is sharing one FileSystem object across several threads recommended?

Attached is a simple test reproducing the behavior: a MiniDFSCluster
is launched and a tiny quota is set so that an IOException is thrown.
Random content is generated and uploaded to HDFS. The FileSystem is
not closed, so memory grows until an OOM is thrown (don't blame me
for the @Test(expected = OutOfMemoryError.class) :)). Tested on
Hadoop 1.0.2.
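
The core of the test looks roughly like this (a sketch of what the
attachment does, not the attachment itself; the quota value, buffer
size, file names and method name are illustrative, and
FSConstants.QUOTA_DONT_SET is used to leave the namespace quota
untouched):

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.FSConstants;
import org.junit.Test;

@Test(expected = OutOfMemoryError.class)
public void testMemoryRetentionOnQuotaExceeded() throws IOException {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
    DistributedFileSystem fs = (DistributedFileSystem) cluster.getFileSystem();
    Path dir = new Path("/quota");
    fs.mkdirs(dir);
    // Tiny diskspace quota so uploads fail with DSQuotaExceededException.
    fs.setQuota(dir, FSConstants.QUOTA_DONT_SET, 1024);

    byte[] data = new byte[64 * 1024];
    Random random = new Random();
    for (int i = 0; ; i++) {
        random.nextBytes(data);
        FSDataOutputStream out = fs.create(new Path(dir, "file-" + i));
        try {
            out.write(data);
        } catch (IOException expected) {
            // quota exceeded
        } finally {
            try { out.close(); } catch (IOException ignored) { }
        }
        // fs is never closed, so the last Packet of each failed stream
        // is retained and memory grows until the OOM.
    }
}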

Thanks in advance for your answers, pointers and advice.

Benoit.
