Hi All,

How can I determine whether a file in HDFS is still being written to (by any thread)? I have a continuous process on the master node that watches a particular HDFS folder for files to process. On the slave nodes, I create files in that same folder using the following code:
At the slave node:

    import org.apache.commons.io.IOUtils;
    import org.apache.hadoop.fs.FileSystem;
    import java.io.OutputStream;

    // Create the file in the watched folder and open it for writing
    OutputStream oStream = fileSystem.create(path);
    // Write the content, then close the stream (ignoring any close error)
    IOUtils.write(<Some String>, oStream);
    IOUtils.closeQuietly(oStream);

At the master node, I pick the earliest-modified file in the folder. Sometimes when I read that file it is empty, most likely because the slave has not finished writing to it yet.

Is there any way to tell the master that the slave is still writing to the file, so that it checks the file again later for the actual content?

Thanks,
--
Nitin Khandelwal
