Just my $0.02, but I think you really ought to push back and have
whoever's creating the files upstream do it in one of the ways I
described. The current approach will be way too error prone. I mean,
think about it: with your current setup, not only can't you reliably
know whether the creator has finished writing to a file, but even if
they have finished, you can't know whether the write completed
successfully. The creator could have aborted the write in the middle -
either purposely or inadvertently - and you'd be trying to process an
incomplete file.
You really need to employ *some* method to reliably determine when a
file is successfully uploaded, or you're going to wind up with a very
buggy system.
DR
On 08/17/2011 01:41 PM, Adam Shook wrote:
Sadly, I don't have control over naming the files. They are being ingested into
HDFS by powers outside my control. I'll mess around with the modification times
and see if I can get a good solution. If anyone knows of a way that seems less
hackish, I am all ears.
Thanks, Adam
-----Original Message-----
From: David Rosenstrauch [mailto:dar...@darose.net]
Sent: Wednesday, August 17, 2011 1:22 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS File being written
On 08/17/2011 12:57 PM, Adam Shook wrote:
Hello All,
Is there any clean way to tell from the API (v0.20.2) that a file in HDFS is
currently being written to? I've seen some exceptions before related to it,
but I was hoping there is a clean way and Google isn't turning anything up for
me.
Thanks!
-- Adam
You might be able to do it to some extent using
FileStatus.getModificationTime()
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html#getModificationTime()),
but this would really be a hack, IMO, and not something you should rely on.
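For illustration only, the modification-time heuristic could look roughly like the sketch below. The class name, method name, and quiet-period idea are assumptions made up for this example, not part of the Hadoop API; the only real call involved would be FileStatus.getModificationTime(), whose result (milliseconds since the epoch) you would pass in as the first argument. How often HDFS actually updates that timestamp while a file is still open is something to verify on your own version, which is part of why this remains a hack.

```java
// Hypothetical helper: treats a file as "probably finished" once its
// modification time has been unchanged for at least a quiet period.
final class QuietPeriodCheck {

    private QuietPeriodCheck() {}

    /**
     * @param modTimeMillis value obtained from FileStatus.getModificationTime()
     * @param nowMillis     current time, e.g. System.currentTimeMillis()
     * @param quietMillis   how long the file must have been untouched (assumed tunable)
     * @return true if the file has not been modified within the quiet period
     */
    static boolean isProbablyClosed(long modTimeMillis, long nowMillis, long quietMillis) {
        return nowMillis - modTimeMillis >= quietMillis;
    }
}
```

Note that this only says nobody has touched the file recently. It says nothing about whether the write finished successfully.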
I think you'd be better off either a) writing the file to a temp
directory, or b) writing it with a .tmp extension, and then moving or
renaming it once the file write is complete.
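A minimal sketch of option b) follows, written against java.nio.file on a local filesystem so that it runs without a cluster. The class and method names are made up for this example. On HDFS the writer would presumably do the same two steps with FileSystem.create() and FileSystem.rename() instead; that mapping is an assumption, not something tested here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical example of the "write to .tmp, then rename" pattern.
final class AtomicPublish {

    private AtomicPublish() {}

    /** Writer side: write to "<name>.tmp" beside the target, then rename into place. */
    static void writeThenRename(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        // The final name only appears once the write above has completed.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader side: skip anything that still carries the .tmp extension. */
    static boolean isReady(Path file) {
        return !file.getFileName().toString().endsWith(".tmp");
    }
}
```

If the writer dies partway through, all that is left behind is a .tmp file, which the reader ignores, so a half-written file never shows up under its final name.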
HTH,
DR