Yes. The present work-arounds for this are pretty complicated.
option 1) You can write small files relatively frequently, and every time you have written some number of them, you can concatenate them into one larger file and delete the originals. The concatenated files can receive the same treatment in turn. If this is managed carefully in conjunction with a safe status update mechanism like ZooKeeper, you can have a pretty robust system that reflects new data with fairly low latency (on the order of seconds behind).

option 2) You can accumulate data in a non-HDFS location until it is big enough to push to HDFS. This can be done in conjunction with option 1. The danger is that you lose data if the accumulator fails before flushing its data to HDFS. This approach is very commonly used for log files that are consolidated at the hourly level and then transferred to HDFS.

On 3/27/08 12:02 AM, "Raghavendra K" <[EMAIL PROTECTED]> wrote:

> Hi,
> Thanks for the reply.
> Does this mean that once I close a file, I can open it only for reading?
> And if I reopen the same file to write any data then the old data will be
> lost and again it's as good as a new file being created with the same name?
>
> On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <[EMAIL PROTECTED]>
> wrote:
>
>> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
>> for more details.
>>
>> Thanks,
>> dhruba
>>
>> -----Original Message-----
>> From: Raghavendra K [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, March 26, 2008 11:29 PM
>> To: [email protected]
>> Subject: Append data in hdfs_write
>>
>> Hi,
>> I am using hdfsWrite to write data onto a file.
>> Whenever I close the file and reopen it for writing it will start writing
>> from the position 0 (rewriting the old data).
>> Is there any way to append data onto a file using hdfsWrite?
>> I cannot use hdfsTell because it works only when opened in RDONLY mode and
>> also I don't know the number of bytes written onto the file previously.
>> Please throw some light onto it.
>>
>> --
>> Regards,
>> Raghavendra K
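
As a rough illustration of option 1 above (write small files, merge every few of them, then merge the merges), here is a minimal Python sketch. It runs against a local directory rather than HDFS. The class name TieredCompactor, the fanout parameter, and the L<level>-<sequence> file naming are all made up for illustration and are not part of any Hadoop API. A real deployment would replace the os calls with an HDFS client and publish the current file list through something like ZooKeeper, so that readers never see both a set of small files and their merged copy.

```python
import os
import tempfile


class TieredCompactor:
    """Hypothetical helper: write small files and merge every `fanout`
    files at one level into a single file at the next level."""

    def __init__(self, root, fanout=4):
        self.root = root
        self.fanout = fanout
        self.seq = 0  # increasing id, so file names sort in write order

    def _level_files(self, level):
        prefix = f"L{level}-"
        return sorted(n for n in os.listdir(self.root) if n.startswith(prefix))

    def _create(self, level, chunks):
        name = f"L{level}-{self.seq:010d}"
        self.seq += 1
        # Write under a temporary name, then rename, so a partially
        # written file is never visible under its final name.
        tmp = os.path.join(self.root, "." + name + ".tmp")
        with open(tmp, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.rename(tmp, os.path.join(self.root, name))

    def _compact(self, level):
        files = self._level_files(level)
        if len(files) < self.fanout:
            return
        chunks = []
        for name in files:
            with open(os.path.join(self.root, name), "rb") as f:
                chunks.append(f.read())
        # Create the merged file first, then delete its inputs. Between
        # these two steps the data exists twice; this is the window that
        # a coordinated status update would have to hide from readers.
        self._create(level + 1, chunks)
        for name in files:
            os.remove(os.path.join(self.root, name))
        self._compact(level + 1)  # the merged files get the same treatment

    def write_batch(self, data):
        """Store one small batch as a new level-0 file, then compact."""
        self._create(0, [data])
        self._compact(0)

    def read_all(self):
        names = [n for n in os.listdir(self.root) if n.startswith("L")]
        # Higher levels hold older data, so read them first; within a
        # level, the zero-padded sequence id gives the write order.
        names.sort(key=lambda n: (-int(n[1:n.index("-")]), n))
        out = []
        for name in names:
            with open(os.path.join(self.root, name), "rb") as f:
                out.append(f.read())
        return b"".join(out)


if __name__ == "__main__":
    demo = TieredCompactor(tempfile.mkdtemp(), fanout=3)
    for i in range(10):
        demo.write_batch(f"record {i}\n".encode())
    print(sorted(os.listdir(demo.root)))
    print(demo.read_all().decode(), end="")
```

The fanout controls the trade-off: a small value keeps the file count low but rewrites data more often, while a large value leaves more small files around between merges.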
