This does impose some limitations, not so much on crawling itself as on making the
information available.
You'd crawl, writing to a file until it's large enough or until some amount of time
has elapsed, then you'd close the file and
start a new one. At that point the file you closed becomes available for
applications to consume, and it also becomes immutable.
If you want something like a daily file, you can basically set up a daily
directory instead, since directories are fully modifiable
while files are not.
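As an illustration of that rolling pattern (a sketch only, not Hadoop's actual API; the names `should_roll` and `daily_dir` and both thresholds are made up for this example), the roll-over decision might look like:

```python
import datetime
import time

# Hypothetical thresholds; tune for your workload.
MAX_BYTES = 64 * 1024 * 1024   # roll when the file is "large enough"...
MAX_SECONDS = 60 * 60          # ...or when an hour has elapsed

def should_roll(bytes_written, opened_at, now=None,
                max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Decide whether to close the current file and start a new one."""
    now = time.time() if now is None else now
    return bytes_written >= max_bytes or (now - opened_at) >= max_seconds

def daily_dir(day):
    """Directory name for a daily layout, e.g. crawl/2007-05-29/."""
    return "crawl/%s/" % day.strftime("%Y-%m-%d")
```

On HDFS the writes themselves would go through Hadoop's Java FileSystem API; once the roll condition holds, the crawler closes the current output stream (making that file visible and immutable) and opens the next file, placing it under the current day's directory if a daily layout is wanted.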

The write-once limitation on files may be removed in some future, but probably 
not in the near future.

Yoram

-----Original Message-----
From: Wayne Liu [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 29, 2007 4:50 AM
To: [email protected]
Subject: Re: How to use Hadoop's API to write to a file in append mode?

Thanks, Mahadev.

But as you know, the data we crawl from the web grows little by little;
if Hadoop does not support append mode, then we have to create a new file
every time we get new data. I do not think that's a good idea.
