This does impose some limitations, not so much on crawling as on making the information available. You'd crawl, writing to a file until it's large enough or until some amount of time has elapsed, then close the file and start a new one. At that point the closed file becomes available for applications to consume, and it also becomes immutable. If you want something like a daily file, you can set up a daily directory instead, since directories are fully modifiable while files are not.
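As a rough illustration of that close-and-roll pattern, here is a minimal sketch in plain Java. It uses the local filesystem (java.nio) as a stand-in for Hadoop's FileSystem API, and names like SIZE_LIMIT and the part-N file naming are assumptions for the demo, not anything Hadoop prescribes:

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical rolling writer: append records to the current file, and once
// it grows past a threshold, start a new file. The file just closed is then
// treated as immutable and ready for consumers -- the workaround described
// above for a filesystem without append.
public class RollingWriter {
    static final long SIZE_LIMIT = 64; // roll after ~64 bytes (tiny, for demo)

    private final Path dir;
    private int fileIndex = 0;   // index of the file currently being written
    private long written = 0;    // bytes written to the current file

    public RollingWriter(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Append one record; roll to a new file when the size limit is reached.
    public void write(String record) throws IOException {
        if (written >= SIZE_LIMIT) {
            fileIndex++;     // previous part file is now closed/immutable
            written = 0;
        }
        Path current = dir.resolve("part-" + fileIndex + ".txt");
        Files.writeString(current, record + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        written += record.length() + 1;
    }

    // Number of part files no longer being written (safe to consume).
    public int closedFiles() { return fileIndex; }
}
```

A time-based variant would roll on a clock check instead of a byte count, which is effectively the daily-directory scheme mentioned above.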
The write-once limitation on files may be removed at some point, but probably not in the near future.

Yoram

-----Original Message-----
From: Wayne Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 29, 2007 4:50 AM
To: [email protected]
Subject: Re: How to use Hadoop's API to write to a file in append mode?

thanks, Mahadev

But as you know, the data we crawl from the web increases little by little. If Hadoop does not support append mode, then we have to create a new file every time we get new data. I do not think that's a good idea.
