Hello Friend, I have a question about how to write a MapReduce job that builds an inverted index for crawled web data.
My problem is this: if I store each page in its own file, the file-id is easy to obtain, but I am afraid that billions of crawled pages stored as small files would be a problem for the Hadoop storage system (HDFS). If I instead pack all the pages into one big file, how do I get the file-id of each page during the MapReduce job?

Thanks in advance!

Regards,
Zhijun
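P.S. To make the question concrete, here is a rough sketch of the mapper I have in mind, assuming the pages are packed into one SequenceFile whose key is the page URL (serving as the doc-id) and whose value is the page text. The class and names below are just placeholders, not working code from my crawler:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper for the inverted index. It assumes the input is a
// SequenceFile of (URL, page text) pairs read via SequenceFileInputFormat,
// so the doc-id simply arrives as the map input key.
public class InvertedIndexMapper extends Mapper<Text, Text, Text, Text> {
    private final Text term = new Text();

    @Override
    protected void map(Text docId, Text page, Context context)
            throws IOException, InterruptedException {
        // Emit (term, doc-id) for every token on the page; a reducer can
        // then collect the doc-ids into a posting list for each term.
        for (String token : page.toString().toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            term.set(token);
            context.write(term, docId);
        }
    }
}

With this layout the per-page id seems to come for free as the map input key, so no separate file-id lookup appears necessary. Does this seem like a reasonable way to recover the file-id?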
