Ninad: Are you using Nutch for crawling? If not, out of interest, why not? Have you seen NUTCH-650 -- I believe it works (jdcryans?).
Your PermalinkTable is small? Has only a few rows? Maybe lower the size
at which this table splits by changing the flush and maximum file sizes
-- see hbase-default.xml (rough sketches follow below the quoted thread).

St.Ack

On Mon, Apr 20, 2009 at 4:14 AM, Jean-Daniel Cryans <[email protected]> wrote:
> Ninad,
>
> Regarding the timeouts, I recently gave a tip in the thread "Tip when
> scanning and spending a lot of time on each row" which should solve
> your problem.
>
> Regarding your table, you should split it. In the shell, type the
> command "tools" to see how to use the "split" command. Issue a couple
> of them, waiting a bit between each call.
>
> J-D
>
> On Mon, Apr 20, 2009 at 5:49 AM, Ninad Raut <[email protected]> wrote:
> > Hi,
> >
> > I have been trying to crawl data using MapReduce on HBase. Here is
> > the scenario:
> >
> > 1) I have a fetch list which has all the permalinks to be fetched.
> > They are stored in a PermalinkTable.
> >
> > 2) A MapReduce job scans over each permalink, fetches the data, and
> > dumps it into a ContentTable.
> >
> > Here are the issues I face:
> >
> > The PermalinkTable is not split, so I have just one map running on a
> > single machine. The benefit of using MapReduce is nullified.
> >
> > The MapReduce job keeps giving scanner timeout exceptions, causing
> > task failures and further delays.
> >
> > If anyone can give me tips for this use case it would really help me.
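To make the first tip concrete: lowering the split threshold means
overriding two properties in hbase-site.xml. A rough sketch with
illustrative values -- exact property names vary by HBase release (older
versions say "memcache" where newer ones say "memstore"), so check the
names in your own hbase-default.xml:

  <!-- hbase-site.xml: illustrative overrides to make regions split sooner -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- split a region once a store file reaches ~64MB (value is bytes) -->
    <value>67108864</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- flush the in-memory store to disk at ~16MB -->
    <value>16777216</value>
  </property>

Smaller values mean the table splits into regions sooner, which is what
gets you more than one map task.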
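The shell-side split that J-D describes would look roughly like this
(untested sketch; substitute your own table name):

  hbase(main):001:0> tools
    (prints usage for the admin commands, including "split")
  hbase(main):002:0> split 'PermalinkTable'
    (wait for the new regions to appear, e.g. in the master web UI, then:)
  hbase(main):003:0> split 'PermalinkTable'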
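As for the job itself, here is a minimal sketch against the newer
org.apache.hadoop.hbase.mapreduce API (0.20+; the org.apache.hadoop.hbase.mapred
API in 0.19 differs). The table and column names ("PermalinkTable",
"ContentTable", "meta:url", "content:raw") and the fetch() helper are
assumptions for illustration only. The setCaching(1) call is the usual
shape of the scanner-timeout tip: fetching one row per RPC means each
next() renews the scanner lease even when per-row work is slow.

  // Sketch of the scan-and-fetch job. Table/column names and fetch()
  // are hypothetical placeholders, not from the original thread.
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.Job;

  public class FetchJob {

    static class FetchMapper extends TableMapper<ImmutableBytesWritable, Put> {
      @Override
      protected void map(ImmutableBytesWritable row, Result value, Context ctx)
          throws IOException, InterruptedException {
        byte[] url = value.getValue(Bytes.toBytes("meta"), Bytes.toBytes("url"));
        byte[] content = fetch(url);  // the slow per-row work (HTTP fetch)
        Put put = new Put(row.get());
        put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"), content);
        ctx.write(row, put);
      }

      private byte[] fetch(byte[] url) {
        return new byte[0];  // stub: real code would do the HTTP GET here
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "fetch-permalinks");
      job.setJarByClass(FetchJob.class);
      Scan scan = new Scan();
      // Fetch one row per RPC so each next() renews the scanner lease.
      scan.setCaching(1);
      TableMapReduceUtil.initTableMapperJob("PermalinkTable", scan,
          FetchMapper.class, ImmutableBytesWritable.class, Put.class, job);
      TableMapReduceUtil.initTableReducerJob("ContentTable",
          IdentityTableReducer.class, job);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Note that until PermalinkTable has more than one region this job still
runs a single map task, which is why splitting the table comes first.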
