Hi, I have been trying to crawl data using MapReduce on HBase. Here is the scenario:
1) I have a fetch list containing all the permalinks to be fetched. They are stored in a PermalinkTable.
2) A MapReduce job scans over each permalink, tries to fetch the data for it, and dumps the result into a ContentTable.

Here are the issues I face:
1) The permalink table is not split, so I have just one map running on a single machine. This nullifies the benefit of using MapReduce.
2) The MapReduce job keeps throwing scanner timeout exceptions, causing task failures and further delays.

If anyone can give me tips for this use case, it would really help me.
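
To make the first issue concrete: as far as I understand, the table input format creates one map task per region, so an unsplit table yields a single map. One direction I am considering is to prefix each row key with a hash bucket and pre-split the table on those prefixes. The sketch below only computes the keys and split points; the bucket count, the `|` separator, and the function names are my own assumptions for illustration, not anything PermalinkTable currently uses.

```python
import hashlib


def bucket_prefix(permalink: str, num_buckets: int = 16) -> str:
    """Map a permalink to a stable two-hex-digit bucket prefix.

    Assumes num_buckets <= 256 so the prefix always fits in two hex digits.
    """
    digest = hashlib.md5(permalink.encode("utf-8")).digest()
    return format(digest[0] % num_buckets, "02x")


def salted_row_key(permalink: str, num_buckets: int = 16) -> str:
    """Build a row key of the form '<bucket>|<permalink>' (hypothetical layout)."""
    return bucket_prefix(permalink, num_buckets) + "|" + permalink


def split_keys(num_buckets: int = 16) -> list:
    """Return the region boundaries: one fewer than the number of buckets."""
    return [format(i, "02x") for i in range(1, num_buckets)]


if __name__ == "__main__":
    for url in ["http://example.com/a", "http://example.com/b"]:
        print(salted_row_key(url))
    print(split_keys(4))
```

The split points would be supplied as split keys when the table is created, so each bucket gets its own region and therefore, if my understanding is right, its own map task. The cost is that a lookup by permalink has to recompute the prefix first.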
