Hi Andy and St.Ack,
I'd be interested to hear if logging turns up anything.
Table commits have sub-second response times. It looks like crawling is causing the slowness.
Inside the map task, definitely. A job failure at the map stage would force you to redo anything that might still be in the collector.
I am putting data into the same row and column family as I am scanning. According to St.Ack's response, I need to put the data in a separate column family; I will see if this helps. I'm curious: does the commit write the data to the same region the map task is scanning? Is that what may be causing the contention?
In general, crawling can take a long time, especially if you are recursively following links (are you?). Remote servers are often quite slow. I set a socket timeout and a connection timeout for commons-httpclient, and retry on failure.
Luckily, I do not need to recurse into links. Can you share what settings you are using for commons-httpclient?
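For reference, socket/connection timeouts and a retry handler in commons-httpclient 3.x are typically configured along these lines. The timeout values and URL below are placeholders for illustration, not Andy's actual settings:

```java
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class CrawlerClient {
  public static void main(String[] args) throws Exception {
    HttpClient client = new HttpClient();
    // Fail fast on dead or slow servers instead of hanging a map task.
    // Both values are in milliseconds and are illustrative only.
    client.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
    client.getHttpConnectionManager().getParams().setSoTimeout(10000);

    GetMethod get = new GetMethod("http://example.com/"); // placeholder URL
    // Retry transient failures up to 3 times; do not retry if the
    // request was already sent (second argument false).
    get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
    try {
      int status = client.executeMethod(get);
      System.out.println("HTTP status: " + status);
    } finally {
      get.releaseConnection();
    }
  }
}
```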
Another possibility is to run a MR job ahead of time to build a worklist in DFS and avoid use of TableMap entirely. This would also allow you to split the work into more maps.
I may end up doing this if I find that the number of MR tasks is not sufficient.
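As a rough sketch of the worklist idea, the pre-pass job would just scan the table and write one URL per line into a DFS file, which a later MR job can then read with as many maps as desired. The path and URLs below are hypothetical; the scan that would actually feed the writer is elided:

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: materialize a crawl worklist in DFS so the crawl job can
// read a plain file (split across many maps) instead of using TableMap.
public class WorklistWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path worklist = new Path("/tmp/crawl-worklist.txt"); // hypothetical path
    BufferedWriter out = new BufferedWriter(
        new OutputStreamWriter(fs.create(worklist, true)));
    try {
      // In practice these lines would come from scanning the HBase table.
      out.write("http://example.com/page1\n");
      out.write("http://example.com/page2\n");
    } finally {
      out.close();
    }
  }
}
```

The number of maps then follows from the input splits of the worklist file rather than from the table's region count.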
Is there a way to split the regions before the MR task runs? I know it is going to write ~2K per row; is there a way to tell HBase to go ahead and split based on this anticipated size?
thanks, Dru
