[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Jonathan Hsieh (JIRA) Wed, 09 May 2012 11:42:15 -0700

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271681#comment-13271681 ]


Jonathan Hsieh commented on HBASE-5757:
---------------------------------------

Got it, great clarification on the DNRIOExn (DoNotRetryIOException).  Can you 
add this to the comments of the catch block in TableInputFormat?  If it passes 
tests then I'll commit.  If you could add a Hadoop counter that would be 
awesome (or file a jira to add one). 

I have a feeling there might be a configuration workaround.  Are you using 
scanner caching at all on your client?  (The default is no caching.)  It seems 
like there would be a sweet spot above which there are diminishing returns.  
It sounds like in your case the rows may be variably sized, which makes this 
difficult. 
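
One rough way to reason about that sweet spot, as a sketch only: if a whole 
cached batch has to be processed before the scanner lease expires, the caching 
value is bounded by the lease period divided by the worst-case per-row 
processing time.  The class name, the safety factor and the 60-second lease 
period below are illustrative assumptions, not values taken from this issue, 
and the model ignores fetch time and variance in row size. 

```java
// Hypothetical helper; not part of HBase. Simplified model: every row in a
// cached batch must be processed before the scanner lease expires.
final class ScannerCachingBound {

    /**
     * Upper bound on scanner caching under the simplified model above.
     *
     * @param leasePeriodMs  scanner lease period in milliseconds (assumed known)
     * @param worstCaseRowMs slowest expected per-row processing time in milliseconds
     * @param safetyFactor   fraction of the lease to budget, in (0, 1]
     * @return the largest row count that fits in the budget, never less than 1
     */
    static int maxCaching(long leasePeriodMs, long worstCaseRowMs, double safetyFactor) {
        if (leasePeriodMs <= 0 || worstCaseRowMs <= 0
                || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long budgetMs = (long) (leasePeriodMs * safetyFactor);
        long rows = budgetMs / worstCaseRowMs;
        // Clamp to [1, Integer.MAX_VALUE]; caching of 1 means one row per fetch.
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long leasePeriodMs = 60_000L; // assumed lease period, for illustration only
        // 50 ms worst-case per row, half the lease as budget: 30000 / 50
        System.out.println(maxCaching(leasePeriodMs, 50L, 0.5)); // prints 600
    }
}
```

With variably sized rows the worst-case per-row time is hard to pin down, 
which is why a bound like this is only a starting point. 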
 

Note that we've been able to set scanner caching on each individual scan 
since 0.20 (HBASE-1759) -- setting it for that job may be more 'correct'. 

Also it looks like some of this code could use a cleanup -- HBASE-2161 is 
another jira that says ScannerTimeoutException may be cruft -- why is it 
separate from LeaseException? (possibly related to ).  I think I would prefer 
that we explicitly call out the exceptions (UnknownScannerException, 
LeaseException and ScannerTimeoutException) that we retry on and leave the 
rest to be rethrown (there was a recent thread discussing IOException abuse).  
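
To make the suggestion concrete, here is a minimal sketch of "retry only on 
named exceptions, rethrow everything else", with a retry count standing in for 
the Hadoop counter.  It uses only the JDK: the three exception classes are 
stand-ins that borrow the HBase names, and SelectiveScanRetry, ScannerCall and 
Restart are hypothetical names invented for this example, not the 
TableInputFormat code or the attached patch. 

```java
import java.io.IOException;

// Hypothetical illustration; not the TableInputFormat implementation.
final class SelectiveScanRetry {

    // Stand-ins named after the HBase exceptions; NOT the real HBase classes.
    static class UnknownScannerException extends IOException {
        UnknownScannerException(String msg) { super(msg); }
    }
    static class LeaseException extends IOException {
        LeaseException(String msg) { super(msg); }
    }
    static class ScannerTimeoutException extends IOException {
        ScannerTimeoutException(String msg) { super(msg); }
    }

    /** One attempt to fetch the next value from a scanner (hypothetical). */
    interface ScannerCall<T> {
        T call() throws IOException;
    }

    /** Reopens the scanner after a recoverable failure (hypothetical). */
    interface Restart {
        void restart() throws IOException;
    }

    // What a job counter would report, so repeated refetching is not hidden.
    private long retries;

    long retries() {
        return retries;
    }

    /**
     * Runs the call once; on one of the three named exceptions, restarts the
     * scanner and tries exactly once more. Any other IOException, and any
     * failure of the second attempt, propagates to the caller.
     */
    <T> T nextWithRetry(ScannerCall<T> next, Restart restart) throws IOException {
        try {
            return next.call();
        } catch (UnknownScannerException | LeaseException | ScannerTimeoutException e) {
            retries++;
            restart.restart();
            return next.call();
        }
    }
}
```

Listing the exception types in the catch clause keeps the retry policy 
visible in one place, and anything not listed surfaces to the job unchanged. 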


                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch
>
>
> Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
> scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this 
> handling so that if exception is caught a reconnect is attempted (without 
> bothering the mapred client). After that, HBASE-4269 changed this behavior 
> back, but in both mapred and mapreduce APIs. The question is, is there any 
> reason not to handle all errors that the input format can handle? In other 
> words, why not try to reissue the request after *any* IOException? I see the 
> following disadvantages of current approach
>  * the client may see exceptions like LeaseException and 
> ScannerTimeoutException if he fails to process all fetched data in timeout
>  * to avoid ScannerTimeoutException the client must raise 
> hbase.regionserver.lease.period
>  * timeouts for tasks are already configured in mapred.task.timeout, so this 
> seems to me a bit redundant, because typically one needs to update both these 
> parameters
>  * I don't see any possibility to get rid of LeaseException (this is 
> configured on server side)
> I think all of these issues would be gone, if the DoNotRetryIOException would 
> not be rethrown. -On the other hand, handling errors in InputFormat has 
> disadvantage, that it may hide from the user some inefficiency. Eg. if I have 
> very big scanner.caching, and I manage to process only a few rows in timeout, 
> I will end up with single row being fetched many times (and will not be 
> explicitly notified about this). Could we solve this problem by adding some 
> counter to the InputFormat?-

