[ 
https://issues.apache.org/jira/browse/HBASE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evgeny Ryabitskiy reassigned HBASE-1084:
----------------------------------------

    Assignee:     (was: Evgeny Ryabitskiy)

> Reinitializable DFS client
> --------------------------
>
>                 Key: HBASE-1084
>                 URL: https://issues.apache.org/jira/browse/HBASE-1084
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io, master, regionserver
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> HBase is the only long-lived DFS client. Tasks handle DFS errors by dying. 
> HBase daemons do not; they depend instead on the DFSClient's error recovery, 
> which is not sufficiently developed or tested. Several issues are the 
> result:
> * HBASE-846: hbase looses its mind when hdfs fills
> * HBASE-879: When dfs restarts or moves blocks around, hbase regionservers 
> don't notice
> * HBASE-932: Regionserver restart
> * HBASE-1078: "java.io.IOException: Could not obtain block": allthough file 
> is there and accessible through the dfs client
> * hlog indefinitely hung on getting new blocks from dfs on apurtell cluster
> * regions closed due to transient DFS problems during loaded cluster restart
> These issues might also be related:
> * HBASE-15: Could not complete hdfs write out to flush file forcing 
> regionserver restart
> * HBASE-667: Hung regionserver; hung on hdfs: writeChunk, 
> DFSClient.java:2126, DataStreamer socketWrite
> HBase should reinitialize the fs a few times upon catching fs exceptions, 
> with backoff, to compensate. This can be done by making a wrapper around all 
> fs operations that releases references to the old fs instance and makes and 
> initializes a new instance to retry. All fs users would need to be fixed up 
> to handle loss of state around fs wrapper invocations: hlog, memcache 
> flusher, hstore, etc. 
> Cases of clearly unrecoverable failure (are there any?) should be exempt 
> from retry.
> Once the fs wrapper is in place, error recovery scenarios can be tested by 
> forcing reinitialization of the fs during PE or other test cases.
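
The wrapper proposed above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the HBase implementation: the class name ReinitializingFs, the FsOp interface, the isUnrecoverable hook, and the doubling backoff are all hypothetical, and the type parameter F stands in for whatever fs handle the daemons actually hold. The forceReinit method is the kind of hook a test could call to force reinitialization mid-run.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.function.Supplier;

/** Hypothetical wrapper; F stands in for the real fs handle type. */
public class ReinitializingFs<F> {

    /** One fs operation that may fail with an IOException. */
    public interface FsOp<F, T> {
        T apply(F fs) throws IOException;
    }

    private final Supplier<F> factory;   // builds a fresh fs instance
    private final int maxRetries;        // reinitializations allowed per call
    private final long initialBackoffMs; // first sleep; doubled after each failure
    private F fs;
    private int reinitCount;

    public ReinitializingFs(Supplier<F> factory, int maxRetries, long initialBackoffMs) {
        if (maxRetries < 0 || initialBackoffMs < 0) {
            throw new IllegalArgumentException("retries and backoff must be >= 0");
        }
        this.factory = factory;
        this.maxRetries = maxRetries;
        this.initialBackoffMs = initialBackoffMs;
        this.fs = factory.get();
    }

    /** Override to exempt clearly unrecoverable failures from retry. */
    protected boolean isUnrecoverable(IOException e) {
        return false;
    }

    /** Drops the reference to the old instance and creates a new one. */
    public synchronized void forceReinit() {
        fs = factory.get();
        reinitCount++;
    }

    public synchronized int getReinitCount() {
        return reinitCount;
    }

    /**
     * Runs op against the current fs. On IOException: back off, reinitialize,
     * and retry, up to maxRetries times. The lock is held while sleeping, so
     * concurrent callers wait; a real implementation may want finer locking.
     */
    public synchronized <T> T call(FsOp<F, T> op) throws IOException {
        long backoffMs = initialBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return op.apply(fs);
            } catch (IOException e) {
                if (attempt >= maxRetries || isUnrecoverable(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted during backoff");
                }
                backoffMs *= 2;
                forceReinit();
            }
        }
    }
}
```

Because each retry runs against a new instance, the operation passed to call has to rebuild any state tied to the old one (open streams, for example) inside the operation itself. That is the "loss of state" the fs users would need to be fixed up to handle.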

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.