[
https://issues.apache.org/jira/browse/HBASE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Ryabitskiy reassigned HBASE-1084:
----------------------------------------
Assignee: (was: Evgeny Ryabitskiy)
> Reinitializable DFS client
> --------------------------
>
> Key: HBASE-1084
> URL: https://issues.apache.org/jira/browse/HBASE-1084
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: io, master, regionserver
> Reporter: Andrew Purtell
> Fix For: 0.20.0
>
>
> HBase is the only long-lived DFS client. Tasks handle DFS errors by dying.
> HBase daemons do not; instead they depend on the DFSClient's error recovery
> capability, which is not sufficiently developed or tested. Several issues
> are a result:
> * HBASE-846: hbase looses its mind when hdfs fills
> * HBASE-879: When dfs restarts or moves blocks around, hbase regionservers
> don't notice
> * HBASE-932: Regionserver restart
> * HBASE-1078: "java.io.IOException: Could not obtain block": allthough file
> is there and accessible through the dfs client
> * hlog indefinitely hung on getting new blocks from dfs on apurtell cluster
> * regions closed due to transient DFS problems during loaded cluster restart
> These issues might also be related:
> * HBASE-15: Could not complete hdfs write out to flush file forcing
> regionserver restart
> * HBASE-667: Hung regionserver; hung on hdfs: writeChunk,
> DFSClient.java:2126, DataStreamer socketWrite
> To compensate, HBase should reinitialize the fs a few times, with backoff,
> upon catching fs exceptions. This can be done by routing all fs operations
> through a wrapper that releases references to the old fs instance, then
> creates and initializes a new instance before retrying. All fs users (hlog,
> memcache flusher, hstore, etc.) would need to be fixed up to handle loss of
> state around fs wrapper invocations.
> Cases of clear unrecoverable failure (are there any?) should be excepted.
> Once the fs wrapper is in place, error recovery scenarios can be tested by
> forcing reinitialization of the fs during PE or other test cases.
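
The wrapper proposed above could look roughly like the following. This is only a minimal sketch under stated assumptions, not code from HBase or Hadoop: the class ReinitializingFs, the FsOp interface, the factory supplier, and the retry/backoff parameters are all hypothetical names introduced for illustration. The fs type is left generic so the sketch runs without Hadoop on the classpath; a real version would have to supply a factory that returns a genuinely fresh filesystem instance and would have to decide which exceptions count as unrecoverable.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a reinitializing fs wrapper (not HBase code).
 * Not thread-safe: callers are assumed to serialize access.
 */
final class ReinitializingFs<F> {
    /** An fs operation that may fail with an IOException (assumed shape). */
    interface FsOp<S, T> {
        T apply(S fs) throws IOException;
    }

    private final Supplier<F> factory;   // creates and initializes a new fs instance
    private final int maxRetries;        // retries after the first failed attempt
    private final long baseBackoffMs;    // doubled after each failed attempt
    private F fs;
    private int reinitCount;

    ReinitializingFs(Supplier<F> factory, int maxRetries, long baseBackoffMs) {
        if (maxRetries < 0 || baseBackoffMs < 0) {
            throw new IllegalArgumentException("retries and backoff must be >= 0");
        }
        this.factory = factory;
        this.maxRetries = maxRetries;
        this.baseBackoffMs = baseBackoffMs;
        this.fs = factory.get();
    }

    /** Runs op; on IOException, backs off, replaces the fs instance, retries. */
    <T> T call(FsOp<F, T> op) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.apply(fs);
            } catch (IOException e) {
                last = e;
                if (attempt == maxRetries) {
                    break; // out of retries: surface the last failure
                }
                sleep(baseBackoffMs << attempt); // exponential backoff
                reinitialize();
            }
        }
        throw last;
    }

    /** Drops the old instance and builds a new one; tests can call this to force recovery. */
    void reinitialize() {
        fs = null; // release the reference to the old instance
        fs = factory.get();
        reinitCount++;
    }

    int reinitCount() {
        return reinitCount;
    }

    private static void sleep(long ms) throws IOException {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new InterruptedIOException("interrupted during backoff");
        }
    }
}
```

Callers would pass each fs operation as a lambda, for example wrapper.call(fs -> ...), and must assume that any state tied to the old instance (open streams, for instance) is gone once a call has triggered reinitialization. The reinitialize() method is exposed so that test cases can force the recovery path directly.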