Would the second WAL contain the same contents as the first?

We already have code that adds an interceptor on calls to namenode#getBlockLocations so that blocks on the same DN as the dead RS are placed at the end of the priority queue. See addLocationsOrderInterceptor() in hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
This is for faster recovery in case the regionserver is deployed on the same box as the datanode.

On Tue, Apr 15, 2014 at 1:43 PM, Claudiu Soroiu <[email protected]> wrote:

> First of all, thanks for the clarifications.
>
> **how about 300 regions with 3x replication? Or 1000 regions? This
> is going to be 3000 files. on HDFS. per one RS.**
>
> Now I see that the trade-off is how to reduce the recovery time without
> affecting the overall performance of the cluster.
> Having too many WALs affects the write performance.
> Basically multiple WALs might improve the process but the number of WALs
> should be relatively small.
>
> Would it be feasible to know ahead of time where a region might activate in
> case of a failure and have for each region server a second WAL file
> containing backup edits?
> E.g. if machine B crashes then one region will be assigned to node A, one to
> node C, etc.
> Another view would be: Server A will back up a region from Server B in case
> it crashes, a region from server C, etc. Basically this second WAL will
> contain the data that is needed to quickly recover a crashed node.
> This adds additional redundancy and some degree of complexity to the
> solution but ensures data locality in case of a crash and faster recovery.
>
> **What did you do Claudiu to get the time down?**
>
> Decreased the hdfs block size to 64 megs for now.
> Enabled settings to avoid hdfs stale nodes.
> The cluster I tested this on was relatively small - 10 computers.
> I did tuning for zookeeper sessions to keep the heartbeat at 5 seconds for
> the moment, and plan to decrease this value.
> At this point dfs.heartbeat.interval is set at the default 3 seconds, but
> this I also plan to decrease and perform a more intensive test.
> (Decreasing the times is based on the experience with our current system,
> which is configured at 1.2 seconds and didn't have any issues even under
> heavy loads; obviously stop-the-world GC times should be smaller than the
> heartbeat interval.)
> And I remember I made some changes to the reconnect intervals of the
> client to allow it to reconnect to the region as fast as possible.
> I am at an early stage of experimenting with hbase but there are a lot of
> things to test/check...
>
> On Tue, Apr 15, 2014 at 11:03 PM, Vladimir Rodionov
> <[email protected]> wrote:
>
> > *We also had a global HDFS file limit to contend with*
> >
> > Yes, we have been seeing this from time to time in our production
> > clusters.
> > Periodic purging of old files helps, but the issue is obvious.
> >
> > -Vladimir Rodionov
> >
> > On Tue, Apr 15, 2014 at 11:58 AM, Stack <[email protected]> wrote:
> >
> > > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu <[email protected]>
> > > wrote:
> > >
> > > > ....
> > > >
> > > > After some tuning I managed to
> > > > reduce it to 8 seconds in total and for the moment it fits the needs.
> > > >
> > >
> > > What did you do Claudiu to get the time down?
> > > Thanks,
> > > St.Ack
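
P.S. To make the block reordering mentioned at the top of this reply concrete, here is a minimal sketch of the idea only. It is not the code in HFileSystem.java: the class name, the method name and the use of plain host-name strings are assumptions made for illustration. Given the replica hosts returned for a WAL block, it moves any replica living on the dead regionserver's host to the end of the list, so a reader tries the other replicas first.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names here are hypothetical, not HBase APIs.
public class ReplicaReorderSketch {

    // Returns a new list in which replicas on deadHost come last.
    // The relative order of all other replicas is preserved.
    static List<String> deprioritize(List<String> replicaHosts, String deadHost) {
        List<String> preferred = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(deadHost)) {
                suspect.add(host);   // likely on the same dead box, try last
            } else {
                preferred.add(host); // keep original priority
            }
        }
        preferred.addAll(suspect);
        return preferred;
    }

    public static void main(String[] args) {
        List<String> replicas = new ArrayList<>();
        replicas.add("dn1"); // same box as the dead regionserver
        replicas.add("dn2");
        replicas.add("dn3");
        System.out.println(deprioritize(replicas, "dn1")); // prints [dn2, dn3, dn1]
    }
}
```

The point of the design is that the suspect replica is only deprioritized, not dropped, so it is still available as a last resort if the other replicas cannot be read.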
