My understanding of hdfs is limited, but I believe it's

> DFS will use (total disk size - 10 GB)

and not

> always leave 10 GB free?
Datanode simply does 'df' - reserve_space(10G) and *uses* up to that amount.

Koji

On 6/9/11 10:13 AM, "Harsh J" <[email protected]> wrote:

> Landy,
>
> On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy <[email protected]> wrote:
>> Hi all,
>>
>> I'm planning a rather non-standard HDFS cluster. The machines will be
>> doing more than just DFS, and each machine will have varying local
>> storage utilization outside of DFS. If I use the
>> "dfs.datanode.du.reserved" property and reserve 10 GB, does that mean
>> DFS will use (total disk size - 10 GB), or that it will always leave
>> 10 GB free? Basically, is the disk usage outside DFS (OS + other data)
>> taken into account?
>
> The latter (it will leave 10 GB free). The whole disk is taken into
> account during the space computation. So yes, even external data may
> influence it.
>
>> As usage outside of DFS grows, I'd like DFS to back off the disk and
>> migrate blocks to other nodes. If this isn't the current behavior, I
>> could create a script to look at disk usage every few hours and modify
>> the reserved property dynamically. If the property is changed on a
>> single datanode and it is restarted, will the datanode then start
>> moving blocks away?
>
> Why would you need to modify the reserve values once they are set to a
> comfortable value? The DN monitors the disk space by itself, so you
> don't have to.
>
> The DN will also not move blocks away if the reserved limit is violated
> (due to you increasing it, say). However, it will begin to refuse any
> writes happening to it. You may need to run the Balancer in order to
> move blocks around and balance the DNs, though.
>
>> My other option is to just set the reserved amount very high on every
>> node, but that will lead to a lot of wasted space, as many nodes won't
>> have a very large storage demand outside of DFS.
>
> How about keeping one disk dedicated for all other intents, outside of
> the DFS's grasp?
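For reference, the property discussed in this thread is set per datanode in hdfs-site.xml; the value is in bytes and applies to each configured data volume. A minimal fragment reserving 10 GB (the exact byte value here is just an example):

```xml
<!-- hdfs-site.xml: reserve 10 GB per volume for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- 10 * 1024^3 bytes -->
</property>
```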
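Koji's one-liner ('df' minus the reserve) can be sketched as follows. This is an illustrative approximation of the accounting, not the actual DataNode code; the point is that external (non-DFS) usage on the same disk shrinks the 'df' free space and therefore directly reduces what the datanode will accept:

```python
RESERVED = 10 * 1024**3  # dfs.datanode.du.reserved: 10 GB, in bytes


def dfs_available(disk_free_bytes, reserved=RESERVED):
    """Space the datanode considers usable: 'df' free space minus the reserve.

    When this reaches zero the DN refuses new writes; it does not
    migrate existing blocks away (that is the Balancer's job).
    """
    return max(disk_free_bytes - reserved, 0)


# 100 GB free on the volume -> 90 GB usable by DFS
print(dfs_available(100 * 1024**3) // 1024**3)  # -> 90

# External data grows until only 8 GB is free -> DN refuses new writes
print(dfs_available(8 * 1024**3))  # -> 0
```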
