Thanks, I'll look at that tool. I still want to iterate over the blocks from the Java interface, since I want to look at their metadata. I'll look at the source code of the command-line tools you mentioned.
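In the meantime, here is a minimal sketch of the directory-level approach suggested below: walk a data directory, pick out files matching ^blk_<ID number>$, and compute a signature for each. The directory layout is faked with a temp dir for demonstration; on a real DataNode you would point it at the dfs.datanode.data.dir paths instead, and the choice of MD5 as the signature is just an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Sketch of a block-signature scan over a DN data directory.
// A real agent would read dfs.datanode.data.dir from the DN config;
// here we fake the layout in a temp directory so the sketch is runnable.
public class BlockSigScan {
    // Block files are named blk_<ID>; companion metadata files end in .meta.
    private static final Pattern BLOCK_FILE = Pattern.compile("^blk_\\d+$");

    static String md5Hex(Path p) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for one dfs.datanode.data.dir directory.
        Path dataDir = Files.createTempDirectory("dn-data");
        Files.write(dataDir.resolve("blk_1073741825"), "block one".getBytes());
        Files.write(dataDir.resolve("blk_1073741825.meta"), new byte[]{0});
        Files.write(dataDir.resolve("blk_1073741826"), "block two".getBytes());

        // Walk the tree and print a signature for every block file,
        // skipping the .meta companions.
        try (var paths = Files.walk(dataDir)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> BLOCK_FILE.matcher(p.getFileName().toString()).matches())
                 .sorted()
                 .forEach(p -> {
                     try {
                         System.out.println(p.getFileName() + " " + md5Hex(p));
                     } catch (Exception e) {
                         throw new RuntimeException(e);
                     }
                 });
        }
    }
}
```

Comparing a stored list of these (name, signature) pairs against the previous run would give the "only new blocks" behaviour, without touching DataNode internals.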
Thanks again.

On Jul 6, 2012 9:07 PM, "Harsh J" <ha...@cloudera.com> wrote:
> Does HDFS's replication feature not do this automatically and more
> effectively for you?
>
> I think for backups you should look at the DistCp tool, which backs up
> at proper file level rather than doing granular block-level copies. It
> can do incremental copies too, AFAICT.
>
> In any case, if you wish to have a list of all blocks at each DN,
> either parse out the info returned via "dfsadmin -metasave" or "fsck
> -files -blocks -locations", or ls -lR the DN's data dir.
>
> On Fri, Jul 6, 2012 at 11:23 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> > Thanks for the fast reply.
> > My top goal is to back up any new blocks on the DN.
> > What I'd like to do is to go over all the blocks on the DN and make a
> > signature for each of them. I'll compare that signature with a backup
> > server.
> > I guess another feature will be to check only new blocks, so I'll have
> > to look at the metadata of each block.
> >
> > On Jul 6, 2012 5:59 PM, "Harsh J" <ha...@cloudera.com> wrote:
> >> When you say 'scan blocks on that datanode', what do you mean by
> >> 'scan'? If you merely want a list of blocks per DN at a given time,
> >> there are ways to get that. However, if you then want to perform
> >> operations on each of these blocks remotely, that's not possible to
> >> do.
> >>
> >> In any case, you can run whatever program you wish agnostically on
> >> any DN by running it on the dfs.datanode.data.dir directories of the
> >> DN (take them from its config), and visiting all files with the
> >> format ^blk_<ID number>$.
> >>
> >> We can help you better if you tell us what exactly you are attempting
> >> to do, for which you need a list of all the blocks per DN.
> >>
> >> On Fri, Jul 6, 2012 at 7:58 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> >> > Hi,
> >> > I'm trying to write an agent that will run on a datanode and will
> >> > scan the blocks on that datanode.
> >> > The logical thing to do is to look at the DataBlockScanner code,
> >> > which lists all the blocks on a node, which is what I did.
> >> > The problem is that the DataBlockScanner object is instantiated
> >> > during the start-up of a DataNode, so a lot of the objects it needs
> >> > (like FSDataset) are already instantiated.
> >> > Then I tried DataNode.getDataNode(), but it returned null (needless
> >> > to say, the node is up and running).
> >> > I'd be grateful if you could refer me to the right object or to a
> >> > guide.
> >> >
> >> > I'm new to HDFS, so I'm sorry if it's a trivial question.
> >> >
> >> > Thanks,
> >> > Yaron
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
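For the parsing route mentioned above ("fsck -files -blocks -locations"), a block-ID list can be pulled out with a simple regex, since block names always have the form blk_<number>. The sample lines below are only illustrative; the exact fsck output format varies across Hadoop versions, so treat this as a sketch, not a parser for any specific release:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extract block IDs from fsck-style output. In practice you would feed
// this the captured stdout of "fsck -files -blocks -locations".
public class FsckBlockIds {
    // Matches blk_<number>; the trailing _<genstamp> (if present) is
    // not consumed by the capture group.
    private static final Pattern BLK = Pattern.compile("blk_(\\d+)");

    static List<Long> blockIds(String fsckOutput) {
        List<Long> ids = new ArrayList<>();
        Matcher m = BLK.matcher(fsckOutput);
        while (m.find()) ids.add(Long.parseLong(m.group(1)));
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical output lines, shaped loosely like fsck block listings.
        String sample = "0. blk_1073741825_1001 len=134217728 repl=3\n"
                      + "1. blk_1073741826_1002 len=7340032 repl=3\n";
        System.out.println(blockIds(sample)); // prints [1073741825, 1073741826]
    }
}
```

This stays entirely outside the DataNode process, which sidesteps the DataBlockScanner / DataNode.getDataNode() instantiation problems described in the thread.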