Thanks a lot! If I do it via the NN, can I get down to the block level?

On Jul 7, 2012 4:51 AM, "Harsh J" <ha...@cloudera.com> wrote:
> The DNs do not expose the mapping they maintain to clients. So it has to be either routed through the NN (for which I've provided the few commands), or you may otherwise disregard protocols and just enter the directory yourself.
>
> Note that block backups may not always make sense: for example, some blocks may no longer belong to any active file on the NN, since deletion progresses in small phases and it can be some time before a raw block at a DN is invalidated (and then deleted).
>
> On Sat, Jul 7, 2012 at 12:28 AM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> > Thanks, I'll look at that tool.
> > I still wish to iterate over the blocks from the Java interface, since I want to look at their metadata. I'll look at the source code of the command-line tools you mentioned.
> >
> > Thanks again.
> >
> > On Jul 6, 2012 9:07 PM, "Harsh J" <ha...@cloudera.com> wrote:
> >>
> >> Does HDFS's replication feature not do this automatically and more effectively for you?
> >>
> >> I think for backups you should look at the DistCp tool, which backs up at the proper file level rather than making granular block-level copies. It can do incremental copies too, AFAICT.
> >>
> >> In any case, if you wish to have a list of all blocks at each DN, either parse the info returned by "dfsadmin -metasave" or "fsck -files -blocks -locations", or run ls -lR on the DN's data dir.
> >>
> >> On Fri, Jul 6, 2012 at 11:23 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> >> > Thanks for the fast reply.
> >> > My top goal is to back up any new blocks on the DN.
> >> > What I'd like to do is go over all the blocks on the DN and make a signature for each one of them. I'll compare that signature with a backup server.
> >> > I guess another feature will be to check only new blocks, so I'll have to look at the metadata of each block.
> >> >
> >> > On Jul 6, 2012 5:59 PM, "Harsh J" <ha...@cloudera.com> wrote:
> >> >>
> >> >> When you say 'scan blocks on that datanode', what do you mean by 'scan'? If you merely want a list of blocks per DN at a given time, there are ways to get that. However, if you then want to perform operations on each of these blocks remotely, that's not possible.
> >> >>
> >> >> In any case, you can run whatever program you wish agnostically on any DN by running it on the DN's dfs.datanode.data.dir directories (take them from its config) and visiting all files whose names match ^blk_<ID number>$.
> >> >>
> >> >> We can help you better if you tell us exactly what you are attempting to do, for which you need a list of all the blocks per DN.
> >> >>
> >> >> On Fri, Jul 6, 2012 at 7:58 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> >> >> > Hi,
> >> >> > I'm trying to write an agent that will run on a datanode and will scan blocks on that datanode.
> >> >> > The logical thing to do is to look at the DataBlockScanner code, which lists all the blocks on a node, and that is what I did.
> >> >> > The problem is that the DataBlockScanner object is instantiated during the start-up of a DataNode, so a lot of the objects it needs (like FSDataSet) are already instantiated.
> >> >> > Then I tried DataNode.getDataNode(), but it returned null (needless to say, the node is up and running).
> >> >> > I'd be grateful if you could refer me to the right object or to a guide.
> >> >> >
> >> >> > I'm new to HDFS, so I'm sorry if it's a trivial question.
> >> >> >
> >> >> > Thanks,
> >> >> > Yaron
> >> >>
> >> >> --
> >> >> Harsh J
> >>
> >> --
> >> Harsh J
> >
> --
> Harsh J
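Harsh's suggestion of visiting the raw block files under dfs.datanode.data.dir, combined with Yaron's goal of signing new blocks and comparing the signatures against a backup server, could look roughly like the plain-Java sketch below. It rests on several assumptions: it reads the data directory straight from the local disk rather than through any HDFS API; it assumes block files are named blk_<id> with an optionally negative numeric ID, so checksum side files such as blk_<id>_<genstamp>.meta are skipped; it uses SHA-256 as the "signature"; and it approximates "new block" by file modification time. The class and method names (BlockSigner, signBlocks, newOrChanged) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical agent sketch: signs raw block files found under one DataNode data dir. */
public class BlockSigner {
    // Assumed naming: "blk_" followed by an optionally negative numeric block ID.
    // Side files such as blk_<id>_<genstamp>.meta deliberately do not match.
    private static final Pattern BLOCK_FILE = Pattern.compile("^blk_-?\\d+$");

    /** Maps block file name to SHA-256 hex digest, for block files modified after {@code since}. */
    public static Map<String, String> signBlocks(Path dataDir, FileTime since) throws IOException {
        Map<String, String> signatures = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(dataDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                String name = p.getFileName().toString();
                if (!BLOCK_FILE.matcher(name).matches()) {
                    continue;
                }
                try {
                    // Modification time is only a rough stand-in for "new block".
                    if (Files.getLastModifiedTime(p).compareTo(since) <= 0) {
                        continue;
                    }
                    signatures.put(name, sha256Hex(p));
                } catch (NoSuchFileException e) {
                    // The DN may invalidate and delete a block mid-walk; skip it.
                }
            }
        }
        return signatures;
    }

    /** Streams the file through SHA-256 and returns the lowercase hex digest. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Block names whose local signature is missing from, or differs from, the backup's. */
    public static List<String> newOrChanged(Map<String, String> local, Map<String, String> backup) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : local.entrySet()) {
            if (!e.getValue().equals(backup.get(e.getKey()))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Usage: java BlockSigner <data dir> [cutoff in epoch millis] */
    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get(args[0]);
        FileTime since = FileTime.fromMillis(args.length > 1 ? Long.parseLong(args[1]) : 0L);
        signBlocks(dataDir, since).forEach((name, sig) -> System.out.println(name + "\t" + sig));
    }
}
```

Run once per directory listed in dfs.datanode.data.dir, passing the time of the previous run as the cutoff. As Harsh points out, a block found this way may already be orphaned from any active file on the NN, and this sketch also does not distinguish finalized blocks from blocks still being written, so a file-level tool such as DistCp remains the safer choice for real backups.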