Thanks a lot!
If I do it via the NN, can I get down to the block level?
On Jul 7, 2012 4:51 AM, "Harsh J" <ha...@cloudera.com> wrote:

> The DNs do not expose the mapping they maintain to clients. So it has
> to be either routed through the NN (for which I've provided a few
> commands), or you may otherwise disregard protocols and just read the
> data directory yourself.
>
> Note that block backups may not always make sense since, for example,
> some blocks may no longer belong to any active file on the NN: deletion
> progresses in small phases, and it could be some time before a raw
> block at a DN is invalidated (and then deleted).
>
> On Sat, Jul 7, 2012 at 12:28 AM, Yaron Gonen <yaron.go...@gmail.com>
> wrote:
> > Thanks, I'll look at that tool.
> > I still wish to iterate the blocks from the Java interface since I want to
> > look at their metadata. I'll look at the source code of the command line
> > tools you mentioned.
> >
> > Thanks again.
> >
> > On Jul 6, 2012 9:07 PM, "Harsh J" <ha...@cloudera.com> wrote:
> >>
> >> Does HDFS's replication feature not do this automatically and more
> >> effectively for you?
> >>
> >> I think for backups you should look at the DistCp tool, which backs
> >> up at the proper file level rather than making granular block-level
> >> copies. It can do incremental copies too, AFAICT.
> >>
> >> In any case, if you wish to have a list of all blocks at each DN,
> >> either parse out the info returned via "dfsadmin -metasave", "fsck
> >> -files -blocks -locations", or ls -lR the DN's data dir.
> >>
> >> On Fri, Jul 6, 2012 at 11:23 PM, Yaron Gonen <yaron.go...@gmail.com>
> >> wrote:
> >> > Thanks for the fast reply.
> >> > My top goal is to back up any new blocks on the DN.
> >> > What I'd like to do is go over all the blocks in the DN and make a
> >> > signature for each one of them. I'll compare that signature with a
> >> > backup server.
> >> > I guess another feature will be to check only new blocks, so I'll have
> >> > to look at the metadata of each block.
> >> >
> >> > On Jul 6, 2012 5:59 PM, "Harsh J" <ha...@cloudera.com> wrote:
> >> >>
> >> >> When you say 'scan blocks on that datanode', what do you mean to do
> >> >> by 'scan'? If you want merely a list of blocks per DN at a given time,
> >> >> there are ways to get that. However, if you want to then perform
> >> >> operations on each of these blocks remotely, then that's not possible
> >> >> to do.
> >> >>
> >> >> In any case, you can run whatever program you wish to agnostically on
> >> >> any DN by running it on the dfs.datanode.data.dir directories of the
> >> >> DN (take it from its config), and visiting all files with the format
> >> >> ^blk_<ID number>$.
> >> >>
> >> >> We can help you better if you tell us what exactly you are attempting
> >> >> to do, for which you need a list of all the blocks per DN.
> >> >>
> >> >> On Fri, Jul 6, 2012 at 7:58 PM, Yaron Gonen <yaron.go...@gmail.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> > I'm trying to write an agent that will run on a datanode and will
> >> >> > scan blocks on that datanode.
> >> >> > The logical thing to do is to look in the DataBlockScanner code,
> >> >> > which lists all the blocks on a node, which is what I did.
> >> >> > The problem is that the DataBlockScanner object is instantiated
> >> >> > during the start-up of a DataNode, so a lot of objects needed (like
> >> >> > FSDataSet) are already instantiated.
> >> >> > Then, I tried with DataNode.getDataNode(), but it returned null
> >> >> > (needless to say that the node is up-and-running).
> >> >> > I'd be grateful if you can refer me to the right object or to a
> >> >> > guide.
> >> >> >
> >> >> > I'm new to HDFS, so I'm sorry if it's a trivial question.
> >> >> >
> >> >> > Thanks,
> >> >> > Yaron
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Harsh J
> >>
> >>
> >>
> >> --
> >> Harsh J
>
>
>
> --
> Harsh J
>
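
For reference, the approach suggested earlier in the thread (walk the
dfs.datanode.data.dir directories locally and visit every file named
^blk_<ID number>$, then compute a signature per block) could be sketched
roughly as below. This is only an illustration, not HDFS code: the class
name BlockLister and its methods are made up for the example, SHA-256 is
an arbitrary choice of signature, and the pattern assumes block IDs may be
negative and that the blk_<ID>_<genstamp>.meta checksum files should be
skipped. It reads the DN's files outside of any HDFS protocol, so a block
may still be changing or may already be scheduled for deletion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: lists raw block files under a
// DataNode data directory and computes a signature for each of them.
final class BlockLister {
    // Assumption: block data files are named blk_<ID>, where the ID may be
    // negative. Checksum files (blk_<ID>_<genstamp>.meta) do not match.
    private static final Pattern BLOCK_FILE = Pattern.compile("^blk_-?\\d+$");

    static boolean isBlockFile(String name) {
        return BLOCK_FILE.matcher(name).matches();
    }

    // Recursively collects every block file below dataDir, sorted by path.
    static List<Path> listBlocks(Path dataDir) throws IOException {
        List<Path> blocks = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(dataDir)) {
            walk.filter(Files::isRegularFile)
                .filter(p -> isBlockFile(p.getFileName().toString()))
                .forEach(blocks::add);
        }
        Collections.sort(blocks);
        return blocks;
    }

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

An agent would call BlockLister.listBlocks(...) once for each directory
configured in dfs.datanode.data.dir and compare the sha256Hex(...) result
of each block with what the backup server already holds. To check only new
blocks, Files.getLastModifiedTime(path) on each returned path is one
possible filter.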
