Do you remember the "Caching frequently map input files" thread?
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200802.mbox/[EMAIL PROTECTED]


On Mon, Apr 21, 2008 at 8:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
> It is a bit odd that you are doing this.  It really sounds like a
> replication of what Hadoop is already doing.
>
> Why not just run a map process and have Hadoop figure out which blocks are
> where?
>
> Can you say more about *why* you are doing this, not just what you are
> trying to do?
>
> On 4/21/08 10:28 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
>
> > I am using Hadoop HDFS as a distributed file system. On each DFS node I
> > have another process which needs to read the local HDFS files.
> > Right now I call the NameNode to get the list of all the files in the
> > cluster. For each file I check whether it is local (this host is one of
> > its block locations); if it is, I read it.
> > Disadvantages:
> > * This solution works only if the file fits in a single block and is not
> >   split across nodes.
> > * It involves the NameNode.
> > * Each node needs to iterate over all the files in the cluster.
> >
> > There must be a better way to do this. The ideal way would be to ask the
> > DataNode directly for a list of its local files and their blocks.
> >
> > On Mon, Apr 21, 2008 at 7:18 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Datanodes don't necessarily contain complete files.  It is possible to
> >> enumerate all files and find out which datanodes host different blocks
> >> from these files.
> >>
> >> What did you need to do?
> >>
> >>
> >> On 4/21/08 2:11 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Is there a way to get the list of files on each datanode?
> >>> I need to be able to get the names of all the files on a specific
> >>> datanode. Is there a way to do it?
> >>
> >>
>
>
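The locality check described in the thread (and the block-level variant that would lift the "file must not be split" limitation) can be sketched as follows. This is a minimal illustration with hypothetical data, not real Hadoop API usage: in an actual cluster the block-to-host mapping would come from the NameNode (e.g. via `FileSystem.getFileBlockLocations` in the Java client), and the function names here are made up for the example.

```python
# Hypothetical example data: file path -> list of blocks, where each block
# is the list of hosts holding a replica. In real HDFS this mapping would
# be obtained from the NameNode, not hard-coded.
block_locations = {
    "/data/a.log": [["node1", "node2"], ["node1", "node3"]],
    "/data/b.log": [["node2", "node3"]],
}


def local_files(locations, hostname):
    """Return files for which *every* block has a replica on `hostname`.

    This mirrors the whole-file check from the thread: it only finds a
    file when all of its blocks happen to be local, which is why the
    approach breaks down once a file is split across nodes.
    """
    return [
        path
        for path, blocks in locations.items()
        if all(hostname in hosts for hosts in blocks)
    ]


def local_blocks(locations, hostname):
    """Return (file, block_index) pairs with a replica on `hostname`.

    Working at block granularity, as Ted suggests, avoids the
    whole-file limitation: each node can process whatever blocks it
    holds, regardless of where the rest of the file lives.
    """
    return [
        (path, i)
        for path, blocks in locations.items()
        for i, hosts in enumerate(blocks)
        if hostname in hosts
    ]
```

Note that both variants still enumerate every file in the cluster, which is the scaling concern raised in the thread; pushing the query down to the DataNode would avoid that, but HDFS exposes block locations through the NameNode.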
