Do you remember the "Caching frequently map input files" thread? http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200802.mbox/[EMAIL PROTECTED]
On Mon, Apr 21, 2008 at 8:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> This is kind of odd that you are doing this. It really sounds like a
> replication of what hadoop is doing.
>
> Why not just run a map process and have hadoop figure out which blocks are
> where?
>
> Can you say more about *why* you are doing this, not just what you are
> trying to do?
>
> On 4/21/08 10:28 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
>
> > I am using Hadoop HDFS as a distributed file system. On each DFS node I
> > have another process which needs to read the local HDFS files.
> > Right now I'm calling the NameNode in order to get the list of all the
> > files in the cluster. For each file I check if it is a local file (one
> > of the locations is the host of the node), if it is I read it.
> > Disadvantages:
> > * This solution works only if the entire file is not split.
> > * It involves the NameNode.
> > * Each node needs to iterate on all the files in the cluster.
> >
> > There must be a better way to do it. The perfect way will be to call the
> > DataNode and to get a list of the local files and their blocks.
> >
> > On Mon, Apr 21, 2008 at 7:18 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> >> Datanodes don't necessarily contain complete files. It is possible to
> >> enumerate all files and to find out which datanodes host different
> >> blocks from these files.
> >>
> >> What did you need to do?
> >>
> >> On 4/21/08 2:11 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Is there a way to get the list of files on each datanode?
> >>> I need to be able to get all the names of the files on a specific
> >>> datanode? is there a way to do it?
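For what it's worth, the filtering Shimi describes can be sketched like this. In real Java code you would enumerate files and ask the NameNode for their block placement via FileSystem#listStatus and FileSystem#getFileBlockLocations; the Python below is only a hypothetical model of that data (file path -> list of blocks -> list of replica hosts), showing why the approach only works when every block of a file has a replica on the local host -- the "works only if the entire file is not split" limitation above:

```python
# Hypothetical sketch of Shimi's locality filter. The block_locations dict
# stands in for what the NameNode reports (in Java, via
# FileSystem#getFileBlockLocations); names like "node1" are made up.

def files_fully_local(block_locations, local_host):
    """Return the files whose every block has a replica on local_host.

    block_locations: dict mapping file path -> list of blocks, where each
    block is the list of hostnames holding a replica of that block.
    """
    local_files = []
    for path, blocks in block_locations.items():
        # Every block must have a replica on this host for the whole file
        # to be readable locally; a single remote block disqualifies it.
        if all(local_host in replica_hosts for replica_hosts in blocks):
            local_files.append(path)
    return local_files

# Made-up cluster state: "/data/a" is a single block on node1/node2;
# "/data/b" is split into two blocks, and only the second is on node2.
locations = {
    "/data/a": [["node1", "node2"]],
    "/data/b": [["node1", "node3"], ["node2", "node3"]],
}
print(files_fully_local(locations, "node1"))  # -> ['/data/a']
print(files_fully_local(locations, "node3"))  # -> ['/data/b']
```

This also illustrates Ted's point: since datanodes don't necessarily hold complete files, any per-host view has to be derived from the NameNode's block map, which is exactly what the MapReduce scheduler already does for you.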
