Hi, I am a bit confused. What is the difference if I use "hadoop distcp" to upload files instead? I assume "hadoop distcp" uses multiple trackers to upload files in parallel.
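(For illustration only -- the hostnames and paths below are placeholders, not
from any real setup:)

    # Plain client-side upload: a single process streams the file into HDFS.
    hadoop dfs -copyFromLocal /local/data/log.0 /incoming/

    # distcp is submitted as a MapReduce job, so the copy is carried out by
    # many map tasks in parallel -- but the source has to be reachable from
    # the cluster's nodes (e.g. another HDFS instance), not just from the
    # machine you run the command on.
    hadoop distcp hdfs://cluster-a:8020/incoming hdfs://cluster-b:8020/incoming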
Thanks,
Rui

----- Original Message ----
From: Ted Dunning <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, December 20, 2007 6:01:50 PM
Subject: Re: DFS Block Allocation

On 12/20/07 5:52 PM, "C G" <[EMAIL PROTECTED]> wrote:

> Ted, when you say "copy in the distro," do you need to include the
> configuration files from the running grid? You don't need to actually
> start HDFS on this node, do you?

You are correct. You only need the config files (and the hadoop script
helps make things easier).

> If I'm following this approach correctly, I would want to have an "xfer
> server" whose job it is to essentially run dfs -copyFromLocal on all
> inbound-to-HDFS data. Once I'm certain that my data has copied correctly,
> I can delete the local files on the xfer server.

Yes.

> This is great news, as my current system wastes a lot of time copying
> data from data acquisition servers to the master node. If I can copy to
> HDFS directly from my acquisition servers, then I am a happy guy....

You are a happy guy, if your acquisition systems can see all of your
datanodes.
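(A sketch of the "xfer server" flow described above -- the local and HDFS
paths here are placeholders:)

    # Run on the xfer server, which has the Hadoop distro and the grid's
    # config files but no running HDFS daemons of its own.
    hadoop dfs -copyFromLocal /inbound/batch-001 /data/inbound/

    # Only after confirming the copy landed, delete the local original.
    hadoop dfs -ls /data/inbound/batch-001 && rm -r /inbound/batch-001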
