On 13/06/11 15:27, Bible, Landy wrote:
On 06/13/2011 07:52 AM, Loughran, Steve wrote:
On 06/10/2011 03:23 PM, Bible, Landy wrote:
I'm currently running HDFS on Windows 7 desktops. I had to create a hadoop.bat
that provided the same functionality as the shell scripts, plus some Java
Service Wrapper configs to run the DataNodes and NameNode as Windows services.
Once I get my system more functional I plan to do a write-up about how I did
it, but it wasn't too difficult. I'd also like to see Hadoop become less
platform dependent.
Why? Do you plan to bring up a real Windows server datacenter to test it on?
Not a datacenter, but a large-ish cluster of desktops, yes.
Whether you like it or not, all the big Hadoop clusters run on Linux.
I realize that; I use Linux wherever possible, much to the annoyance of my
Windows-only co-workers. However, for my current project, I'm using all the
Windows 7 and Vista desktops at my site as a storage cluster. The first idea
was to run Hadoop on Linux in a VM in the background on each desktop, but that
seemed like overkill. The point here is to use the resources we have but
aren't using, rather than buy new resources. Academia is funny like that.
I understand. One trick my local university has done is to buy a set of
servers with HDDs for their HDFS filestore, but also hook them up to
their grid scheduler (Condor? Torque?) so the existing grid jobs see a
set of machines for their work, while the JobTracker sees a farm of
worker nodes with local data. Some more work there on reporting
busy-state to each job scheduler would be nice, so that the TaskTrackers
would report "busy" while grid jobs are running, and vice versa.
So far, I've been unable to make MapReduce work correctly. The services
run, but things don't work; I suspect, however, that this is due to DNS not
working correctly in my environment.
Yes, that's one of the things you have to fix everywhere. Edit the host tables
so that DNS and reverse DNS appear to work. That's
c:\windows\system32\drivers\etc\hosts, unless it moves on a win64 box.
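As a concrete sketch (the node names and addresses below are made up for illustration), one line per cluster node, mapping each IP to a single canonical name, gives the hosts file both forward and reverse resolution:

```
# c:\windows\system32\drivers\etc\hosts -- one entry per cluster node
10.0.0.10   namenode.cluster.local   namenode
10.0.0.11   node01.cluster.local     node01
10.0.0.12   node02.cluster.local     node02
```

On most resolver stacks the first name after the address is what a reverse lookup returns, so put the name you want Hadoop to see first on each line.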
Why does Hadoop even care about DNS? Every node checks in with the NameNode
and JobTracker, so they know where the nodes are; why not just go pure IP-based
and forget DNS? Managing the hosts file is a pain... even when you automate it,
it just seems unneeded.
There have been some fixes in 0.21 and 0.22, but there may still be a
tendency to look things up:
https://issues.apache.org/jira/browse/HADOOP-3426
https://issues.apache.org/jira/browse/HADOOP-7104
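To see whether a given box is one where the trouble shows up, a quick standalone check of the forward and reverse lookups is useful. The class below is my own illustration, not Hadoop code; it just exercises the same kind of lookups a daemon performs when it registers, using only the standard `java.net.InetAddress` API:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of a DNS sanity check: forward-resolve a hostname, then
// reverse-resolve the resulting address. On a box with broken reverse
// DNS and no hosts-file entry, the reverse step falls back to the raw IP.
public class DnsCheck {
    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0]
                                      : InetAddress.getLocalHost().getHostName();

        // Forward lookup: name -> address
        InetAddress addr = InetAddress.getByName(host);
        System.out.println("forward: " + host + " -> " + addr.getHostAddress());

        // Reverse lookup: address -> canonical name. This is the step that
        // misbehaves on machines without working reverse DNS.
        String canonical = addr.getCanonicalHostName();
        System.out.println("reverse: " + addr.getHostAddress() + " -> " + canonical);

        if (canonical.equals(addr.getHostAddress())) {
            System.out.println("reverse DNS failed: lookup returned the raw IP");
        }
    }
}
```

Run it with no arguments to test the local hostname, or pass another node's name to test resolution of cluster peers.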
Hadoop doesn't like coming up on multi-homed servers or having separate
in-cluster and long-haul hostnames. Yes, this all needs fixing. I think
the reason it hasn't been fixed is that the big datacentres do have
well-configured networks, caching DNS servers on every worker node, etc.,
and all is well. It's the home networks and the less consistently set up
ones (mine, and perhaps yours) where the trouble shows up. We get to
file the bugs and fix the problems.