Not quite an advanced developer, but I've learnt some shortcuts for my dev cycle along the way.
> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I mostly stuck to vim for my editor, with a few exceptions (Eclipse is
great for browsing from class to class, and mvn eclipse:eclipse works
great for generating the project files).

I end up doing

  mvn package -Pdist

That gives you a hadoop-dist/target/hadoop-${version} to work from.

From then on, the mini-cluster is your friend:

  http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CLIMiniCluster.html

All I usually specify is -rmport 8032.

The next thing I learnt was that for most dev work, file:/// works great
instead of HDFS. For instance, in Hive I could just pass

  -hiveconf fs.default.name=file://$(FS)/
  -hiveconf hive.metastore.warehouse.dir=file://$(FS)/warehouse

(of course, substituting FS with something useful like /tmp/hive/) and run
my queries without worrying about HDFS overheads. Using file:/// URLs for
map input and output occasionally simplifies your debugging a lot. So
basically, you could run

  ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount file:///usr/share/dict/words file:///tmp/run1

Or you could just use localhost:9000 in the mini-cluster if you really want
to test out the HDFS client ops.

Figuring out how to run hadoop in non-cluster mode has been the most
productivity-inducing thing I've learnt.

Hope that helps.

> I realize this sort of sounds like a dumb question, but I'm mostly curious
> what I might be missing out on if I stay away from anything other than vim,
> and not being entirely sure where maven might be caching jars that it uses
> to build, and how careful I have to be to ensure that my changes wind up in
> the right places without having to do a clean build every time.
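To tie the steps above together, here is the whole local dev loop as one
sketch. It assumes a stock Hadoop source checkout; the mini-cluster jar
name and flags are the ones from the CLIMiniCluster page linked above (the
jobclient tests jar), and the paths/version are illustrative, not
prescriptive:

```shell
# build a deployable dist from the source tree
mvn package -Pdist

# everything below runs from the resulting dist directory
cd hadoop-dist/target/hadoop-${version}

# start the CLI mini-cluster, pinning the RM port so clients can find it
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    minicluster -rmport 8032

# or skip HDFS entirely and run a job against the local filesystem
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount file:///usr/share/dict/words file:///tmp/run1
```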
find ~/.m2/ helps a bit, but occasionally, when I break the API of
something basic like Writable, I want to use my own build of the hadoop
libs for that project.

So this is a question I have for everyone else: how do I change the hadoop
version of an entire build, so that I can name it something unique and use
it in other builds in maven? (-SNAPSHOT doesn't cut it, since occasionally
mvn will download the hadoop snapshot poms from the remote repos.)

Cheers,
Gopal
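PS: On where Maven caches jars: everything it downloads or installs lands
under ~/.m2/repository, laid out as group-path/artifact/version. Here's a
quick self-contained demo of that layout (the hadoop-common version below
is made up for the demo; run the same find against $HOME/.m2/repository to
see your real cache):

```shell
# Build a throwaway copy of the ~/.m2/repository layout so this runs
# anywhere; Maven stores jars as <group-path>/<artifact>/<version>/.
repo=$(mktemp -d)/repository
mkdir -p "$repo/org/apache/hadoop/hadoop-common/3.3.6"
touch "$repo/org/apache/hadoop/hadoop-common/3.3.6/hadoop-common-3.3.6.jar"

# The same find against $HOME/.m2/repository lists your real cached jars.
found=$(find "$repo" -name 'hadoop-common-*.jar')
echo "$found"
```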