Hi all,

I finally got a chance to have a look at what's been happening so far. I'm not familiar with Eclipse plugin development, so I'm going to spend some time trying to fix some of Bob's JIRAs.
There's been a little discussion on the feature roadmap, but no resolution yet. I share Bob's vision of where HDT should go, as laid out at http://wiki.apache.org/hdt/HDTProductExperience. I also like Adam's idea of an early release, even if it's limited to MR job development and HDFS browsing for 0.23 and 2.x. It would give us a fast feedback loop that shows whether the small increments in functionality we've added are on the right track.

A little story to support this approach. The platform I set up at my last job was a single Kerberos-secured cluster running 0.20.205.0, used by researchers from lots of different institutes. Some work together, others don't, and all connect from within their own networks on their own machines: a true multi-tenant service with a lot of heterogeneity in client (laptop and desktop) configurations. I gave my users Ant targets for submitting their jobs from within Eclipse, which was a simple but effective way for them to run MR jobs against the cluster. Along with the Ant targets I gave them a pre-configured Hadoop release and a krb5.ini. I also helped them export JAVA_HOME, put the Kerberos config in the right place, and set up Firefox or Chrome for accessing the SPNEGO-secured web interfaces of the JobTracker and the NameNode.

This very basic setup worked well enough, but it required quite a lot of hands-on support, due to conflicts with existing Hadoop installations, environment variables, Kerberos configs, networking, and so on. Most of Hadoop is platform independent, but, at least in 0.20.x, the devil is in the details. Dealing with the heterogeneity on the client side was not easy, and I only supported Linux and OS X (although eventually I did manage to get everything working on Windows as well). It is going to be a challenge getting even basic functionality to work across clients _and_ across Hadoop installations and configurations.
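For context, those Ant targets looked roughly like the sketch below. The property names, jar name, and main class here are made up for illustration, not taken from the actual build files; the idea is just to wrap a pre-configured `hadoop jar` invocation so users never touch the client setup themselves:

```xml
<!-- Hypothetical sketch of a job-submission target; hadoop.home,
     the jar path, and the main class are illustrative assumptions. -->
<target name="submit-job" description="Submit the MR job jar to the cluster">
  <exec executable="${hadoop.home}/bin/hadoop" failonerror="true">
    <!-- Point Hadoop at the pre-configured client configs we shipped -->
    <env key="HADOOP_CONF_DIR" value="${basedir}/conf"/>
    <arg value="jar"/>
    <arg value="${basedir}/dist/myjob.jar"/>
    <arg value="com.example.MyJob"/>
    <arg value="${input.path}"/>
    <arg value="${output.path}"/>
  </exec>
</target>
```

Running such a target from within Eclipse's Ant view meant users only had to pick the target, not manage PATH, HADOOP_CONF_DIR, or the Kerberos setup by hand.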
I'm not saying this to temper enthusiasm; I'm just trying to argue that we should work in baby steps and get as much feedback as possible, as soon as possible. Anyway, just thought I'd share the experience. I'm going to dig through the code a bit. I should be able to put some effort in during the remainder of March, though probably not next week. Any of you guys going to Hadoop Summit here in my hometown?

Evert

On Thu, Jan 17, 2013 at 9:10 PM, Adam Berry <[email protected]> wrote:
> Hello all,
>
> I thought while I'm making progress on the initial split of the code, that
> we could take a moment to talk about a rough early outline.
>
> So in the original project proposal, we laid out the initial goals of the
> project (http://wiki.apache.org/incubator/HadoopDevelopmentToolsProposal).
>
> Basically right now the features are MapReduce development (wizards, hadoop
> launches) and HDFS access, so getting those working with multiple versions
> of hadoop would be the first target. I think we could make this happen,
> including documentation and tests in the next few months, by end of Q1
> would be a nice (yes, its also aggressive) thing to shoot for.
>
> With a release in hand we can target various places to grow our visibility
> (as Bob brought up) and hence grow the community. At that point I think we
> will start to feel where to go next, things like Pig are attractive targets
> for tools, but as we drive and build the community the direction will
> become clearer.
>
> So what else would people like to throw into the ring here?
>
> Cheers,
> Adam
>
