Hi Bob! Welcome on board!
On Tue, Dec 18, 2012 at 6:48 PM, Bob Kerns <[email protected]> wrote:
> So let me introduce myself. I've been writing software since about 1970,
> back in the days of batch processing and punch cards. It feels a bit like
> coming full circle, though of course, many never left the batch-processing
> world.

Indeed!

> But one thing I haven't done is work on large, highly structured
> open-source projects, and Apache in particular. I do have a rough idea of
> the process and culture -- but I'm sure there are rough edges that need
> knocking off. :)

We would absolutely welcome any kind of contribution -- code, ideas,
documentation, testing effort -- you name it. Every bit counts.

> I'm also a newcomer to Hadoop, but I've been working hard on rectifying
> that over the past month. I have a 12-node cluster set up at home, with
> my wife as a built-in user community (she does movie special effects).

If you're looking for a fairly polished experience of setting up Hadoop
clusters, and you want a 100% ASF-driven Hadoop distro, take a look at
Apache Bigtop. We've just had a 0.5.0 release, and you can install our
convenience binary artifacts by simply dropping a .list/.repo file into
your package manager's set of sources:

http://archive.apache.org/dist/bigtop/bigtop-0.5.0/repos/

> * Configuration and setup is too hard. It would help to be able to import
> configuration files directly, for example.

Are you talking about Hadoop? If so, that's exactly what Bigtop aims to
address.

> * Exploring large datasets is likely to be important to our users -- but
> opening a large HDFS file kills Eclipse dead. We need to be able to explore
> without loading the entire file into an Eclipse buffer! I think it would
> also help if the tools better showed how the tasks will see the data, as
> well as handle the various file formats (mapfiles, Avro- and
> Writable-formatted data, etc.).
>
> * Interfacing to more aspects of the overall Hadoop ecosystem.
> * Compatibility with different versions of Hadoop and related tools.
> Interfacing with Hadoop 3.0 or CDH5 should require no more than adding a
> new driver plugin.

All good points! Looking forward to working with you.

Thanks,
Roman.
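P.S. On a Debian-style box, "dropping a .list file" amounts to roughly the
sketch below. The distro subdirectory and file name shown are assumptions
on my part -- grab the actual .list/.repo file from the repos/ URL above
rather than hand-writing it:

```shell
# Fetch the distro-specific apt sources file into apt's sources directory.
# (The "precise" subdirectory and "bigtop.list" file name are assumptions;
# browse http://archive.apache.org/dist/bigtop/bigtop-0.5.0/repos/ to see
# what is actually published for your distro.)
sudo wget -O /etc/apt/sources.list.d/bigtop.list \
  http://archive.apache.org/dist/bigtop/bigtop-0.5.0/repos/precise/bigtop.list

# Refresh package metadata and install the Hadoop packages from Bigtop.
sudo apt-get update
sudo apt-get install hadoop

# On an rpm-based system, drop the .repo file into /etc/yum.repos.d/
# instead and install with yum.
```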
