Bob, welcome and great to hear from you! Please feel free to contribute in any form (email, discussion, JIRAs, code, etc.), and we appreciate having you here as part of the community.
Cheers,
Chris

On 12/18/12 6:48 PM, "Bob Kerns" <[email protected]> wrote:

>Hi!
>
>I've been looking into the Hadoop tool situation for FICO, and had pretty
>much reached the conclusion that we were going to need to contribute to
>bring it up to snuff, and that it really needed to be split off from the
>main Hadoop project.
>
>Imagine my surprise, when I started looking into who was involved and how
>to start a discussion, to find that Adam had already taken the initiative
>and gotten the ball rolling!
>
>Anyway, +1 to the general idea, and here's one more contributor.
>
>So let me introduce myself. I've been writing software since about 1970,
>back in the days of batch processing and punch cards. It feels a bit like
>coming full circle, though of course, many never left the batch-processing
>world.
>
>At MIT in the 1970s, I was a MacLisp maintainer and a developer on the
>Macsyma symbolic algebra system. At Symbolics, I was a developer, and for
>a time I managed the software maintenance/release team. I've worked for
>DEC and tiny startups, and collaborated on small open-source projects
>around the world. I've done networking stacks from the drivers up, more AI
>rule engines than I can count, UI, web apps (server side and AJAX),
>Eclipse plugins and RCP apps, and everything from little Android apps to
>giant enterprise tools.
>
>But one thing I haven't done is work on large, highly structured
>open-source projects, and Apache in particular. I do have a rough idea of
>the process and culture -- but I'm sure there are rough edges that need
>knocking off. :)
>
>I'm also a newcomer to Hadoop, but I've been working hard on rectifying
>that over the past month. I have a 12-node cluster set up at home, and my
>wife as a built-in user community (she does movie special effects).
>
>But my immediate concern with the Eclipse plugin is to meet the needs of
>end users, many of whom will be more concerned with working with the data
>in conjunction with our Eclipse-based products.
>
>Coming from that background, some pain points I'd like to address when we
>get underway:
>
>* Configuration and setup is too hard. It would help to be able to import
>configuration files directly, for example.
>
>* Exploring large datasets is likely to be important to our users -- but
>opening a large HDFS file kills Eclipse dead. We need to be able to
>explore without loading the entire file into an Eclipse buffer! I think it
>would also help if the tools better showed how the tasks will see the
>data, as well as handled the various file formats (MapFiles, Avro- and
>Writable-formatted data, etc.).
>
>* Interfacing to more aspects of the overall Hadoop ecosystem.
>
>* Compatibility with different versions of Hadoop and related tools.
>Interfacing with Hadoop 3.0 or CDH5 should require no more than adding a
>new driver plugin.
>
>I look forward to working with you!
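
A few quick sketches against those pain points. On configuration import: a
minimal sketch of what reading an existing cluster's site files could look
like, using Hadoop's standard Configuration API. The directory layout and
class name here are illustrative assumptions, not a design:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfigImport {
        /** Build a Configuration from an existing conf directory. */
        public static Configuration fromSiteFiles(String confDir) {
            // false = skip the built-in defaults; use only what we import.
            Configuration conf = new Configuration(false);
            // Layer the site files the same way Hadoop itself does.
            conf.addResource(new Path(confDir, "core-site.xml"));
            conf.addResource(new Path(confDir, "hdfs-site.xml"));
            conf.addResource(new Path(confDir, "mapred-site.xml"));
            return conf;
        }
    }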
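
On the large-file problem: a bounded read is enough to preview an HDFS file
without pulling the whole thing into an Eclipse buffer. A minimal sketch,
assuming the standard FileSystem API (class and method names are
illustrative):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPreview {
        /** Read at most maxBytes from the start of an HDFS file. */
        public static String head(Configuration conf, String file,
                                  int maxBytes) throws IOException {
            Path path = new Path(file);
            FileSystem fs = path.getFileSystem(conf);
            byte[] buf = new byte[maxBytes];
            int n = 0;
            try (FSDataInputStream in = fs.open(path)) {
                // Fill the buffer; never load the full file.
                int r;
                while (n < maxBytes
                        && (r = in.read(buf, n, maxBytes - n)) > 0) {
                    n += r;
                }
            }
            return new String(buf, 0, n, StandardCharsets.UTF_8);
        }
    }

Paging deeper into the file would just be a seek() on the stream before the
read, so an explorer view could step through a multi-gigabyte file one
window at a time.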
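
And the "driver plugin" idea for version compatibility might boil down to a
small interface the UI codes against, with one plugin per Hadoop or CDH
version implementing it. This is a purely hypothetical sketch; none of
these names come from the plugin itself:

    import java.io.IOException;
    import java.util.List;
    import java.util.Map;

    /** Hypothetical per-version driver contract for the Eclipse tools. */
    public interface HadoopDriver {
        /** Version string this driver targets, e.g. "2.x" or "CDH4". */
        String targetVersion();

        /** List a directory on the cluster's file system. */
        List<String> listStatus(String path) throws IOException;

        /** Submit a job described by a plain key/value configuration. */
        String submitJob(Map<String, String> jobConf) throws IOException;
    }

Supporting a new Hadoop release would then mean shipping one new plugin
that implements this interface, leaving the UI untouched.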
