Hello,

my VOTE: -1

Regarding the vote to retire the HDT project, I would like to suggest that we clearly define the scope of the project if it survives the VOTE. I have found it hard to explain to other people what the role of HDT is. Since the Kite SDK offers dataset-centric libraries and Morphlines for reusable "single record" ETL operations, I was more focused on that side. But even if you know the Hadoop ecosystem, it is not easy to see which components are used most often.

In the meantime I have come to think that Morphlines are great, and some tool support for developers and analysts would be very welcome. I created "MorphMiner", a tool which allows editing and testing of Morphlines in a GUI. This could be a contribution to HDT, but right now it is not clear whether it would be a good fit, because I cannot see the overall picture of the HDT vision.

What do you think the role of HDT should be? It could be the single entry point for developers, with an abstract "cluster handling" component. This would mean:

(A) We would have to enable a connection to an existing cluster via its manager API; for example, Cloudera Manager's REST API (or comparable APIs from other vendors) would be used to retrieve status information and to enable simple operations (a rough sketch of such a status call is below the quoted mail). On the other hand, this seems like overhead, because such tools already provide all the relevant information, just in a different system. Here it would already be fine to have a browser tab in Eclipse to access the cluster; even Hue could be embedded.

(B) For web developers it would be useful to have a "HUE Module" available as a template to start coding, testing and deployment.

We can see that application development around Hadoop is not "one task, done in one IDE", but a set of multiple activities which even include administration and data or metadata management. An IDE is often seen as "the environment to do the coding in a productive way", not deployment, and this can confuse Hadoop newbies. Maybe these are reasons for the low activity: the focus is not clear and the tasks are that diverse.

I think that instead of retiring HDT we should actively build "the case for HDT". One way to do this could be a collection of best practices and tutorials which show how HDT helps, or could help. From there we can continue with the tool development efforts and, hopefully, with some work that integrates the Kite SDK into HDT. The dataset tools are already a good starting point. Building on this, a dataset inspector which even produces dataset profiles seems to be a doable project for a student (see the second sketch below the quoted mail). I volunteer to mentor and to provide an existing skeleton of the code for this module.

To include more ideas from Kite SDK developers and other people I know who may be interested in this discussion, I am sending it to some off-list addresses to invite those people.

Good luck HDT!!!

Cheers,
Mirko

2014-11-10 9:45 GMT+01:00 Rahul Sharma <[email protected]>:

> Hi all,
>
> Based on the discussion that happened on the mailing list [1], I'd like to
> call a VOTE to retire [2] Apache HDT from the Apache Incubator. It appears
> that the project has lost community interest, with almost no activity on
> the mailing lists.
>
> This VOTE will be open for at least 72 hours and passes on achieving a
> consensus.
>
> +1 [ ] Yes, I am in favor of retiring HDT from the Apache Incubator.
> +0 [ ]
> -1 [ ] No, I am not in favor of retiring HDT because...
>
> regards
> Rahul
>
> [1] http://apache.markmail.org/message/ljcrnj5uluiemvaz
> [2] http://incubator.apache.org/guides/retirement.html
>
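P.S. For point (A), here is a minimal sketch of what I mean by "retrieve status" through a manager REST API. This is only an assumption on my side, not existing HDT code: the host, port, credentials and the API version path (v6) are placeholders and would have to be checked against the actual Cloudera Manager documentation (other vendors expose different endpoints).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class ClusterStatusProbe {

    public static void main(String[] args) throws Exception {
        // Hypothetical host, credentials and API version; check the manager's
        // /api/version endpoint and documentation before relying on this path.
        String endpoint = "http://cm-host:7180/api/v6/clusters";
        String credentials = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes("UTF-8"));

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Basic " + credentials);

        // Print the raw JSON cluster list; an HDT "cluster handling" view
        // would parse this response and render cluster names and health.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}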
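P.P.S. And a rough idea of the dataset inspector skeleton I mentioned, based on the Kite SDK data module (org.kitesdk.data). Again only a sketch under assumptions: the dataset URI is hypothetical, and the "profile" is reduced to a record count plus per-field null counts; the real module would compute more statistics and show them in an Eclipse view.

import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetReader;
import org.kitesdk.data.Datasets;

public class DatasetProfiler {

    public static void main(String[] args) {
        // Load a Kite dataset of generic Avro records; the URI is a placeholder.
        Dataset<GenericRecord> dataset =
                Datasets.load("dataset:hdfs:/datasets/example", GenericRecord.class);
        Schema schema = dataset.getDescriptor().getSchema();

        long records = 0;
        Map<String, Long> nullCounts = new HashMap<>();

        // Stream over the records and collect a very small "profile":
        // total record count and how often each top-level field is null.
        DatasetReader<GenericRecord> reader = dataset.newReader();
        try {
            while (reader.hasNext()) {
                GenericRecord record = reader.next();
                records++;
                for (Schema.Field field : schema.getFields()) {
                    if (record.get(field.name()) == null) {
                        nullCounts.merge(field.name(), 1L, Long::sum);
                    }
                }
            }
        } finally {
            reader.close();
        }

        System.out.println("records: " + records);
        System.out.println("nulls per field: " + nullCounts);
    }
}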
