Following the consensus on this thread, I'm now working on moving the Spark connector out of core. HBASE-21430 clones our current modules (after some pom cleanup and dependency messing). I'm now working on a subtask to purge them from our main repo. Will put more info on our old hbase-spark integration DISCUSS thread from a while back.
S

On Fri, Nov 2, 2018 at 11:43 AM Stack <[email protected]> wrote:
> Related, interesting thread on how various projects are doing the github<->asf connection has just started[1].
>
> A few have a notifications list that gets the github notification emails. Email clients make parseable threads of the firehose apparently. We might do same.
>
> S
> 1. http://mail-archives.apache.org/mod_mbox/community-dev/201811.mbox/browser
>
> On Fri, Nov 2, 2018 at 10:37 AM Josh Elser <[email protected]> wrote:
>> On 11/2/18 11:20 AM, Stack wrote:
>>> On Fri, Nov 2, 2018 at 7:30 AM Josh Elser <[email protected]> wrote:
>>>> Nice stuff, Stack!
>>> Not me. It Mike's work.
>> Sorry, didn't mean it that way. Was happy to see you pushing the hbase-connectors repo forward :)
>>
>> Obviously kudos to MikeW for the code itself!
>>>> Two quick questions:
>>>>
>>>> First, on provenance: this codebase primarily came from Mike Wingert on https://issues.apache.org/jira/browse/HBASE-15320? Just saw that the commit came from your email addr -- wasn't sure if that Mike was still involved (or you took it to completion).
>>> You talking about the merge done over on the hbase-connectors?
>>>
>>> Looks like I get blamed for the merge -- if I do a show on the merge commit, it is nothing but the merge note "Merge pull request #3 from hbasejanitor/master" where hbasejanitor is Mike's handle -- but then the merge note is followed by Mikes' work, properly attributed to him.
>>>
>>> I did not pay close attention to this aspect of how git boxing does it. Seems fine to me. What you think?
>> Ahh, no, this was just me. I think I only looked at the first commit(s), and not far enough down the list. No concerns.
>>>> Second, I assume this new Git repo had all of the normal email-hooks set up. Do you know where they are being sent (dev, commit, or issues)?
>>>> I'm also assuming that this is a Gitbox repo -- are we OK with pull-requests to this repo (as well as operator-tools) but still create a Jira issue?
>>> Yep, gitbox. It has whatever infra set it up as.
>>>
>>> Back and forth was dumped into HBASE-21002 (We changed this config on hbase-operator-tools config to not do this? I should look).
>>>
>>> Regards pull requests, etc., email configs., etc., we are in experimental mode around all this stuff trying to figure it out so suggestions/help/exercising the possibilities are all welcome.
>>>
>>> Thanks,
>>> S
>> Grand. Thanks for the reminder. Had a random question in Slack the other day about contributions -- will keep the "pave your own road" mindset in the fore-front :)
>>>> - Josh
>>>>
>>>> On 10/31/18 6:43 PM, Stack wrote:
>>>>> To tie-off this thread, this nice feature was just pushed on hbase-connector. See https://github.com/apache/hbase-connectors/tree/master/kafka for how-to. Review and commentary welcome.
>>>>>
>>>>> Thanks,
>>>>> S
>>>>>
>>>>> On Fri, Aug 3, 2018 at 6:32 AM Hbase Janitor <[email protected]> wrote:
>>>>>> I opened hbase-21002 to start the scripts and assembly.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Thu, Aug 2, 2018, 19:29 Stack <[email protected]> wrote:
>>>>>>> Up in https://issues.apache.org/jira/browse/HBASE-20934 I created an hbase-connectors repo. I put some form on it using the v19 patch from HBASE-15320 "HBase connector for Kafka Connect". It builds and tests pass. Here are some remaining TODOs:
>>>>>>>
>>>>>>> * Figure how to do start scripts: e.g. we need to start up the kafka proxy. It wants some hbase jars, conf dir, and others on the CLASSPATH (Depend on an HBASE_HOME and then source bin/hbase?)
>>>>>>> * Can any of the connectors make-do with the shaded client?
>>>>>>> * Make connectors standalone or have them share conf, bin, etc?
>>>>>>> * Need to do an assembly. Not done.
>>>>>>> * Move over REST and thrift next. Mapreduce after?
>>>>>>>
>>>>>>> The poms could do w/ a review. Hacked them over from hbase-thirdparty.
>>>>>>>
>>>>>>> File issues and apply patches up in JIRA if your up for any of the above.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> S
>>>>>>>
>>>>>>> On Wed, Jul 25, 2018 at 10:46 PM Stack <[email protected]> wrote:
>>>>>>>> On Tue, Jul 24, 2018 at 10:01 PM Misty Linville <[email protected]> wrote:
>>>>>>>>> I like the idea of a separate connectors repo/release vehicle, but I'm a little concerned about the need to release all together to update just one of the connectors. How would that work? What kind of compatibility guarantees are we signing up for?
>>>>>>>> I hate responses that begin "Good question" -- so fawning -- but, ahem, good question Misty (in the literal, not flattering, sense).
>>>>>>>>
>>>>>>>> I think hbase-connectors will be like hbase-thirdparty. The latter includes netty, pb, guava and a few other bits and pieces so yeah, sometimes a netty upgrade or an improvement on our patch to pb will require us releasing all though we are fixing one lib only. Usually, if bothering to make a release, we'll check for fixes or updates we can do in the other bundled components.
>>>>>>>>
>>>>>>>> On the rate of releases, I foresee a flurry of activity around launch as we fill missing bits and address critical bug fixes, but that then it will settle down to be boring, with just the occasional update. Thrift and REST have been stable for a good while now (not saying this is a good thing).
>>>>>>>> Our Sean just suggested moving mapreduce to connectors too -- an interesting idea -- and this has also been stable too (at least until recently with the shading work). We should talk about the Spark connector when it comes time. It might not be as stable as the others.
>>>>>>>>
>>>>>>>> On the compatibility guarantees, we'll semver it so if an incompatible change in a connector or if the connectors have to change to match a new version of hbase, we'll make sure the hbase-connector version number is changed appropriately. On the backend, what Mike says; connectors use HBase Public APIs (else they can't be moved to the hbase-connector repo).
>>>>>>>>
>>>>>>>> S
>>>>>>>>> On Tue, Jul 24, 2018, 9:41 PM Stack <[email protected]> wrote:
>>>>>>>>>> Grand. I filed https://issues.apache.org/jira/browse/HBASE-20934 . Let me have a go at making the easy one work first (the kafka proxy). Lets see how it goes. I'll report back here.
>>>>>>>>>> S
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 24, 2018 at 2:43 PM Sean Busbey <[email protected]> wrote:
>>>>>>>>>>> Key functionality for the project's adoption should be in the project. Please do not suggest we donate things to Bahir.
>>>>>>>>>>>
>>>>>>>>>>> I apologize if this is brisk. I have had previous negative experiences with folks that span our communities trying to move work I spent a lot of time contributing to within HBase over to Bahir in an attempt to bypass an agreed upon standard of quality.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 24, 2018 at 3:38 PM, Artem Ervits <[email protected]> wrote:
>>>>>>>>>>>> Why not just donating the connector to http://bahir.apache.org/ ?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 24, 2018, 12:51 PM Lars Francke <[email protected]> wrote:
>>>>>>>>>>>>> I'd love to have the Kafka Connector included.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Mike thanks so much for the contribution (and your planned ones)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm +1 on adding it to the core but I'm also +1 on having a separate repository under Apache governance
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 24, 2018 at 6:01 PM, Josh Elser <[email protected]> wrote:
>>>>>>>>>>>>>> +1 to the great point by Duo about use of non-IA.Public classes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for Apache for the governance (although, I wouldn't care if we use Github PRs to try to encourage more folks to contribute), a repo with the theme of "connectors" (to include Thrift, REST, and the like). Spark too -- I think we had suggested that prior, but it could be a mental invention of mine..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 7/24/18 10:16 AM, Hbase Janitor wrote:
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm the author of the patch. A separate repo for all the connectors is a great idea! I can make whatever changes necessary to the patch to help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have several other integration type projects like this planned.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 24, 2018, 00:03 Mike Drob <[email protected]> wrote:
>>>>>>>>>>>>>>>> I would be ok with all of the connectors in a single repo. Doing a repo per connector seems like a large amount of overhead work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 23, 2018, 9:12 PM Clay B. <[email protected]> wrote:
>>>>>>>>>>>>>>>>> [Non-binding]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am all for the Kafka Connect(er) as indeed it makes HBase "more relevant" and generates buzz to help me sell HBase adoption in my endeavors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, I would like to see a connectors repo a lot as I would expect it can make the HBase source and releases more obvious in what is changing. Not to distract from Kafka, but Spark has in the past been a hang-up and seems a good fit in such a repo too; as such, I would prefer Apache over GitHub.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Clay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, 23 Jul 2018, Andrew Purtell wrote:
>>>>>>>>>>>>>>>>>>> Would we make a new repo called hbase-connectors and move REST, thrift, and this new patch there?
>>>>>>>>>>>>>>>>>> I like this idea. We are already releasing hbase-thirdparty like this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 23, 2018 at 5:47 PM Stack <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> (Thanks for the good discussion)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Where we think 'outside of HBase' would be?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Github seems too 'remote' from project and from Apache? Would we make a new repo called hbase-connectors and move REST, thrift, and this new patch there?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> S
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jul 23, 2018 at 3:50 PM Josh Elser <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> I'm -0 for including this into the main hbase tree. I feel like we've made a bit of progress in cleaning up our core, and this strikes me as a step in the wrong direction.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> At the same time, the integration seems nice enough (for the same reasons Andrew points out). Is there a reason this couldn't exist outside of HBase (at the ASF or otherwise)? Given a quick glance at the patch, it would be quite trivial to keep separate (just requires some heavier scripting to get it off the ground that the HBase scripts do setup for).
>>>>>>>>>>>>>>>>>>>> I feel like that will decrease our debt while we see if people start using it. Our API should be more than stable enough to prevent any worry about drift happening from core to this project.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 7/23/18 6:35 PM, Stack wrote:
>>>>>>>>>>>>>>>>>>>>> We have a very nice contrib sitting up in HBASE-15320 which via a proxy -- so minimal dependencies -- adds source and sink for Kafka Connect. It is nicely contained inside two new hbase-kafka-* modules.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We good w/ taking on this new feature?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks good to me. Check it out up on HBASE-15320. I was going to commit to tip of branch-2 so it'd show up in hbase-2.2.x unless you all want some backporting action going on.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> S
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>> Andrew
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
>>>>>>>>>>>>>>>>>>    - A23, Crosstalk
