I'm in favor of merging them as well. Keeping the git repositories separate doesn't enforce any kind of architectural separation; it just makes build + test more complex. Nearly every major change is using the topic field hack by this point. I think the only downside is that the tests will take longer, but that may need to be revisited anyway (in Hyracks, the index stress tests, especially for inverted indexes, take far too long).
Another .02¢ :)

- Ian

On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu <[email protected]> wrote:

> Chris,
>
> Thanks for the input!!
>
> >>1. If we're serious about Hyracks being a re-usable component of other
> >>products, it makes sense to dogfood that in Asterixdb. If there are
> >>problems keeping Hyracks separate from Asterix or keeping Hyracks with
> >>clean interfaces, this forces us to address them.
>
> In my opinion, merging the repository doesn't break the separation of
> hyracks and asterixdb, because the dependencies are controlled by mvn pom
> files. We just make the code physically live together under the root
> directory, one is hyracks as it is and the other is asterixdb as it is.
> For example, Spark lives together with all the things on top of it and
> that doesn't seem to prevent its reusability. Hadoop lived together with
> Hive/Pig/Zookeeper in the same repo until 2010, when it was very stable.
>
> Currently almost all my changes span hyracks and asterixdb. I believe
> many people also suffer from that. Merging them together will have the
> following benefits:
> 1) It forces those hyracks-only changes to pass asterixdb regression
> tests. Currently hyracks-only changes are not verified by asterixdb tests.
> 2) On my local machine, I don't need to always install hyracks and then
> verify asterixdb from time to time. In particular, switching branches is
> painful because the installed hyracks snapshot is overwritten from time
> to time.
> 3) I only need to make one code review request and one jenkins job.
> Currently I need to manually change the topic of my asterixdb gerrit CL
> every time before I update my hyracks CL, and then manually schedule
> jenkins to run a new asterixdb job. If I forget to schedule the jenkins
> job, the asterixdb CL is still shown to be "verified by jenkins".
>
> >>2. We only just recently took the initiative to take Pregelix and
> >>Hivesterix *out* of the same repository, and that was because they were
> >>specifically causing us problems as components of the same build. (There
> >>were issues of competing dependency versions with Ian's YARN work, as
> >>well as several spurious pregelix test failures, as I recall.) At a bare
> >>minimum, we cannot merge those projects back in without re-researching
> >>and addressing those problems.
>
> Those will definitely be fixed before Pregelix and IMRU are merged back.
> Hivesterix is dead and will not be merged. I'm not proposing that we
> should bring Pregelix and IMRU in now, but that we do so later when they
> are ready.
>
> Best,
> Yingyi
>
> On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery <[email protected]> wrote:
>
> > My $.02 - no, we shouldn't.
> >
> > Two main reasons:
> >
> > 1. If we're serious about Hyracks being a re-usable component of other
> > products, it makes sense to dogfood that in Asterixdb. If there are
> > problems keeping Hyracks separate from Asterix or keeping Hyracks with
> > clean interfaces, this forces us to address them.
> >
> > 2. We only just recently took the initiative to take Pregelix and
> > Hivesterix *out* of the same repository, and that was because they were
> > specifically causing us problems as components of the same build. (There
> > were issues of competing dependency versions with Ian's YARN work, as
> > well as several spurious pregelix test failures, as I recall.) At a bare
> > minimum, we cannot merge those projects back in without re-researching
> > and addressing those problems.
> >
> > What benefits would we gain by merging them? I honestly don't agree with
> > Yingyi's suggestion that it would make building, bug-fixing, and code
> > review much simpler. At best it would help a bit on those occasions when
> > a change spans Hyracks and Asterix, and again, IMHO that is something
> > that *should* require additional thought and oversight. As for build and
> > test, my feeling is that it will make it considerably harder, or at the
> > very least slower, simply due to doubling the Maven overhead.
> >
> > I do not feel that merging the projects to either fit in better with
> > Apache, or to game the Apache popularity indexes, is a good trade-off.
> >
> > Ceej
> > aka Chris Hillery
> >
> > On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu <[email protected]> wrote:
> >
> >> Hi folks,
> >>
> >> Should we merge hyracks, asterixdb, and potentially pregelix/imru
> >> into the same repository? It will make the build, fix, and code review
> >> process much simpler.
> >> An example is that everything built on top of Spark lives in the same
> >> repository: https://github.com/apache/spark. That's also why Spark is
> >> the most active Apache project now, due to its commit frequency.
> >> Does anyone have concerns about merging the hyracks and asterixdb
> >> repositories?
> >> Thanks!
> >>
> >> Best,
> >> Yingyi
> >>
> >> On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann <[email protected]> wrote:
> >>
> >>> Ok, let’s find out what is the “more work” part before we decide :)
> >>>
> >>> We should already have the SGA (as it’s part of the SGA that Mike sent
> >>> in) and it seemed to me that all we’d need to do “later” (e.g. next
> >>> week/month) would be to
> >>> a) vote on bringing it into AsterixDB (that would be an incubator vote
> >>> I assume) and
> >>> b) ask infra for another git repository.
> >>> So the extra work would be the vote on the incubator list.
> >>> Is that right or is there something else we’d need to do?
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980)
> >>> <[email protected]> wrote:
> >>>
> >>> Hey Mike and team,
> >>>
> >>> Thanks for bringing this to the list. I think these are precisely
> >>> the type of conversations that we want to have here at the ASF and
> >>> as part of our Incubating project. Having these discussions in the
> >>> community here at the ASF (which is now the Apache AsterixDB community)
> >>> is great.
> >>>
> >>> My opinion - it’s fine either way. I’m happy if you guys want to
> >>> bring Pregelix into the code base here via AsterixDB. It’s easily
> >>> reversible and incremental. If you want to spin out Pregelix later
> >>> as its own TLP and it’s shown to have its own community we can
> >>> file a board resolution to do that. Heck, nothing stops us from
> >>> graduating 2 Incubator projects=>TLPs out of this effort even in
> >>> the Incubator. That’s fine. If you want to wait and bring it in
> >>> later, it will definitely be more work - so let’s call a spade a
> >>> spade there. But if you want to do that, that’s fine too.
> >>>
> >>> My personal recommendation - bring it in - won’t hurt and we can
> >>> always pivot in the ways above later.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Chris Mattmann, Ph.D.
> >>> Chief Architect
> >>> Instrument Software and Science Data Systems Section (398)
> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> Office: 168-519, Mailstop: 168-527
> >>> Email: [email protected]
> >>> WWW: http://sunset.usc.edu/~mattmann/
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Adjunct Associate Professor, Computer Science Department
> >>> University of Southern California, Los Angeles, CA 90089 USA
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >>> -----Original Message-----
> >>> From: Michael Carey <[email protected]>
> >>> Date: Tuesday, April 21, 2015 at 11:49 AM
> >>> To: Chris Mattmann <[email protected]>, Till Westmann
> >>> <[email protected]>
> >>> Cc: Chris Hillery <[email protected]>, Ian Maxon <[email protected]>,
> >>> Yingyi Bu <[email protected]>, "[email protected]"
> >>> <[email protected]>
> >>> Subject: Re: Migration of git repository
> >>>
> >>> Sure! Let me clarify the issue for everyone (and broaden the question).
> >>>
> >>> One of the technical by-products of the AsterixDB project is a graph
> >>> analytics package called Pregelix - as the name suggests, it is a
> >>> "knock off" of Pregel, as are packages like Giraph. What's unique about
> >>> Pregelix is that it actually scales without OOM'ing - under the covers
> >>> it uses database join processing techniques. You can find out more
> >>> about it by visiting http://pregelix.ics.uci.edu/ and/or by skimming
> >>> the attached paper - check out the experimental results compared to
> >>> other popular alternatives. Anyway, we have made it freely available
> >>> (as we do all of our AsterixDB-related research products) and we were
> >>> thinking that we should simply include it under the AsterixDB project -
> >>> kind of like Spark has subprojects for SQL, streams, graphs, etc. As a
> >>> result, I listed it on the list of transferred artifacts when I sent in
> >>> the licensing form the other day. (So we at least have that step done.)
> >>> Its code contributors have been a small subset of the AsterixDB team;
> >>> it was a small sub-project, basically. (Mostly just Yingyi Bu!)
> >>>
> >>> Pregelix is kind of a sibling of Apache VXQuery in that its runtime is
> >>> based on Hyracks but it hasn't otherwise been AsterixDB-dependent.
> >>> However, we have just finished teaching it to read/write directly from
> >>> AsterixDB native storage - instead of just HDFS - so now it has an
> >>> AsterixDB dependency, and we are using it as a driving example of how
> >>> to couple AsterixDB to other analytic engines.
> >>>
> >>> Rather than going through another exercise to open-source this
> >>> separately, it seemed like we could take this approach.
> >>>
> >>> Thoughts?
> >>> Cheers,
> >>> Mike
> >>>
> >>> On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:
> >>>
> >>> Yes, in fact, this whole conversation should be happening on
> >>> the dev list. OK for me to CC them on my reply?
> >>>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Chris Mattmann, Ph.D.
> >>> Chief Architect
> >>> Instrument Software and Science Data Systems Section (398)
> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> Office: 168-519, Mailstop: 168-527
> >>> Email: [email protected]
> >>> WWW: http://sunset.usc.edu/~mattmann/
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Adjunct Associate Professor, Computer Science Department
> >>> University of Southern California, Los Angeles, CA 90089 USA
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >>> -----Original Message-----
> >>> From: "Michael J. Carey" <[email protected]>
> >>> Date: Tuesday, April 21, 2015 at 3:13 AM
> >>> To: Till Westmann <[email protected]>
> >>> Cc: Chris Hillery <[email protected]>, Ian Maxon <[email protected]>,
> >>> Yingyi Bu <[email protected]>, Chris Mattmann <[email protected]>
> >>> Subject: Re: Migration of git repository
> >>>
> >>> + Yingyi on the Pregelix Q. Should we also ask Chris M for advice on
> >>> that?
> >>>
> >>> On Apr 20, 2015 4:23 PM, "Till Westmann" <[email protected]> wrote:
> >>>
> >>> Hi Ian,
> >>>
> >>> That’s a good question - and I don’t know the answer.
> >>> We’ve got 2 repos so far:
> >>>
> >>> https://issues.apache.org/jira/browse/INFRA-9212
> >>> https://issues.apache.org/jira/browse/INFRA-9306
> >>>
> >>> so we should have space for Hyracks and AsterixDB.
> >>>
> >>> I think that there’s an open question about Pregelix, but maybe that
> >>> shouldn’t keep us from going ahead.
> >>>
> >>> I further think that it would be great if you could send an e-mail to
> >>> [email protected] and ask if it’s ok to import our git repo(s) or if
> >>> something else needs to be done first. (I could send that e-mail as
> >>> well, but it would be great if there were more non-Till e-mails on the
> >>> list :) )
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Apr 20, 2015, at 4:07 PM, Ian Maxon <[email protected]> wrote:
> >>>
> >>> Hi Mike, Chris and Till,
> >>>
> >>> Since (I think?) the paperwork for the software grant is done now,
> >>> should I copy our GC branches over to the ASF git repositories now (as
> >>> well as making it a mirror in the Gerrit commit hook script)?
> >>>
> >>> Thanks,
> >>> - Ian
