Re: Migration of git repository

Mike Carey Thu, 04 Jun 2015 15:40:11 -0700

At some point we really need to move to the once-discussed layered testing approach that we had "back in my youth" when I was working on DB2 at IBM. There was a tier of tests that had to be run before/during any check-in, a tier that ran nightly, and a tier that ran weekly or something like that. The first tier was the "immune system" to avoid basic accidental Bad Things that one component might do to another (an immune system). The next tier was a more substantial check of each component (taking too long, as a group of tests, to be in all developers' paths during checkins). The last tier was "everything".
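
The three tiers described above could be sketched roughly as follows. This is only an illustration, not how any existing build is configured: the suite names and their tier assignments are hypothetical, and the rule that a slower run also includes every faster tier is an assumption (the description only says that the last tier was "everything").

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    CHECKIN = 1  # fast "immune system" tests, run before/during any check-in
    NIGHTLY = 2  # more substantial per-component checks
    WEEKLY = 3   # "everything"


@dataclass(frozen=True)
class Suite:
    name: str   # hypothetical suite name
    tier: Tier  # fastest tier at which this suite starts running


def suites_for(run: Tier, suites: List[Suite]) -> List[str]:
    """Return the names of the suites executed by a run at the given tier.

    Assumption: a run includes its own tier plus every faster tier,
    so the weekly run covers all suites.
    """
    return [s.name for s in suites if s.tier <= run]


# Hypothetical assignment of suites to tiers, for illustration only.
SUITES = [
    Suite("cross-component-smoke", Tier.CHECKIN),
    Suite("storage-component-checks", Tier.NIGHTLY),
    Suite("index-stress", Tier.WEEKLY),
]

if __name__ == "__main__":
    for tier in Tier:
        print(tier.name, suites_for(tier, SUITES))
```

Under this arrangement a long-running suite stays out of every developer's check-in path while still being exercised on a regular schedule.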


On 6/2/15 9:33 AM, Ian Maxon wrote:

Hi Taewoo,
It's really anything in
hyracks-tests/hyracks-storage-am-lsm-invertedindex-test (besides the
tokenizer test). All of the tests in that package alone take over 20
minutes. Each one takes about 2 minutes.

Thanks,
- Ian

On Tue, Jun 2, 2015 at 9:13 AM, Taewoo Kim <[email protected]> wrote:

Hi Ian,

Could you specify the exact class name of the index stress test? I would
like to look at it. Thanks.

Best,
Taewoo

On Tue, Jun 2, 2015 at 9:05 AM, Ian Maxon <[email protected]> wrote:

I'm in favor of merging them as well. Keeping the git repositories
separate doesn't enforce any kind of architectural separation; it just
makes build + test more complex. Nearly every major change is using the
topic field hack by this point.
I think the only downside is that the tests will take longer, but that
may need to be revisited anyway (in Hyracks, the index stress tests,
especially for inverted indexes, take far too long).

Another .02¢ :)

- Ian

On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu <[email protected]> wrote:

Chris,

Thanks for the input!!

1. If we're serious about Hyracks being a re-usable component of other
products, it makes sense to dogfood that in Asterixdb. If there are
problems keeping Hyracks separate from Asterix or keeping Hyracks with
clean interfaces, this forces us to address them.

In my opinion, merging the repositories doesn't break the separation of
hyracks and asterixdb, because the dependencies are controlled by mvn
pom files. We just make the code physically live together under the root
directory: one is hyracks as it is and the other is asterixdb as it is.
For example, Spark lives together with all the things on top of it, and
that doesn't seem to prevent its reusability. Hadoop lived together with
Hive/Pig/Zookeeper in the same repo until 2010, when it was very stable.
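
As a rough sketch of that point: layering can be checked from declared dependencies alone, wherever the code lives. The module names and dependency lists below are hypothetical stand-ins for what the pom files would declare; nothing is read from a real build.

```python
from typing import Dict, List, Tuple

# Hypothetical modules and their declared dependencies (assumption:
# paths mirror a merged repository with hyracks/ and asterixdb/ roots).
DECLARED_DEPS: Dict[str, List[str]] = {
    "hyracks/hyracks-api": [],
    "hyracks/hyracks-storage": ["hyracks/hyracks-api"],
    "asterixdb/asterix-runtime": [
        "hyracks/hyracks-api",
        "hyracks/hyracks-storage",
    ],
}


def upward_deps(
    deps: Dict[str, List[str]],
    lower: str = "hyracks/",
    upper: str = "asterixdb/",
) -> List[Tuple[str, str]]:
    """Return (module, dependency) pairs where a lower-layer module
    declares a dependency on an upper-layer module."""
    return [
        (module, dep)
        for module, module_deps in sorted(deps.items())
        if module.startswith(lower)
        for dep in module_deps
        if dep.startswith(upper)
    ]


if __name__ == "__main__":
    # An empty list means the lower layer stays independent of the upper one.
    print(upward_deps(DECLARED_DEPS))
```

A check of this kind depends only on what each module declares, not on how many repositories the modules are spread across.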

Currently almost all my changes span hyracks and asterixdb. I believe
many people also suffer from that. Merging them together will have the
following benefits:
1) It forces hyracks-only changes to pass the asterixdb regression
tests. Currently hyracks-only changes are not verified by asterixdb
tests.
2) On my local machine, I no longer need to install hyracks before
verifying asterixdb each time. Switching branches is especially painful
because the installed hyracks snapshot is overwritten from time to time.
3) I only need to make one code review request and one jenkins job.
Currently I need to manually change the topic of my asterixdb gerrit CL
every time before I update my hyracks CL, and then manually schedule
jenkins to run a new asterixdb job. If I forget to schedule the jenkins
job, the asterixdb CL is still shown to be "verified by jenkins".

2. We only just recently took the initiative to take Pregelix and
Hivesterix *out* of the same repository, and that was because they were
specifically causing us problems as components of the same build. (There
were issues of competing dependency versions with Ian's YARN work, as
well as several spurious pregelix test failures, as I recall.) At a bare
minimum, we cannot merge those projects back in without re-researching
and addressing those problems.

Those will definitely be fixed before Pregelix and IMRU are merged
back. Hivesterix is dead and will not be merged. I'm not proposing that
we bring Pregelix and IMRU in now, but that we do so later when they are
ready.

Best,
Yingyi




On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery <[email protected]> wrote:

My $.02 - no, we shouldn't.

Two main reasons:

1. If we're serious about Hyracks being a re-usable component of other
products, it makes sense to dogfood that in Asterixdb. If there are
problems keeping Hyracks separate from Asterix or keeping Hyracks with
clean interfaces, this forces us to address them.

2. We only just recently took the initiative to take Pregelix and
Hivesterix *out* of the same repository, and that was because they were
specifically causing us problems as components of the same build. (There
were issues of competing dependency versions with Ian's YARN work, as
well as several spurious pregelix test failures, as I recall.) At a bare
minimum, we cannot merge those projects back in without re-researching
and addressing those problems.

What benefits would we gain by merging them? I honestly don't agree with
Yingyi's suggestion that it would make building, bug-fixing, and code
review much simpler. At best it would help a bit on those occasions when
a change spans Hyracks and Asterix, and again, IMHO that is something
that *should* require additional thought and oversight. As for build and
test, my feeling is that it will make it considerably harder, or at the
very least slower, simply due to doubling the Maven overhead.

I do not feel that merging the projects to either fit in better with
Apache, or to game the Apache popularity indexes, is a good trade-off.

Ceej
aka Chris Hillery

On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu <[email protected]> wrote:

Hi folks,

     Should we merge hyracks, asterixdb, and potentially pregelix/imru
into the same repository? It will make the build, fix, and code review
process much simpler.
     An example is that everything built on top of Spark lives in the
same repository: https://github.com/apache/spark. That's also why Spark
is the most active Apache project now, due to its commit frequency.
     Does anyone have concerns about merging the hyracks and asterixdb
repositories?
     Thanks!

Best,
Yingyi


On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann <[email protected]> wrote:

Ok, let’s find out what is the “more work” part before we decide :)

We should already have the SGA (as it’s part of the SGA that Mike sent
in) and it seemed to me that all we’d need to do “later” (e.g. next
week/month) would be to
a) vote on bringing it into AsterixDB (that would be an incubator vote,
I assume) and
b) ask infra for another git repository.
So the extra work would be the vote on the incubator list.
Is that right or is there something else we’d need to do?

Cheers,
Till

On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980) <[email protected]> wrote:

Hey Mike and team,

Thanks for bringing this to the list. I think these are precisely
the type of conversations that we want to have here at the ASF and
as part of our Incubating project. Having these discussions in the
community here at the ASF (which is now the Apache AsterixDB community)
is great.

My opinion - it’s fine either way. I’m happy if you guys want to
bring Pregelix into the code base here via AsterixDB. It’s easily
reversible and incremental. If you want to spin out Pregelix later
as its own TLP and it’s shown to have its own community we can
file a board resolution to do that. Heck, nothing stops us from
graduating 2 Incubator projects=>TLPs out of this effort even in
the Incubator. That’s fine. If you want to wait and bring it in
later, it will definitely be more work - so let’s call a spade a
spade there. But if you want to do that that’s fine too.

My personal recommendation - bring it in - won’t hurt and we can
always pivot in the ways above later.

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Michael Carey <[email protected]>
Date: Tuesday, April 21, 2015 at 11:49 AM
To: Chris Mattmann <[email protected]>, Till Westmann <[email protected]>
Cc: Chris Hillery <[email protected]>, Ian Maxon <[email protected]>,
Yingyi Bu <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Migration of git repository

Sure!  Let me clarify the issue for everyone (and broaden the question).

One of the technical by-products of the AsterixDB project is a graph
analytics package called Pregelix - as the name suggests, it is a "knock
off" of Pregel, as are packages like Giraph. What's unique about
Pregelix is that it actually scales without OOM'ing - under the covers
it uses database join processing techniques. You can find out more about
it by visiting http://pregelix.ics.uci.edu/ and/or by skimming the
attached paper - check out the experimental results compared to other
popular alternatives. Anyway, we have made it freely available (as we do
all of our AsterixDB-related research products) and we were thinking
that we should simply include it under the AsterixDB project - kind of
like Spark has subprojects for SQL, streams, graphs, etc. As a result, I
listed it on the list of transferred artifacts when I sent in the
licensing form the other day. (So we at least have that step done.) Its
code contributors have been a small subset of the AsterixDB team; it was
a small sub-project, basically. (Mostly just Yingyi Bu!)

Pregelix is kind of a sibling of Apache VXQuery in that its runtime is
based on Hyracks but it hasn't otherwise been AsterixDB-dependent.
However, we have just finished teaching it to read/write directly from
AsterixDB native storage - instead of just HDFS - so now it has an
AsterixDB dependency, and we are using it as a driving example of how to
couple AsterixDB to other analytic engines. Rather than going through
another exercise to open-source this separately, it seemed like we could
take this approach.

Thoughts?
Cheers,
Mike


On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:


Yes, in fact, this whole conversation should be happening on
the dev list. OK for me to CC them on my reply?







-----Original Message-----
From: "Michael J. Carey" <[email protected]>
Date: Tuesday, April 21, 2015 at 3:13 AM
To: Till Westmann <[email protected]>
Cc: Chris Hillery <[email protected]>, Ian Maxon <[email protected]>,
Yingyi Bu <[email protected]>, Chris Mattmann <[email protected]>
Subject: Re: Migration of git repository

+ Yingyi on the Pregelix Q.  Should we also ask Chris M for advice on
that?

On Apr 20, 2015 4:23 PM, "Till Westmann" <[email protected]> wrote:

Hi Ian,


That’s a good question - and I don’t know the answer.
We’ve got 2 repos so far:

https://issues.apache.org/jira/browse/INFRA-9212
https://issues.apache.org/jira/browse/INFRA-9306

so we should have space for Hyracks and AsterixDB.


I think that there’s an open question about Pregelix, but maybe that
shouldn’t keep us from going ahead.


I further think that it would be great if you could send an e-mail to
[email protected] and ask if it’s ok to import our git repo(s) or if
something else needs to be done first. (I could send that e-mail as
well, but it would be great if there were more non-Till e-mails on the
list :) )


Cheers,
Till


On Apr 20, 2015, at 4:07 PM, Ian Maxon <[email protected]> wrote:

Hi Mike, Chris and Till,


Since (I think?) the paperwork for the software grant is done now,
should I copy our GC branches over to the ASF git repositories now (as
well as making it a mirror in the Gerrit commit hook script)?


Thanks,
- Ian
