At some point we really need to move to the once-discussed layered
testing approach that we had "back in my youth" when I was working on
DB2 at IBM. There was a tier of tests that had to be run before/during
any check-in, a tier that ran nightly, and a tier that ran weekly or
something like that. The first tier was the "immune system" that guarded
against basic, accidental Bad Things that one component might do to
another. The next tier was a more substantial check of each component
(taking too long, as a group of tests, to be in every developer's path
during check-ins). The last tier was "everything".
On 6/2/15 9:33 AM, Ian Maxon wrote:
Hi Taewoo,
It's really anything in
hyracks-tests/hyracks-storage-am-lsm-invertedindex-test (besides the
tokenizer test). All of the tests in that package alone take over 20
minutes; each one takes about 2 minutes.
Thanks,
- Ian
On Tue, Jun 2, 2015 at 9:13 AM, Taewoo Kim wrote:
Hi Ian,
Could you specify the exact class name of the index stress test? I would
like to look at it. Thanks.
Best,
Taewoo
On Tue, Jun 2, 2015 at 9:05 AM, Ian Maxon wrote:
I'm in favor of merging them as well. Keeping the git repositories
separate doesn't enforce any kind of architectural separation; it just
makes build and test more complex. Nearly every major change is using
the topic-field hack by this point.
I think the only downside is that the tests will take longer, but that
may need to be revisited anyway (in Hyracks, the index stress tests,
especially for inverted indexes, take far too long).
Another .02¢ :)
- Ian
On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu wrote:
Chris,
Thanks for the input!!
> 1. If we're serious about Hyracks being a re-usable component of other
> products, it makes sense to dogfood that in Asterixdb. If there are
> problems keeping Hyracks separate from Asterix or keeping Hyracks with
> clean interfaces, this forces us to address them.
In my opinion, merging the repositories doesn't break the separation of
Hyracks and AsterixDB, because the dependencies are controlled by the
Maven POM files. We just make the code physically live together under
the root directory: one is Hyracks as it is and the other is AsterixDB
as it is. For example, Spark lives together with all the things on top
of it, and that doesn't seem to prevent its reusability. Hadoop lived
together with Hive/Pig/ZooKeeper in the same repo until 2010, when it
was very stable.
Currently almost all my changes span Hyracks and AsterixDB. I believe
many people also suffer from that. Merging them together will have the
following benefits:
1) It forces Hyracks-only changes to pass the AsterixDB regression
tests. Currently, Hyracks-only changes are not verified by AsterixDB
tests.
2) On my local machine, I won't need to keep installing Hyracks and then
verifying AsterixDB. Switching branches is especially painful, because
the installed Hyracks snapshot gets overwritten from time to time.
3) I only need to make one code review request and one Jenkins job.
Currently I need to manually change the topic of my AsterixDB Gerrit CL
every time before I update my Hyracks CL, and then manually schedule
Jenkins to run a new AsterixDB job. If I forget to schedule the Jenkins
job, the AsterixDB CL is still shown as "verified by Jenkins".
> 2. We only just recently took the initiative to take Pregelix and
> Hivesterix *out* of the same repository, and that was because they were
> specifically causing us problems as components of the same build.
> (There were issues of competing dependency versions with Ian's YARN
> work, as well as several spurious pregelix test failures, as I recall.)
> At a bare minimum, we cannot merge those projects back in without
> re-researching and addressing those problems.
Those will definitely be fixed before Pregelix and IMRU are merged back.
Hivesterix is dead and will not be merged. I'm not proposing that we
bring Pregelix and IMRU in now, but rather that we do so later, when
they are ready.
Best,
Yingyi
On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery wrote:
My $.02 - no, we shouldn't.
Two main reasons:
1. If we're serious about Hyracks being a re-usable component of other
products, it makes sense to dogfood that in Asterixdb. If there are
problems keeping Hyracks separate from Asterix or keeping Hyracks with
clean interfaces, this forces us to address them.
2. We only just recently took the initiative to take Pregelix and
Hivesterix *out* of the same repository, and that was because they were
specifically causing us problems as components of the same build.
(There were issues of competing dependency versions with Ian's YARN
work, as well as several spurious pregelix test failures, as I recall.)
At a bare minimum, we cannot merge those projects back in without
re-researching and addressing those problems.
What benefits would we gain by merging them? I honestly don't agree with
Yingyi's suggestion that it would make building, bug-fixing, and code
review much simpler. At best it would help a bit on those occasions when
a change spans Hyracks and Asterix, and again, IMHO that is something
that *should* require additional thought and oversight. As for build and
test, my feeling is that merging will make them considerably harder, or
at the very least slower, simply due to doubling the Maven overhead.
I do not feel that merging the projects, either to fit in better with
Apache or to game the Apache popularity indexes, is a good trade-off.
Ceej
aka Chris Hillery
On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu wrote:
Hi folks,
Should we merge Hyracks, AsterixDB, and potentially Pregelix/IMRU into
the same repository? It would make the build, fix, and code review
process much simpler.
An example is that everything built on top of Spark lives in the same
repository: https://github.com/apache/spark. That's also why Spark is
now the most active Apache project, due to its commit frequency.
Does anyone have concerns about merging the Hyracks and AsterixDB
repositories?
Thanks!
Best,
Yingyi
On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann wrote:
Ok, let’s find out what the “more work” part is before we decide :)
We should already have the SGA (as it’s part of the SGA that Mike sent
in), and it seemed to me that all we’d need to do “later” (e.g. next
week/month) would be to
a) vote on bringing it into AsterixDB (that would be an incubator vote,
I assume) and
b) ask Infra for another git repository.
So the extra work would be the vote on the incubator list.
Is that right or is there something else we’d need to do?
Cheers,
Till
On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980) wrote:
Hey Mike and team,
Thanks for bringing this to the list. I think these are precisely the
type of conversations that we want to have here at the ASF and as part
of our Incubating project. Having these discussions in the community
here at the ASF (which is now the Apache AsterixDB community) is great.
My opinion - it’s fine either way. I’m happy if you guys want to
bring Pregelix into the code base here via AsterixDB. It’s easily
reversible and incremental. If you want to spin out Pregelix later
as its own TLP and it’s shown to have its own community we can
file a board resolution to do that. Heck, nothing stops us from
graduating 2 Incubator projects=>TLPs out of this effort even in
the Incubator. That’s fine. If you want to wait and bring it in
later, it will definitely be more work - so let’s call a spade a
spade there. But if you want to do that that’s fine too.
My personal recommendation - bring it in - won’t hurt and we can
always pivot in the ways above later.
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: Michael Carey
Date: Tuesday, April 21, 2015 at 11:49 AM
To: Chris Mattmann, Till Westmann
Cc: Chris Hillery, Ian Maxon, Yingyi Bu, "[email protected]"
Subject: Re: Migration of git repository
Sure! Let me clarify the issue for everyone (and broaden the question).
One of the technical by-products of the AsterixDB project is a graph
analytics package called Pregelix. As the name suggests, it is a "knock
off" of Pregel, as are packages like Giraph. What's unique about
Pregelix is that it actually scales without OOM'ing; under the covers it
uses database join processing techniques. You can find out more about
it by visiting http://pregelix.ics.uci.edu/ and/or by skimming the
attached paper (check out the experimental results compared to other
popular alternatives). Anyway, we have made it freely available (as we
do all of our AsterixDB-related research products), and we were
thinking that we should simply include it under the AsterixDB project,
kind of like Spark has subprojects for SQL, streams, graphs, etc. As a
result, I listed it on the list of transferred artifacts when I sent in
the licensing form the other day. (So we at least have that step done.)
Its code contributors have been a small subset of the AsterixDB team; it
was a small sub-project, basically. (Mostly just Yingyi Bu!)
Pregelix is kind of a sibling of Apache VXQuery in that its runtime is
based on Hyracks, but it hasn't otherwise been AsterixDB-dependent.
However, we have just finished teaching it to read/write directly from
AsterixDB native storage (instead of just HDFS), so now it has an
AsterixDB dependency, and we are using it as a driving example of how to
couple AsterixDB to other analytic engines. Rather than going through
another exercise to open-source this separately, it seemed like we could
take this approach.
Thoughts?
Cheers,
Mike
On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:
Yes, in fact, this whole conversation should be happening on the dev
list. OK for me to CC them on my reply?
-----Original Message-----
From: "Michael J. Carey"
Date: Tuesday, April 21, 2015 at 3:13 AM
To: Till Westmann
Cc: Chris Hillery, Ian Maxon, Yingyi Bu, Chris Mattmann
Subject: Re: Migration of git repository
+ Yingyi on the Pregelix Q. Should we also ask Chris M for advice on
that?
On Apr 20, 2015 4:23 PM, Till Westmann wrote:
Hi Ian,
That’s a good question - and I don’t know the answer.
We’ve got 2 repos so far:
https://issues.apache.org/jira/browse/INFRA-9212
https://issues.apache.org/jira/browse/INFRA-9306
so we should have space for Hyracks and AsterixDB.
I think that there’s an open question about Pregelix, but maybe that
shouldn’t keep us from going ahead.
I further think that it would be great if you could send an e-mail to
[email protected] and ask if it’s ok to import our git repo(s) or if
something else needs to be done first. (I could send that e-mail as
well, but it would be great if there were more non-Till e-mails on the
list :) )
Cheers,
Till
On Apr 20, 2015, at 4:07 PM, Ian Maxon wrote:
Hi Mike, Chris and Till,
Since (I think?) the paperwork for the software grant is done now,
should I copy our GC branches over to the ASF git repositories now (as
well as making it a mirror in the Gerrit commit hook script)?
Thanks,
- Ian