So is there some way to fix change-topic and other user experience issues that separation "causes"? I.e., could we have our cake (separated code bases for multiple Hyracks consumers) and eat it too in AsterixDB (not feeling added pain, but having a fairly seamless experience if you do both-level stuff)?

On 6/2/15 11:59 PM, Till Westmann wrote:
On Jun 2, 2015, at 22:45, Yingyi Bu <[email protected]> wrote:

I haven't tried working on multiple Hyracks branches at the same time, so I haven't experienced this. This
seems like a working method error, though. If you're working with two things that are "the
same version" (even if that's a snapshot version), you'll need to use separate Maven repositories to
install them. In fact, merging the two git repositories would do nothing to fix this problem, would
it? If the proposal is to put the two source repositories in the same git repo but otherwise leave
them untouched, then nothing would change in the build process. It's possible I'm missing something there,
though.
Is there a way to use multiple mvn repositories on the same machine?   I used 
to think mvn always installs artifacts to the directory ~/.m2/repository.
I guess we just need to have a root-level pom and leave hyracks and asterixdb untouched.  
Then, a single root-level "mvn package ..." will build everything without 
requiring installing hyracks first.  It's just like what we currently do for hyracks and 
algebricks.  Then, builds/tests do not leave side-effects in ~/.m2/repository.
Great question! I just looked into this a bit (but I didn't try it) and the 
docs seem to suggest that
a) you should be able to specify the local repository in a settings.xml and that
b) you should be able to specify the settings.xml on the maven command line.
So it should be possible to do that - and with some shell magic I think that it 
should even be possible to do that in a largely invisible way.
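
A minimal sketch of what that "shell magic" could look like, written in Python rather than shell. It gives each branch its own settings.xml and its own local repository, so installing one branch's snapshot cannot overwrite another's. The helper names (`branch_settings`, `maven_command`) and the directory layout are hypothetical and have not been tried against the actual Hyracks or AsterixDB builds; the `<localRepository>` element and Maven's `--settings` option are the mechanisms described in (a) and (b).

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Smallest settings.xml that redirects Maven's local repository.
SETTINGS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>{repo}</localRepository>
</settings>
"""


def branch_settings(branch: str, base_dir: Path) -> Path:
    """Write a settings.xml whose local repository is private to `branch`.

    The layout <base_dir>/<branch>/{settings.xml,repository} is an assumption
    made for this sketch, not an existing convention.
    """
    # Replace characters such as '/' so the branch name is a single directory.
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in branch)
    branch_dir = base_dir / safe
    repo = branch_dir / "repository"
    repo.mkdir(parents=True, exist_ok=True)
    settings = branch_dir / "settings.xml"
    settings.write_text(
        SETTINGS_TEMPLATE.format(repo=escape(str(repo))), encoding="utf-8"
    )
    return settings


def maven_command(branch: str, base_dir: Path, goals=("install",)) -> list:
    """Build the mvn argument list that uses the branch-private settings file."""
    settings = branch_settings(branch, base_dir)
    return ["mvn", "--settings", str(settings), *goals]
```

The returned list can be handed to `subprocess.run(...)` with `cwd` set to the checkout being built. Passing `-Dmaven.repo.local=<dir>` on the command line is another way to get the same isolation without a settings file.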

As for manually scheduling Asterix Jenkins jobs, that sounds like it's only a problem where
your Hyracks change breaks an existing public API. That would be obviated by having
true API testing inside of Hyracks, which is something that we should have regardless of
any decisions about source locations.
I agree that's the right software engineering way. Going forward, we do need to 
add more unit tests in hyracks and asterixdb. But considering the resource 
constraints, I'm not sure whether (or when) we can have a complete API test 
suite for hyracks/algebricks:
1)  both hyracks and algebricks public APIs allow an arbitrary input DAG (a 
logical plan or a hyracks job).  It's hard to enumerate all possibilities in 
hyracks/algebricks tests.  My experience is that when we see a broken AQL 
query,  we fix it in both hyracks/asterixdb codebases,  and verify it with the 
AQL query. In those cases,  there might be no need to have yet-another verbose 
hyracks/algebricks test.
2)  even if we have a comprehensive test suite for hyracks,  I'm not sure 
whether it can guarantee that the asterixdb tests pass, because the current asterixdb 
test suite covers a lot of edge cases in the hyracks runtime, LSM, and 
algebricks.
One way to use existing clients as tests for Hyracks could be to set up a 
system that runs the tests of the existing versions of the clients against a 
new version of Hyracks - ideally with all clients isolated from each other and in 
parallel to keep turnaround times low.
Does that sound feasible?

Cheers,
Till

Anyway, if the repositories have to be separated, it would be nice if the 
"change-topic" issue could be fixed.

Best,
Yingyi


On Tue, Jun 2, 2015 at 10:00 AM, Chris Hillery <[email protected]> wrote:
On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu <[email protected]> wrote:
In my opinion,  merging the repository doesn't break the separation of hyracks 
and asterixdb, because the dependencies are controlled by mvn pom files.
That wasn't the separation I was talking about. I meant API separation. As it 
is now, when we make a change to both Asterix and Hyracks, we are forced to 
consider the API implications, or at least they are put out there in a very 
clear way that we need to look at. If we merge them, people will (rightly) 
treat the whole thing as one product, and there will be no brakes on making 
wide-ranging API changes.

(As an aside: I don't trust Maven's pom files to do a good job of keeping the 
dependency management clean. In fact I trust it to do precisely the opposite, 
by making it both easier to screw up the dependencies and harder to update them 
in future.)

Again, my point is this: If we truly believe that Hyracks is a re-usable component, it 
should be treated as such from source to build to delivery. By merging in Asterix, we are 
saying that Asterix is "more equal" than other Hyracks clients, to the point 
that we're tacitly willing to break those other clients in favor of simplifying Asterix 
development. If that is a fair and true statement, well, then, sure, let's merge them.

1) It forces those hyracks-only changes to pass asterixdb regression tests.  
Currently hyracks-only changes are not verified by asterixdb tests.
This is a good point, I will admit. However, I think this same goal can be met 
in other ways. My strong preference would be to create a set of true API tests 
inside of Hyracks, which both document and test the external Hyracks API. That 
will make API-breaking changes in future much easier to spot, and also make it 
clear when Asterix is using internal APIs that it should not.
2) On my local machine,  I don't need to always install hyracks and then verify 
asterixdb from time to time.  Especially, switching branches seems painful 
because the installed hyracks snapshot is overwritten from time to time.
I haven't tried working on multiple Hyracks branches at the same time, so I haven't 
experienced this. This seems like a working method error, though. If you're working with 
two things that are "the same version" (even if that's a snapshot version), 
you'll need to use separate Maven repositories to install them. In fact, merging the two 
git repositories would do nothing to fix this problem, would it? If the proposal is to put 
the two source repositories in the same git repo but otherwise leave them untouched, then 
nothing would change in the build process. It's possible I'm missing something there, 
though.
3) I only need to make one code review request and one jenkins job.  Currently I need to 
manually change the topic of my asterixdb gerrit CL every time before I update my hyracks 
CL, and then manually schedule jenkins to run a new asterixdb job.  If I forget to 
schedule the jenkins job, the asterixdb CL is still shown to be "verified by 
jenkins".
This is a problem, but it's a problem in commit validation, not in the source. 
Modifying the source to work around these issues is still a bad idea IMHO.

The "change-topic" issue could be fixed with a bit of development work (have 
the topic point to a change, rather than a specific patchset on the change, so you only 
need to set it once, for instance).

As for manually scheduling Asterix Jenkins jobs, that sounds like it's only a 
problem where your Hyracks change breaks an existing public API. That would be 
obviated by having true API testing inside of Hyracks, which is something that 
we should have regardless of any decisions about source locations.

In summary / repeating myself again: yes, we have some problems because Hyracks 
and Asterix are in separate repositories. But those problems are pointing out 
true issues with our development and processes. Merging the repositories isn't 
fixing those problems, it's sweeping them under the rug. Long term we would be 
much better off to identify, isolate, and fix the problems themselves.

Ceej
aka Chris Hillery


