David Knupp has posted comments on this change.

Change subject: IMPALA-3677: Write minidump on SIGUSR1
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3312/7/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

Line 117:   def wait_for_num_processes(self, daemon, num_expected, timeout=60):
This method seems to be doing a bit more than it needs to, which I'm mainly 
deriving from the fact that it raises two different kinds of exceptions (a 
RuntimeException, and an AssertionError) for essentially the same reason -- 
finding the wrong number of processes.

Since you're already calling calling this method within the context of an 
assert statement (down in your test function) I would consider simplifying 
wait_for_num_processes() so that it merely returns the number of processes it 
finds after trying for the allotted timeout. Then you could give the *caller* 
the responsibility of deciding what to do with that information, e.g., in your 
test, you could do something like:


   assert self.wait_for_num_processes(daemon='impalad', 
num_expected=cluster_size) == cluster_size


Also, for what it's worth, this solution kind of misses the intent of my 
original comment/question, since it basically still just introduces a static 
sleep time (although it does have the possibility of returning sooner if the 
terminating condition of the while loop is met.)

My original question (which no doubt stems from my inexperience with all of the 
testing permutations here) was really more about finding out whether a single 
static timeout would be valid in all possible test scenarios. In other words, 
will we be running this test on varying-sized clusters, or clusters that might 
be deployed in the cloud, versus locally, versus as a mini-cluster on a single 
machine? Does the size of the cluster or where it's deployed even matter? I 
didn't know.

For what it's worth, that aspect could be addressed easily by making the 
default timeout a factor of cluster_size if None is given, e.g.:


  def wait_for_num_processes(self, daemon, num_expected, timeout=None):
      if timeout is None:
          timeout = 5 * num_expected
      end = time.time() + timeout
      etc...


...or something similar.

I'm sorry if this all became more complicated than it deserved to be.


PS7, Line 173: 3
This could still be replaced with 'cluster_size', right?


-- 
To view, visit http://gerrit.cloudera.org:8080/3312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to