so, things look like they've stabilized significantly over the past 10
days, and without any changes on our end:
<snip>
$ /root/tools/get_timeouts.sh 10
timeouts by date:
2014-10-14 -- 2
2014-10-16 -- 1
2014-10-19 -- 1
2014-10-20 -- 2
2014-10-23 -- 5

timeouts by project:
      5 NewSparkPullRequestBuilder
      5 SparkPullRequestBuilder
      1 Tachyon-Pull-Request-Builder
total builds (excepting aborted by a user):
602

total percentage of builds timing out:
01
</snip>

the NewSparkPullRequestBuilder timeouts are spread over several different days
(10-14 through 10-20), and the SparkPullRequestBuilder timeouts all
happened yesterday.  there were a LOT of SparkPullRequestBuilder builds
yesterday (60), and the timeouts happened during these hours (first number
== number of builds that timed out, second number == hour of the day):
<snip>
$ cat timeouts-102414-130817 | grep SparkPullRequestBuilder | \
    grep 2014-10-23 | awk '{print $3}' | awk -F":" '{print $1}' | sort | uniq -c
      1 03
      2 20
      1 22
      1 23
</snip>

however, the total number of SparkPullRequestBuilder builds during these
hours doesn't seem egregious:
<snip>
      4 03
      9 20
      4 22
      9 23
</snip>

nor does the total for ALL builds at those times:
<snip>
      5 03
      9 20
      7 22
     11 23
</snip>

9 builds was the largest number of SparkPullRequestBuilder builds per hour,
but there were other hours with 5, 6 or 7 builds/hour that didn't have a
timeout issue.
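
to put the two listings side by side, here's a rough sketch that divides the timeouts per hour by the SparkPullRequestBuilder builds per hour.  the numbers are copied by hand from the output above, and hourly_rates is just a throwaway helper name, not one of our tools:

```shell
# columns: hour, timeouts, total SparkPullRequestBuilder builds
# (values copied by hand from the two listings above)
hourly_rates() {
  awk '{ printf "%s %d/%d %.0f%%\n", $1, $2, $3, 100 * $2 / $3 }' <<'EOF'
03 1 4
20 2 9
22 1 4
23 1 9
EOF
}

# print one line per hour: hour, timeouts/builds, percentage timed out
hourly_rates
```

the quiet hours (4 builds) come out with a higher share of timeouts than the busy ones (9 builds), which fits the point that build volume alone doesn't explain this.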

in fact, hour 16 (4pm) had the most total builds yesterday, including
7 SparkPullRequestBuilder builds, and nothing timed out.

most of the pull request builder hits on github are authenticated w/an
oauth token.  this gives us 5000 hits/hour, and unauthed gives us 60/hour.

in conclusion:  there is no way we are hitting github often enough to be
rate limited.  i think i've finally ruled that out completely.  :)
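
for a sense of the headroom, a back-of-the-envelope sketch.  it assumes every hit is authenticated (above i only said "most") and uses hour 23, the busiest of the hours that saw timeouts, with 11 builds total:

```shell
limit=5000   # authenticated github hits per hour, from above
builds=11    # all builds during hour 23, the busiest hour with timeouts

# integer division: hits each build would need to use up the hourly limit
per_build=$((limit / builds))
echo "each build would need ~${per_build} api hits to exhaust the limit"
```

github also reports the remaining quota directly at its /rate_limit api endpoint, if anyone wants to check this against live numbers.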
