so, things look like they've stabilized significantly over the past 10
days, without any changes on our end:
<snip>
$ /root/tools/get_timeouts.sh 10
timeouts by date:
2014-10-14 -- 2
2014-10-16 -- 1
2014-10-19 -- 1
2014-10-20 -- 2
2014-10-23 -- 5
timeouts by project:
5 NewSparkPullRequestBuilder
5 SparkPullRequestBuilder
1 Tachyon-Pull-Request-Builder
total builds (excepting aborted by a user):
602
total percentage of builds timing out:
01
</snip>
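quick sanity check on that last number: 11 timeouts out of 602 builds is
roughly 1.8%, so the "01" above is just whole-percent rounding. a one-liner
for the real figure (the 11 and 602 are plugged in by hand here; this isn't
what get_timeouts.sh does internally, just the arithmetic):
<snip>
$ awk 'BEGIN { printf "%.1f%%\n", 100 * 11 / 602 }'
1.8%
</snip>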
the NewSparkPullRequestBuilder failures are spread across several days
(10-14 through 10-20), while the SparkPullRequestBuilder failures all
happened yesterday. there were a LOT of SparkPullRequestBuilder builds
yesterday (60), and the failures happened during these hours (first column
== number of failed builds, second column == hour of the day):
<snip>
$ cat timeouts-102414-130817 | grep SparkPullRequestBuilder | grep 2014-10-23 \
    | awk '{print $3}' | awk -F":" '{print $1}' | sort | uniq -c
1 03
2 20
1 22
1 23
</snip>
however, the total number of SparkPullRequestBuilder builds during those
hours doesn't seem egregious:
<snip>
4 03
9 20
4 22
9 23
</snip>
nor does the total for ALL builds at those times:
<snip>
5 03
9 20
7 22
11 23
</snip>
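(for reference, the hourly totals above came from the same style of
one-liner, just without the timeout filter -- something along these lines,
with the log file name made up for illustration:)
<snip>
$ grep 2014-10-23 builds-102414.log | grep SparkPullRequestBuilder \
    | awk '{print $3}' | awk -F":" '{print $1}' | sort | uniq -c
$ grep 2014-10-23 builds-102414.log \
    | awk '{print $3}' | awk -F":" '{print $1}' | sort | uniq -c
</snip>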
9 builds was the largest number of SparkPullRequestBuilder builds in any
single hour, but other hours saw 5, 6, or 7 builds/hour without a single
timeout.
in fact, hour 16 (4pm) had the most total builds running of any hour
yesterday, including 7 SparkPullRequestBuilder builds, and nothing timed out.
most of the pull request builder hits on github are authenticated w/an
oauth token, which gives us 5000 requests/hour; unauthenticated requests
only get 60/hour.
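if anyone wants to double-check, github's rate_limit endpoint reports the
current quota and remaining count directly (the token below is a
placeholder, obviously):
<snip>
$ # authenticated -- 5000 requests/hour quota
$ curl -s -H "Authorization: token <OAUTH_TOKEN>" https://api.github.com/rate_limit
$ # unauthenticated -- 60 requests/hour quota
$ curl -s https://api.github.com/rate_limit
</snip>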
in conclusion: there is no way we are hitting github often enough to be
rate limited. i think i've finally ruled that out completely. :)