so, things look like they've stabilized significantly over the past 10 days, and without any changes on our end:

<snip>
$ /root/tools/get_timeouts.sh 10
timeouts by date:
2014-10-14 -- 2
2014-10-16 -- 1
2014-10-19 -- 1
2014-10-20 -- 2
2014-10-23 -- 5
timeouts by project:
5 NewSparkPullRequestBuilder
5 SparkPullRequestBuilder
1 Tachyon-Pull-Request-Builder

total builds (excepting aborted by a user): 602
total percentage of builds timing out: 01
</snip>

the NewSparkPullRequestBuilder failures are spread over four different days (10-14 through 10-20), and the SparkPullRequestBuilder failures all happened yesterday. there were a LOT of SparkPullRequestBuilder builds yesterday (60), and the failures happened during these hours (first number == number of builds failed, second number == hour of the day):

<snip>
$ cat timeouts-102414-130817 | grep SparkPullRequestBuilder | grep 2014-10-23 | awk '{print$3}' | awk -F":" '{print$1}' | sort | uniq -c
1 03
2 20
1 22
1 23
</snip>

however, the total number of SparkPullRequestBuilder builds during those hours doesn't seem egregious:

<snip>
4 03
9 20
4 22
9 23
</snip>

nor does the total for ALL builds at those times:

<snip>
5 03
9 20
7 22
11 23
</snip>

9 builds was the largest number of SparkPullRequestBuilder builds in any hour, but there were other hours with 5, 6 or 7 builds/hour that didn't have a timeout issue. in fact, hour 16 (4pm) had the most builds running total yesterday, including 7 SparkPullRequestBuilder builds, and nothing timed out.

most of the pull request builder hits on github are authenticated w/an oauth token, which gives us 5000 hits/hour; unauthenticated gives us 60/hour. in conclusion: there is no way we are hitting github often enough to be rate limited. i think i've finally ruled that out completely. :)
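fwiw, an easy way to double-check this directly is github's rate_limit endpoint, which reports the current limit/remaining/reset for whatever credentials you hit it with, and doesn't count against the quota itself. rough sketch below -- GITHUB_OAUTH_TOKEN is just a placeholder for wherever the builders' token actually lives:

<snip>
# unauthenticated: "limit" should come back as 60/hour
curl -s https://api.github.com/rate_limit | grep -E '"(limit|remaining|reset)"'

# authenticated with the builders' oauth token: "limit" should come back as
# 5000/hour, and "remaining" shows how close we actually get to the ceiling
curl -s -H "Authorization: token $GITHUB_OAUTH_TOKEN" \
  https://api.github.com/rate_limit | grep -E '"(limit|remaining|reset)"'
</snip>

if "remaining" stays nowhere near zero during the busy hours, that's pretty much the nail in the coffin for the rate-limiting theory.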