quick update:

here are some stats i scraped over the past week, covering ALL pull request
builder projects and their git fetch timeout failures.  due to the large
number of spark ghprb jobs, i don't have great records from before oct 7th.
the data is current up until ~2:30pm today:

spark and new spark ghprb total builds vs git fetch timeouts:
$ for x in 10-{09..17}; do
    passed=$(grep "$x" SORTED.passed | grep -i spark | wc -l)   # builds that passed
    failed=$(grep "$x" SORTED | grep -i spark | wc -l)          # git fetch timeouts
    total=$((passed + failed))
    # bc prints e.g. ".34"; sed strips the dot (truncates, doesn't round); empty days print 0
    fail_percent=$( (( total )) && echo "scale=2; $failed/$total" | bc | sed "s/^\.//g" || echo 0)
    echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
  done
10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
10-17 -- total builds: 26 p/f: 20/6 fail%: 23%

all other ghprb builds vs git fetch timeouts:
$ for x in 10-{09..17}; do
    passed=$(grep "$x" SORTED.passed | grep -vi spark | wc -l)  # builds that passed
    failed=$(grep "$x" SORTED | grep -vi spark | wc -l)         # git fetch timeouts
    total=$((passed + failed))
    # bc prints e.g. ".13"; sed strips the dot (truncates, doesn't round); empty days print 0
    fail_percent=$( (( total )) && echo "scale=2; $failed/$total" | bc | sed "s/^\.//g" || echo 0)
    echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
  done
10-09 -- total builds: 16 p/f: 16/0 fail%: 0%
10-10 -- total builds: 46 p/f: 40/6 fail%: 13%
10-11 -- total builds: 4 p/f: 4/0 fail%: 0%
10-12 -- total builds: 2 p/f: 2/0 fail%: 0%
10-13 -- total builds: 2 p/f: 2/0 fail%: 0%
10-14 -- total builds: 10 p/f: 10/0 fail%: 0%
10-15 -- total builds: 5 p/f: 5/0 fail%: 0%
10-16 -- total builds: 5 p/f: 5/0 fail%: 0%
10-17 -- total builds: 0 p/f: 0/0 fail%: 0%

note:  the 15th was the day i rolled back to the earlier version of the git
plugin.  it doesn't seem to have helped much, so i'll probably bring us
back up to the latest version soon.
also note:  rocking some floating point math on the CLI!  ;)
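the two loops above differ only in `grep -i spark` vs `grep -vi spark`, so they could be folded into one function.  this is a hypothetical refactor (`fail_stats` is a made-up name), and it assumes SORTED.passed lists passing builds and SORTED lists git fetch timeouts, one build per line with its MM-DD date on the line.  it swaps bc for plain integer shell math, which truncates the same way but can't divide by zero and prints 100% correctly.

```shell
#!/usr/bin/env bash
# hypothetical refactor of the two stats loops into one function.
# ASSUMPTIONS: PASSED_FILE lists passing builds, FAILED_FILE lists git fetch
# timeouts, one build per line, each line containing its MM-DD date.
# usage: fail_stats <-i|-vi> PASSED_FILE FAILED_FILE DAY...
#   -i keeps spark jobs, -vi keeps everything else
fail_stats() {
  local match=$1 passed_file=$2 failed_file=$3 day passed failed total pct
  shift 3
  for day in "$@"; do
    # grep exits 1 when nothing matches; "|| true" keeps that from aborting
    # a script running under set -e / pipefail (wc still prints 0)
    passed=$(grep "$day" "$passed_file" | grep "$match" spark | wc -l || true)
    failed=$(grep "$day" "$failed_file" | grep "$match" spark | wc -l || true)
    total=$(( passed + failed ))
    if (( total > 0 )); then
      # integer percentage, truncated just like bc with scale=2
      pct=$(( 100 * failed / total ))
    else
      pct=0   # no builds that day: avoid dividing by zero
    fi
    printf '%s -- total builds: %d\tp/f: %d/%d\tfail%%: %d%%\n' \
      "$day" "$total" "$passed" "$failed" "$pct"
  done
}
```

run it as `fail_stats -i SORTED.passed SORTED 10-{09..17}` for the spark numbers and with `-vi` for everything else.  one visible difference: single-digit percentages print as `9%` rather than `09%`.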

i also compared the distribution of git timeout failures vs time of day,
and there appears to be no correlation.  the failures are pretty evenly
distributed over each hour of the day.
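one way to run that time-of-day check is to bucket the list of timeouts by hour and count each bucket.  this is a sketch under a guessed format: it assumes each line of the failures file starts with an `MM-DD HH:MM:SS` timestamp, which may not match SORTED, and `hourly_histogram` is a hypothetical name.

```shell
#!/usr/bin/env bash
# hypothetical helper: count failures per hour of the day.
# ASSUMPTION: each line starts with "MM-DD HH:MM:SS ...", so the hour is the
# first colon-separated piece of the second whitespace-separated field.
hourly_histogram() {
  awk '{ split($2, t, ":"); count[t[1]]++ }
       END { for (h in count) print h, count[h] }' "$1" | sort
}
```

`hourly_histogram SORTED` would print one `HH count` line per hour that had failures; a roughly flat set of counts is what "no correlation" looks like.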

we could be hitting github's rate limit, since the ghprb hits github a couple
of times for each build.  we're averaging ~10-20 builds per hour, and a build
hits github 2-4 times from what i can tell, so that's roughly 20-80 requests
per hour.  i'll have to look more into this on monday, but suffice it to say
we may need to move from unauthenticated https fetches to authenticated
requests.  this means retrofitting all of our jobs.  yay!  fun!  :)
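the rate-limit worry boils down to one multiplication: builds per hour times github hits per build.  a tiny sketch of that estimate, using the rough figures above as inputs; the helper names are made up for illustration, and the hourly limit passed to `over_limit` is a parameter because which limit actually applies to these fetches is the open question.

```shell
#!/usr/bin/env bash
# back-of-the-envelope github request rate from the pull request builder.
# the inputs are the rough figures quoted above, not measurements.

# requests/hour = builds/hour * github hits per build
requests_per_hour() {
  local builds_per_hour=$1 hits_per_build=$2
  echo $(( builds_per_hour * hits_per_build ))
}

# hypothetical check: exits 0 if the estimate exceeds the given hourly limit
over_limit() {
  local requests=$1 limit=$2
  (( requests > limit ))
}

low=$(requests_per_hour 10 2)    # quiet hour, 2 hits per build
high=$(requests_per_hour 20 4)   # busy hour, 4 hits per build
echo "estimated github requests/hour: ${low}-${high}"
```

for comparison, github documents 60 requests/hour per IP for unauthenticated REST API calls and 5,000/hour for authenticated ones; whether plain https git fetches count against the same limits isn't established here.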

another option is to have local mirrors of all of the repos.  the problem
with this is that there might be a window where changes haven't made it to
the local mirror yet, and tests would run against stale code.  more fun stuff
to think about...
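a rough sketch of how that stale-mirror window could be closed: before a build runs against the mirror, check that the mirror actually has the commit under test, and refresh it from upstream if it doesn't.  this assumes a bare mirror created with `git clone --mirror` and a build that knows the commit sha it expects; `ensure_commit_in_mirror` is a hypothetical helper, not something our jobs do today.

```shell
#!/usr/bin/env bash
# hypothetical helper: make sure a local mirror contains the commit a build
# needs before tests run against it.
# ASSUMPTION: MIRROR was created with "git clone --mirror <upstream> MIRROR".
# usage: ensure_commit_in_mirror MIRROR SHA   (exits 0 once the commit is present)
ensure_commit_in_mirror() {
  local mirror=$1 sha=$2
  # already there? nothing to fetch
  if git -C "$mirror" cat-file -e "${sha}^{commit}" 2>/dev/null; then
    return 0
  fi
  # otherwise refresh every ref from upstream, then check again
  git -C "$mirror" remote update --prune >/dev/null 2>&1 || return 1
  git -C "$mirror" cat-file -e "${sha}^{commit}" 2>/dev/null
}
```

a build would call this first and fail fast (or wait and retry) when it returns non-zero, instead of silently testing whatever the mirror happened to have.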

now that i have some stats, and a list of all of the times/dates of the
failures, i will be drafting my email to github and firing that off later
today or first thing monday.

have a great weekend everyone!

shane, who spent way too much time on the CLI and is ready for some beer.

On Thu, Oct 16, 2014 at 1:04 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> On Thu, Oct 16, 2014 at 3:55 PM, shane knapp <skn...@berkeley.edu> wrote:
>
>> i really, truly hate non-deterministic failures.
>
>
> Amen bruddah.
>
