One finding is that all of the timeouts happened with this command:

git fetch --tags --progress https://github.com/apache/spark.git \
    +refs/pull/*:refs/remotes/origin/pr/*

I suspect this may be an expensive call, since it fetches the refs of every
pull request. We could try a cheaper one:

git fetch --tags --progress https://github.com/apache/spark.git \
    +refs/pull/XXX/*:refs/remotes/origin/pr/XXX/*

where XXX is the pull request ID.

The plugin configuration supports parameters [1], so we could put this in:

+refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*

I have not tested this yet; could you give it a try?
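A minimal shell sketch of the narrowed fetch, assuming the build step runs in a shell where the plugin exposes the pull request number as `$ghprbPullId`. The helper names `pr_refspec` and `fetch_pr` are hypothetical, for illustration only; defining them does not touch the network, only calling `fetch_pr` does.

```shell
#!/usr/bin/env bash
# Sketch only: fetch the refs of one pull request instead of all of them.
# pr_refspec and fetch_pr are hypothetical helpers, not part of git or ghprb.

SPARK_REPO="https://github.com/apache/spark.git"

# Print the narrowed refspec for a single pull request ID.
pr_refspec() {
  local pr_id="$1"
  printf '+refs/pull/%s/*:refs/remotes/origin/pr/%s/*\n' "$pr_id" "$pr_id"
}

# Fetch only that pull request's refs (e.g. refs/pull/<id>/head).
# The refspec is quoted so the shell never tries to glob-expand the '*'.
fetch_pr() {
  local pr_id="$1"
  if [ -z "$pr_id" ]; then
    echo "usage: fetch_pr <pull-request-id>" >&2
    return 1
  fi
  git fetch --tags --progress "$SPARK_REPO" "$(pr_refspec "$pr_id")"
}
```

In a ghprb-triggered shell step this would be called as `fetch_pr "$ghprbPullId"`. Keeping the wildcard after the ID means the PR's head ref and, when GitHub has created one, its merge ref are both still fetched.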

Davies


[1] https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin

On Fri, Oct 17, 2014 at 5:00 PM, shane knapp <skn...@berkeley.edu> wrote:
> actually, nvm, you have to run that command from our servers to affect
> our limit.  run it all you want from your own machines!  :P
>
> On Fri, Oct 17, 2014 at 4:59 PM, shane knapp <skn...@berkeley.edu> wrote:
>
>> yep, and i will tell you guys ONLY if you promise to NOT try this
>> yourselves...  checking the rate limit also counts as a hit and increments
>> our numbers:
>>
>> # curl -i https://api.github.com/users/whatever 2> /dev/null | egrep ^X-Rate
>> X-RateLimit-Limit: 60
>> X-RateLimit-Remaining: 51
>> X-RateLimit-Reset: 1413590269
>>
>> (yes, that is the exact url that they recommended on the github site lol)
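If the headers from a single request like the one above are saved to a file (for example with `curl -s -D headers.txt -o /dev/null <url>`), they can be inspected afterwards without spending another request. A small sketch; `rate_limit_field` and `seconds_until_reset` are hypothetical helper names. Separately, GitHub's API documentation describes a `GET /rate_limit` endpoint that, according to those docs, does not count against the limit.

```shell
#!/usr/bin/env bash
# Sketch only: read GitHub rate-limit headers from saved response headers
# (e.g. a file written by `curl -s -D headers.txt -o /dev/null <url>`).
# rate_limit_field and seconds_until_reset are hypothetical helpers.

# Print the value of X-RateLimit-<name> read from stdin.
rate_limit_field() {
  local name="$1"
  # HTTP header names are case-insensitive; strip spaces and trailing CRs.
  grep -i "^X-RateLimit-${name}:" | head -n 1 | cut -d: -f2 | tr -d ' \r'
}

# Seconds until the window resets, given the X-RateLimit-Reset epoch value.
# An optional second argument overrides "now" (useful for testing).
seconds_until_reset() {
  local reset="$1" now="${2:-$(date +%s)}"
  echo $(( reset - now ))
}
```

For example, `rate_limit_field Remaining < headers.txt` prints the remaining count, and `seconds_until_reset "$(rate_limit_field Reset < headers.txt)"` shows how long until the counter resets.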
>>
>> so, earlier today, we had a spark build fail w/a git timeout at 10:57am,
>> but there were only ~7 builds run that hour, so that points to us NOT
>> hitting the rate limit...  at least for this fail.  whee!
>>
>> is it beer-thirty yet?
>>
>> shane
>>
>>
>>
>> On Fri, Oct 17, 2014 at 4:52 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Wow, thanks for this deep dive Shane. Is there a way to check if we are
>>> getting hit by rate limiting directly, or do we need to contact GitHub
>>> for that?
>>>
>>> On Friday, October 17, 2014, shane knapp <skn...@berkeley.edu> wrote:
>>>
>>>> quick update:
>>>>
>>>> here are some stats i scraped over the past week of ALL pull request
>>>> builder projects and timeout failures.  due to the large number of spark
>>>> ghprb jobs, i don't have great records earlier than oct 7th.  the data is
>>>> current up until ~2:30pm today:
>>>>
>>>> spark and new spark ghprb total builds vs git fetch timeouts:
>>>> $ for x in 10-{09..17}; do
>>>>     passed=$(grep "$x" SORTED.passed | grep -i spark | wc -l)
>>>>     failed=$(grep "$x" SORTED | grep -i spark | wc -l)
>>>>     total=$((passed + failed))
>>>>     # guard against bc's divide-by-zero error on days with no builds
>>>>     if [ "$total" -eq 0 ]; then fail_percent=0
>>>>     else fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g"); fi
>>>>     echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>>>>   done
>>>> 10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
>>>> 10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
>>>> 10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
>>>> 10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
>>>> 10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
>>>> 10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
>>>> 10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
>>>> 10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
>>>> 10-17 -- total builds: 26 p/f: 20/6 fail%: 23%
>>>>
>>>> all other ghprb builds vs git fetch timeouts:
>>>> $ for x in 10-{09..17}; do
>>>>     passed=$(grep "$x" SORTED.passed | grep -vi spark | wc -l)
>>>>     failed=$(grep "$x" SORTED | grep -vi spark | wc -l)
>>>>     total=$((passed + failed))
>>>>     # guard against bc's divide-by-zero error on days with no builds
>>>>     if [ "$total" -eq 0 ]; then fail_percent=0
>>>>     else fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g"); fi
>>>>     echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>>>>   done
>>>> 10-09 -- total builds: 16 p/f: 16/0 fail%: 0%
>>>> 10-10 -- total builds: 46 p/f: 40/6 fail%: 13%
>>>> 10-11 -- total builds: 4 p/f: 4/0 fail%: 0%
>>>> 10-12 -- total builds: 2 p/f: 2/0 fail%: 0%
>>>> 10-13 -- total builds: 2 p/f: 2/0 fail%: 0%
>>>> 10-14 -- total builds: 10 p/f: 10/0 fail%: 0%
>>>> 10-15 -- total builds: 5 p/f: 5/0 fail%: 0%
>>>> 10-16 -- total builds: 5 p/f: 5/0 fail%: 0%
>>>> 10-17 -- total builds: 0 p/f: 0/0 fail%: 0%
>>>>
>>>> note:  the 15th was the day i rolled back to the earlier version of the
>>>> git plugin.  it doesn't seem to have helped much, so i'll probably bring us
>>>> back up to the latest version soon.
>>>> also note:  rocking some floating point math on the CLI!  ;)
>>>>
>>>> i also compared the distribution of git timeout failures vs time of day,
>>>> and there appears to be no correlation.  the failures are pretty evenly
>>>> distributed over each hour of the day.
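One way to produce an hour-of-day breakdown like the one described above, as a sketch only: it assumes (hypothetically, since the thread doesn't show the log format) that each line of a failure list such as `SORTED` contains a 24-hour `HH:MM` timestamp as one whitespace-separated field. `failures_by_hour` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: count timeout failures per hour of the day.
# Assumes (hypothetically) each input line has a field starting with a
# 24-hour "HH:MM" time; failures_by_hour is not an existing tool.

# Reads the named files (or stdin) and prints "<count> <hour>" per hour seen.
failures_by_hour() {
  # For each line, take the first field that looks like HH:MM and keep its
  # hour, so every failure is counted exactly once.
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-2][0-9]:[0-5][0-9]/) { print substr($i, 1, 2); next }
  }' "$@" | sort | uniq -c
}
```

With real data this would be run as `failures_by_hour SORTED`; roughly equal counts across the hours is what "no correlation" with time of day looks like.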
>>>>
>>>> we could be hitting the rate limit since the ghprb hits github a
>>>> couple of times for each build: we're averaging ~10-20 builds per hour,
>>>> and a build hits github 2-4 times, from what i can tell.  i'll have to look
>>>> more into this on monday, but suffice to say we may need to move from
>>>> unauthenticated https fetches to authenticated requests.  this means
>>>> retrofitting all of our jobs.  yay!  fun!  :)
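A back-of-the-envelope check of that estimate, using the thread's own numbers (10-20 builds per hour, 2-4 GitHub hits per build) and the `X-RateLimit-Limit: 60` reported for unauthenticated requests. `requests_per_hour` is a hypothetical helper. For comparison, GitHub documents a limit of 5,000 requests per hour for authenticated requests, which is why authenticated requests would leave far more headroom.

```shell
#!/usr/bin/env bash
# Sketch only: estimate hourly GitHub API calls from the PR builder and
# compare them with the unauthenticated limit seen in X-RateLimit-Limit.
# requests_per_hour is a hypothetical helper.

UNAUTH_LIMIT=60  # requests per hour, as reported for unauthenticated calls

requests_per_hour() {
  local builds="$1" hits_per_build="$2"
  echo $(( builds * hits_per_build ))
}

for builds in 10 20; do
  for hits in 2 4; do
    n=$(requests_per_hour "$builds" "$hits")
    if (( n > UNAUTH_LIMIT )); then verdict="over"; else verdict="within"; fi
    echo "$builds builds x $hits hits = $n requests/hour ($verdict the limit of $UNAUTH_LIMIT)"
  done
done
```

Only the busiest case (20 builds at 4 hits each, 80 requests) exceeds 60. By the same arithmetic, the ~7 builds in the hour of the 10:57am failure would have made at most 28 requests, consistent with that particular failure not being a rate-limit hit.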
>>>>
>>>> another option is to have local mirrors of all of the repos.  the
>>>> problem w/this is that there might be a window where changes haven't made
>>>> it to the local mirror yet and tests run against the stale copy.  more fun
>>>> stuff to think about...
>>>>
>>>> now that i have some stats, and a list of all of the times/dates of the
>>>> failures, i will be drafting my email to github and firing that off later
>>>> today or first thing monday.
>>>>
>>>> have a great weekend everyone!
>>>>
>>>> shane, who spent way too much time on the CLI and is ready for some beer.
>>>>
>>>> On Thu, Oct 16, 2014 at 1:04 PM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> On Thu, Oct 16, 2014 at 3:55 PM, shane knapp <skn...@berkeley.edu>
>>>>> wrote:
>>>>>
>>>>>> i really, truly hate non-deterministic failures.
>>>>>
>>>>>
>>>>> Amen bruddah.
>>>>>
>>>>
>>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
