ArielGlenn has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/348153 )
Change subject: last page range for page content job would sometimes have too many revs
......................................................................
last page range for page content job would sometimes have too many revs
Now we continue iterating by revcount amounts until we reach the last page
to be dumped.
Change-Id: Id8832d628a49026da9e7f4ea17548ed340e191cd
---
M xmldumps-backup/dumps/pagerange.py
1 file changed, 13 insertions(+), 9 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/operations/dumps refs/changes/53/348153/1
diff --git a/xmldumps-backup/dumps/pagerange.py b/xmldumps-backup/dumps/pagerange.py
index 56e6da6..b21b999 100644
--- a/xmldumps-backup/dumps/pagerange.py
+++ b/xmldumps-backup/dumps/pagerange.py
@@ -203,12 +203,16 @@
estimate = self.qrunner.get_estimate(page_start, page_end)
revs_for_range = self.get_revcount(int(page_start), int(page_end), estimate)
numjobs = revs_for_range / numrevs + 1
- for jobnum in range(1, numjobs + 1):
- if jobnum == numjobs:
- # last job, don't bother searching. just append up to max page id
- ranges.append((str(page_start), str(page_end)))
- break
+ jobnum = 1
+ while True:
+ jobnum += 1
numjobs_left = numjobs - jobnum + 1
+ if numjobs_left <= 0:
+ # our initial count was a bit off, and we'll have more jobs
+ # than we thought. just keep passing the same endpoint
+ # and getting ranges until we've gotten up through
+ # the endpoint returned
+ numjobs_left = 1
interval = (page_end - page_start) / numjobs_left + 1
(start, end) = self.get_pagerange(page_start, numrevs,
page_start + interval, prevguess)
@@ -240,10 +244,10 @@
maxtodo = 50000
runstodo = estimate / maxtodo + 1
- # let's say minimum pages per job is 10, that's
+ # let's say minimum pages per job is 1, that's
# quite reasonable (in the case where some pages
# have many many revisions
- step = ((page_end - page_start) / runstodo) + 10
+ step = ((page_end - page_start) / runstodo) + 1
ends = range(page_start, page_end, step)
if ends[-1] != page_end:
@@ -287,8 +291,8 @@
if not interval:
return (page_start, badguess)
- # set 10 pages as an absolute minimum in a query
- if badguess - page_start <= 10:
+ # set 1 page as an absolute minimum in a query
+ if badguess - page_start <= 1:
return (page_start, badguess)
prevguess = badguess
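
The idea in the commit message (keep cutting ranges of roughly numrevs revisions until the last page is reached, instead of stopping after a precomputed number of jobs) can be sketched as below. This is only an illustration, not the code in pagerange.py: the function name split_by_revcount and the in-memory revs_per_page dict are hypothetical stand-ins for the database estimates and the get_pagerange search used by the real code, and inclusive range endpoints are an assumption.

```python
def split_by_revcount(revs_per_page, page_start, page_end, numrevs):
    """Split page ids page_start..page_end into (start, end) string tuples.

    Hypothetical helper: revs_per_page maps page id -> revision count and
    stands in for the database queries the real job relies on.
    Each range holds at most numrevs revisions, except when a single page
    alone exceeds numrevs; the minimum range size is one page.
    """
    ranges = []
    start = page_start
    # Keep producing ranges until the last page is covered, rather than
    # trusting an up-front estimate of how many jobs there will be.
    while start <= page_end:
        end = start
        total = revs_per_page.get(start, 0)
        # Extend the range while the next page still fits in the budget.
        while end < page_end and total + revs_per_page.get(end + 1, 0) <= numrevs:
            end += 1
            total += revs_per_page.get(end, 0)
        ranges.append((str(start), str(end)))
        start = end + 1
    return ranges


if __name__ == "__main__":
    sample = {1: 5, 2: 5, 3: 20, 4: 1, 5: 1, 6: 9}
    print(split_by_revcount(sample, 1, 6, 10))
```

With the sample data, page 3 has more revisions than the budget on its own, so it ends up in a one-page range; that is the case the lowered minimum of 1 page per query is meant to allow.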
--
To view, visit https://gerrit.wikimedia.org/r/348153
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id8832d628a49026da9e7f4ea17548ed340e191cd
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: ArielGlenn