Hello folks!
For some years now, I've been the main or only point of contact for the
semimonthly sql/xml dumps of the Wikimedia projects, as well as for a number
of miscellaneous weekly datasets.
This work is now passing to Data Platform Engineering (DPE), and your new
points of contact, starting right away,
> for a single Wikipedia page? The JSON structure looks very
> useful by itself (e.g., not in bulk).
>
> Mitar
>
> On Tue, Oct 19, 2021 at 4:57 PM Ariel Glenn WMF wrote:
> >
> > I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
October 17-18th are available for public download; see
https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
expect to make updated versions of these files available around the 1st/2nd
of the month and
Thanks to BringYour, based in California, for volunteering to host the last
5 good xml/sql dumps!
To check out the full list of mirrors, see either
https://dumps.wikimedia.org/mirrors.html or
https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Dumps
Interested in hosting dumps
I'd like to see third-party users, even those not on the mailing list, get
advance notice in one release (say, in the release notes) so that when the
next release shows up with the deprecated code removed, they will have had
time to patch up any internal extensions and code they may have.
I don't want
Good morning!
New weekly dumps are available [1], containing the content of the tables
used by the MachineVision extension [2]. For information about these
tables, please see [3].
If you decide to use these tables, as with any other dumps, I would be
interested to know how you use them; feel
As mentioned earlier on the xmldatadumps-l list, the dumps are running very
slowly this month, since the vslow db hosts they use are also serving live
traffic during a table migration. Even manual runs of partial jobs would not help
the situation any, so there will be NO SECOND DUMP RUN THIS MONTH. The
We plan to move to the new schema for xml dumps for the February 1, 2020
run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means
that, for example, the commonswiki dump will contain MediaInfo information
as well as the usual wikitext.
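For anyone who wants to adapt a parser ahead of the change, here is a rough
sketch of walking per-slot content in the new-style XML. The element names
used below (content, role, text) and the file name are my assumptions for
illustration, not a statement of the final schema:

    # Rough sketch only: the content/role/text element names are assumptions
    # about the new per-slot markup, not taken from the final schema.
    import bz2
    import xml.etree.ElementTree as ET

    def local(tag):
        # Strip the export namespace so we can match on local element names.
        return tag.rsplit('}', 1)[-1]

    def iter_slots(path):
        # Yield (page title, slot role, text length) for every content slot.
        title = None
        with bz2.open(path, 'rb') as dump:
            for _event, elem in ET.iterparse(dump):
                name = local(elem.tag)
                if name == 'title':
                    title = elem.text
                elif name == 'content':   # assumed per-slot element
                    role, text = None, ''
                    for child in elem:
                        if local(child.tag) == 'role':
                            role = child.text
                        elif local(child.tag) == 'text':
                            text = child.text or ''
                    yield title, role, len(text)
                elif name == 'page':
                    elem.clear()          # keep memory bounded on big dumps

    if __name__ == '__main__':
        # Hypothetical file name; substitute a real dump file.
        for title, role, size in iter_slots('commonswiki-pages-articles.xml.bz2'):
            print(title, role, size)

Scripts that only care about wikitext would presumably filter on the slot
role once the final element names are known.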
Wikidata surpassed the English language Wikipedia in the number of
revisions in the database, about 45 minutes ago today. I was tipped off by a
tweet [1] a few days ago and have been watching via a script that displays
the largest revision id and its timestamp. Here's the point where Wikidata
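The script in question isn't reproduced here, but a rough stand-in can be put
together against each wiki's public API; this sketch (mine, not the original)
prints the newest revision id and timestamp for a couple of wikis:

    # Stand-in for the monitoring script mentioned above: ask each wiki's API
    # for its most recent edit and print the revision id and timestamp.
    import requests

    WIKIS = {
        'wikidatawiki': 'https://www.wikidata.org/w/api.php',
        'enwiki': 'https://en.wikipedia.org/w/api.php',
    }

    def latest_revision(api_url):
        # Most recent edit or page creation; returns (revid, timestamp).
        params = {
            'action': 'query',
            'list': 'recentchanges',
            'rctype': 'edit|new',
            'rcprop': 'ids|timestamp',
            'rclimit': 1,
            'format': 'json',
        }
        data = requests.get(api_url, params=params, timeout=30).json()
        change = data['query']['recentchanges'][0]
        return change['revid'], change['timestamp']

    if __name__ == '__main__':
        for name, url in WIKIS.items():
            revid, timestamp = latest_revision(url)
            print(f'{name}: revid {revid} at {timestamp}')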
If you use these dumps regularly, please read and weigh in here:
https://phabricator.wikimedia.org/T216160
Thanks in advance,
Ariel Glenn
Wikimedia Foundation
ar...@wikimedia.org
Hey folks,
We've had a request to reschedule the way the various wikidata entity dumps
are run. Right now they go once a week on set days of the week; we've been
asked about pegging them to specific days of the month, much as the
xml/sql dumps are run. See
I am happy to announce a new mirror site, located in Canada, which is
hosting the last two good dumps of all projects. Please welcome and put to
good use https://dumps.wikimedia.freemirror.org/ !
I want to thank Adam for volunteering bandwidth and space and for getting
everything set up. More
In the meantime, I would encourage those who have not looked at the Git
Reviewer Bot page in a while to do so and to add any updates.
Ariel
On Fri, Jan 18, 2019 at 4:12 PM Tyler Cipriani
wrote:
> Hi all,
>
> Gerrit no longer automatically adds reviewers[0]. Unfortunately, this
> plugin
> Anyway, I am proud of being part of this. :-)
>
> 2018-08-20 12:26 GMT+02:00 Ariel Glenn WMF :
>
> > Starting September 1, huwiki and arwiki, which both take several days to
> > complete the revision history content dumps, will be moved to the 'big
> > wikis' list, mea
Starting September 1, huwiki and arwiki, which both take several days to
complete the revision history content dumps, will be moved to the 'big
wikis' list, meaning that they will run jobs in parallel as do frwiki,
ptwiki and others now, for a speedup.
Please update your scripts accordingly.
As many of you may know, Multi-Content Revisions are coming soon (October?)
to a wiki near you. This means that we need changes to the XML dumps
schema; these changes will likely NOT be backwards compatible.
Initial discussion will take place here:
https://phabricator.wikimedia.org/T199121
For
Good morning!
The pages-meta-history dumps for hewiki take 70 hours these days, the
longest of any wiki not already running with parallel jobs. I plan to add
it to the list of 'big wikis' starting August 1st, meaning that 6 jobs will
run in parallel producing the usual numbered file output; look
Hello folks,
Terbium, our former faithful MediaWiki maintenance server, will be up for
decommissioning on Monday, July 9th. It is no longer used for anything in
production as of a few moments ago. The sole exception to that is cron jobs
that were already running and have not yet completed. Please
TL;DR:
Scripts that rely on xml files numbered 1 through 4 should be updated to
check for 1 through 6.
Explanation:
A number of wikis have stubs and page content files generated 4 parts at a
time, with the appropriate number added to the filename. I'm going to be
increasing that this month to 6.
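As an example, a downloader that used to hard-code parts 1 through 4 could
instead discover whatever numbered parts exist. This is only a rough sketch;
the real file names on some wikis also carry pXXXpXXX page-range suffixes,
and the directory layout here is hypothetical:

    # Rough sketch: find the numbered parts that were actually produced
    # instead of hard-coding 1 through 4. File-name pattern is illustrative.
    import glob
    import re

    def numbered_parts(dump_dir, wiki, date, kind='pages-articles'):
        # Return the sorted part numbers found among downloaded files.
        parts = set()
        for path in glob.glob(f'{dump_dir}/{wiki}-{date}-{kind}*.xml*'):
            match = re.search(rf'{kind}(\d+)\.xml', path)
            if match:
                parts.add(int(match.group(1)))
        return sorted(parts)

    if __name__ == '__main__':
        # Expect up to 6 parts now where older scripts assumed at most 4.
        print(numbered_parts('/data/dumps', 'frwiki', '20180620'))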
s-ez sets disappeared from
> dumps.wikimedia.org starting this date. Is that a coincidence ?
> Is it https://phabricator.wikimedia.org/T189283 perhaps ?
>
> DJ
>
> On Thu, Mar 29, 2018 at 2:42 PM, Ariel Glenn WMF <ar...@wikimedia.org>
> wrote:
> > Here it co
Folks,
As you'll have seen from previous email, we are now using a new beefier
webserver for your dataset downloading needs. And the old server is going
away on TUESDAY April 10th.
This means that if you are using 'dataset1001.wikimedia.org' or the IP
address itself in your scripts, you MUST
Those of you that rely on the abstracts dumps will have noticed that the
content for wikidata is pretty much useless. It doesn't look like a
summary of the page because main namespace articles on wikidata aren't
paragraphs of text. And there's really no useful summary to be generated,
even if we
dumps.
Please forward wherever you deem appropriate. For further updates, don't
forget to check the Phab ticket! https://phabricator.wikimedia.org/T179059
On Mon, Mar 19, 2018 at 2:00 PM, Ariel Glenn WMF <ar...@wikimedia.org>
wrote:
> A reprieve! Code's not ready and I need to do so
A reprieve! Code's not ready and I need to do some timing tests, so the
March 20th run will do the standard recombining.
For updates, don't forget to check the Phab ticket!
https://phabricator.wikimedia.org/T179059
On Mon, Mar 5, 2018 at 1:10 PM, Ariel Glenn WMF <ar...@wikimedia.org>
We'll probably start at 20GB, which means that Wikidata will be the only
wiki affected for now.
Ariel
On Mon, Mar 5, 2018 at 1:40 PM, Bináris <wikipo...@gmail.com> wrote:
> Could you please translate "too large" to megabytes?
>
> 2018-03-05 12:10 GMT+01:00 Ariel Glenn
Please forward wherever you think appropriate.
For some time we have provided multiple numbered pages-articles bz2 files
for large wikis, as well as a single file with all of the contents combined
into one. This is consuming enough time for Wikidata that it is no longer
sustainable. For wikis
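For pipelines that previously relied on the single recombined file, the
numbered parts can be processed in sequence instead. A minimal sketch, with
hypothetical local paths:

    # Minimal sketch: stream over the numbered pages-articles parts in order
    # rather than relying on one recombined file. File names are hypothetical.
    import bz2
    import glob
    import re

    def part_number(path):
        match = re.search(r'pages-articles(\d+)\.xml', path)
        return int(match.group(1)) if match else 0

    def count_pages(dump_dir, wiki, date):
        # Count <page> elements across every numbered part.
        paths = [p for p in glob.glob(f'{dump_dir}/{wiki}-{date}-pages-articles*.bz2')
                 if part_number(p) > 0]
        total = 0
        for path in sorted(paths, key=part_number):
            with bz2.open(path, 'rt', encoding='utf-8') as part:
                total += sum(1 for line in part if '<page>' in line)
        return total

    if __name__ == '__main__':
        print(count_pages('/data/dumps', 'wikidatawiki', '20180301'))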
like page-articles are still missing
> from the 20171103 dump directory, when usually it only takes a day...
>
> Nico
>
> On Mon, Nov 6, 2017 at 8:01 PM, Ariel Glenn WMF <ar...@wikimedia.org>
> wrote:
>
> > Rsync of xml/sql dumps to the web server is now running on a
Rsync of xml/sql dumps to the web server is now running on a rolling basis
via a script, so you should see updates regularly rather than "every
$random hours". There's more to be done on that front, see
https://phabricator.wikimedia.org/T179857 for what's next.
Ariel
that some index.html files may contain links to
files which did not get picked up on the rsync. They'll be there sometime
tomorrow after the next rsync.
Ariel
On Mon, Oct 30, 2017 at 5:39 PM, Ariel Glenn WMF <ar...@wikimedia.org>
wrote:
> As was previously announced on the xmldatadumps-l list
As was previously announced on the xmldatadumps-l list, the sql/xml dumps
generated twice a month will be written to an internal server, starting
with the November run. This is in part to reduce load on the web/rsync/nfs
server which has been doing this work also until now. We want separation
of
Hi Trung,
For larger wikis, there will be a collection of partial files such as
these, where the pXXXpXXX indicate the first and last page ids in the
file. But for pages-articles, there will also be a combined file
generated, so you'll be able to download that directly. It's listed on the
I'm happy to announce that the Academic Computer Club of Umeå University in
Sweden is now offering for download the last 5 XML/sql dumps, as well as a
mirror of 'other' datasets. Check the current mirror list [1] for more
information, or go directly to download:
That should be Tuesday, Nov 15. It's been a long week.
A.
On Mon, Nov 14, 2016 at 2:27 PM, Ariel Glenn WMF <ar...@wikimedia.org>
wrote:
> On Tuesday Nov 13, at 9 am UTC, the web server for the dumps and other
> datasets will
> be unavailable due to maintenance. This should take
On Saturday Oct 29, at 8 am UTC, the web server for the dumps and other
datasets will be unavailable due to maintenance. This should take no
longer than 10 minutes. Thanks for your understanding.
Ariel
On Mon, Oct 17, 2016 at 11:02 PM, Chad wrote:
> On Mon, Oct 17, 2016 at 5:14 AM Adam Wight wrote:
>
> > The challenges are first that it's based on a Tomcat backend
> > <https://github.com/Wikimedia-TW/han3_ji7_tsoo1_kian3_WM/>
(off topic) Paladox, for some reason Google seriously disliked your last 2
emails, just so you know. (Big red warning banner, etc.)
Ariel
On Mon, Sep 26, 2016 at 6:01 PM, Bináris wrote:
> 2016-09-26 16:54 GMT+02:00 Paladox :
>
> > What does
Hi Binaris,
We actually have better hardware than 4 years ago [0]. However, we have
more projects with more content than 4 years ago. Wikidata did not exist
in 2011; today it has almost 1/2 the revisions of the English language
Wikipedia. The English language Wikipedia itself has increased 51%
mited, but
is continually growing", as email from our contact at that mirror says.
For folks from specific institutions that suddenly no longer have access, I
can forward institution names along and hope that helps.
Ariel
On Wed, May 4, 2016 at 3:33 PM, Ariel Glenn WMF <ar...@wikimedia.org&g
I'm happy to announce a new mirror for datasets other than the XML dumps.
This mirror comes to us courtesy of the Center for Research Computing,
University of Notre Dame, and covers everything "other" [1] which includes
such goodies as Wikidata entity dumps, pageview counts, titles of all files
on
This is now live, if a few days later than expected.
Ariel
On Fri, Apr 1, 2016 at 6:11 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> This is part of a longstanding general plan to move to https for our
> services. You can track (most of) those items h
Don't laugh, but I actually looked for the like button after reading this
post (too much time on Twitter). I would like to see more of these
initiatives, whatever form they might take. We have something that made a
difference; let's build on that.
Ariel
On Sun, Apr 3, 2016 at 7:02 PM, Risker
<benap...@gmail.com> wrote:
> Can you give us some justification for this change? It's not like when
> downloading dumps you would actually leak some sensitive data...
>
> On Fri, Apr 1, 2016 at 1:03 PM, Ariel Glenn WMF <ar...@wikimedia.org>
> wrote:
> > We pl
We plan to make this change on April 4 (this coming Monday), redirecting
plain http access to https.
A reminder that our dumps can also be found on our mirror sites, for those
who may have restricted https access.
Ariel Glenn
This upgrade has concluded successfully and all services are again
operational.
Ariel
On Thu, Mar 3, 2016 at 8:15 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> Fallback is: cable up the old 1GB nic (Chris has done this and set up the
> port), PXE install on that, move to 1
2, 2016 at 8:47 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> PXE boot from non-embedded nic failed spectacularly despite our best
> efforts. This means we'll have to schedule another window once we have
> something new to try. I apologize for the extra inconvenience. All serv
Glenn WMF <ar...@wikimedia.org> wrote:
> Extending this downtime window because we ran into unexpected issues with
> PXE boot.
>
> On Tue, Mar 1, 2016 at 3:53 PM, Ariel Glenn WMF <ar...@wikimedia.org>
> wrote:
>
>> Dataset1001, the host which serves dumps
Extending this downtime window because we ran into unexpected issues with
PXE boot.
On Tue, Mar 1, 2016 at 3:53 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> Dataset1001, the host which serves dumps and other datasets to the public,
> as well as providing access to various datas
Dataset1001, the host which serves dumps and other datasets to the public,
as well as providing access to various datasets directly on stats100x, will
be unavailable tomorrow for an upgrade to jessie. While I don't expect to
need nearly 3 hours for the upgrade, better safe than sorry. In the
That would be me; I need to push some changes through for this month, but I
was either travelling or at the dev summit/allstaff. I'm pretty jetlagged,
but I'll likely be doing that tonight, given I woke up at 5 pm :-D
A.
On Mon, Jan 11, 2016 at 4:20 PM, Bernardo Sulzbach <
mafagafogiga...@gmail.com>