[Wikitech-l] Fwd: StrikerBot invited you to join GitLab

2022-09-07 Thread Roy Smith
I just got the attached email.  Is this just phishing, or is this actually some 
Wikimedia thing for real?

I host spi-tools on GitHub.  I have no plans to move it to GitLab.  Why has 
somebody created it on GitLab for me and invited me to be an owner?



> Begin forwarded message:
> 
> From: Gitlab 
> Subject: StrikerBot invited you to join GitLab
> Date: September 7, 2022 at 6:22:56 PM EDT
> To: r...@panix.com
> Reply-To: Gitlab 
> 
> 
> 
> StrikerBot  invited you to join the 
> toolforge-repos / spi-tools
> project as a owner
> 
> Join now 
> 
> Project details
>  1 member
>  0 issues
>  0 opened merge requests
> What's it about?
> Projects are used to host and collaborate on code, track issues, and 
> continuously build, test, and deploy your app with built-in GitLab CI/CD.
> 
> 
> GitLab is a complete DevOps platform, delivered as a single application, 
> fundamentally changing the way
> Development, Security, and Ops teams collaborate

___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Outreachy Round 25–call for projects and mentors now open!

2022-09-07 Thread Roy Smith
On Sep 7, 2022, at 4:50 PM, novemling...@gmail.com wrote:
> 
> - FTP programs that aren't WinSCP with "environment -> SFTP -> server -> sudo 
> -u tools.novem-bot /usr/lib/sftp-server" configured appear to the user to 
> work, but create some hard-to-track-down bugs because files have the wrong 
> owner. For example I tried using FileZilla Client before I found the tutorial.

My take on this is that Toolforge is unabashedly a Linux environment.  If 
there's some incompatibility with a Windows app, that's not Toolforge's 
problem.  I log into Toolforge with this alias:

> alias spi-tools-dev='ssh -t dev.toolforge.org tmux new -A -s spi-tools-dev 
> become spi-tools-dev'

As needed, I set up port tunnels with things like:

> alias tunnel='ssh -t dev.toolforge.org  -L 23002:localhost:23002 become 
> spi-experiments'

And likewise I can move files in and out with scp.  I agree that there's a 
learning curve to all this ssh stuff (including the associated key management), 
but hiding that beneath a cPanel veneer just makes it all the more mysterious 
because you're never really sure what's going on.  If you're going to develop 
in a Linux environment, invest the time to learn the Linux tools.

>  Suggested fix: give a separate login for each tool folder, so that you don't 
> have to sudo


Logging in as a person, then gaining some specific additional set of rights 
with sudo (the "become" utility is really just a thin wrapper around sudo) 
maintains the appropriate distinction between authentication (who you are) and 
authorization (what you're allowed to do).  If each tool had its own login, 
then how would multiple people maintain the tool?  They'd have to share the 
password to the account.  That's not a good plan.



[Wikitech-l] Re: Outreachy Round 25–call for projects and mentors now open!

2022-09-07 Thread Roy Smith
The biggest issue I see is the lack of any good logging, monitoring, and 
alerting tools: things like Icinga, Logstash, and Grafana, the kind of things 
that are standard for supporting any production system.  I've raised this 
before, so I won't belabor the point here.

And https://phabricator.wikimedia.org/T256426 continues to be an everyday pain 
in my side.  The related https://phabricator.wikimedia.org/T127367 is triaged 
as high priority.  It's been open for 6-1/2 years.



> On Sep 7, 2022, at 10:17 AM, Slavina Stefanova  
> wrote:
> 
> On a side note, I'd be interested in hearing what you dislike about 
> Toolforge, if you'd like to share. We (the cloud services team) are working 
> on improving Toolforge and don't always get as much feedback, good or bad, as 
> we'd want. 


[Wikitech-l] Re: Outreachy Round 25–call for projects and mentors now open!

2022-09-07 Thread Roy Smith
Just from my personal experience, I see Django as a good "batteries included" 
solution that lets you get something up and running quickly because it gives 
you all the pieces in one package.  But I've found that I tend to actually use 
very little of it.

I tend not to use the Django database/model stuff.  On my one large-scale 
Django project, we used MongoDB with mongoengine for the ORM layer.  On 
spi-tools, I'm using Redis.

I've totally sworn off Django templates in favor of Jinja, even at the cost of 
breaking some of the neat test client tools Django supplies.

I kind of like Django's middleware system, but in practice I find it a little 
too complicated, mostly because Django doesn't provide a good way to pass 
around per-request context.  So you end up shoving your own data into Django's 
HttpRequest, which is kind of evil.  Or you use thread local storage, which 
always seems a little sketchy.  Flask at least attacks the problem head-on by 
providing you with an explicit global object to use.  It may be thread locals 
under the covers, but at least it's officially supported.
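
To make the per-request context problem concrete, here is a minimal sketch in plain Python.  It uses the standard library's contextvars module to hold a per-request value that code deep in the call stack can read without it being bolted onto the request object.  The names current_user, audit_log_line and handle_request are made up for illustration, and this is not a claim about how Django or Flask implement anything internally.

```python
import contextvars

# Hypothetical per-request slot; each request context sees its own value.
current_user = contextvars.ContextVar("current_user", default=None)


def audit_log_line(action):
    # Deep in the call stack: no request object is passed in.
    return f"{current_user.get()} did {action}"


def handle_request(user, action):
    # Roughly what a middleware layer would do: set the context, run the
    # view, then restore whatever value was there before.
    token = current_user.set(user)
    try:
        return audit_log_line(action)
    finally:
        current_user.reset(token)


print(handle_request("alice", "block"))
print(current_user.get())  # back to the default once the request is done
```

The point of the design is that the context is set and reset in exactly one place, so nothing leaks from one request into the next.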

I like Flask's decorator-based routing better than Django's urls.py system.

But, with all that, I've got a few production Django systems under my belt and 
have only just toyed with Flask enough to get a feel for how it works.

I assume the plan is to do this in Toolforge?  There's a few things about 
Toolforge that I bristle at, but it does give a lot of value in the stuff you 
get for free.  I don't see any viable alternative for a small project like this.


> On Sep 7, 2022, at 4:23 AM, Slavina Stefanova  
> wrote:
> 
> I appreciate suggestions on the tech stack we end up going with.


[Wikitech-l] Re: Outreachy Round 25–call for projects and mentors now open!

2022-09-06 Thread Roy Smith
Hi.  I might be interested.  I'm an expert in Python, and have some experience 
with Flask.  I'm the author of spi-tools, which was done in Django, but I've 
come around to thinking that Flask probably would have been a better choice.

I have minimal front-end / JavaScript skills, but wouldn't mind getting some 
exposure to Vue.js.


> On Sep 6, 2022, at 5:24 AM, Slavina Stefanova  
> wrote:
> 
> Hello all,
> 
> I am looking for someone to join me as a co-mentor for the Outreachy 2022 
> December cohort. I’m a former Outreachy intern and currently working as a 
> software engineer with the Technical Engagement team at WMF.
> 
> The project is detailed in this Phabricator task[0]. It will be a web app 
> similar to the tool Citation Hunt[1]. If you have some experience developing 
> web applications, this could be a good opportunity to get involved on the 
> mentoring side. You don’t have to know everything, as long as you have 
> sufficient skills in either frontend, backend, or design. New mentors are 
> welcome!
> 
> The time commitment is around 5h/week from December to March, and during the 
> contribution and applicant selection phase in October.
> 
> You are welcome to email me with any questions, or ask them directly in this 
> thread.
> 
> Also, if you aren’t interested yourself but know someone who might be, please 
> spread the word <3
> 
> 
> Thanks,
> Slavina
> 
> [0] https://phabricator.wikimedia.org/T317083 
>  
> [1] https://citationhunt.toolforge.org/  
> 
> 
> --
> Slavina Stefanova (she/her)
> Software Engineer - Technical Engagement
> 
> Wikimedia Foundation
> 
> 
> On Wed, Aug 17, 2022 at 2:46 PM Srishti Sethi wrote:
> Hello everyone,
> 
> Wikimedia is participating in the winter edition of this year's Outreachy [1] 
> (December 2022–March 2023)! The deadline to submit projects on the Outreachy 
> website is September 30th, 2022. We are currently working on a list of 
> interesting project ideas. If you have some ideas for coding or non-coding 
> (design, documentation, translation, outreach, research) projects, share 
> them here: [2].
> 
> About the Outreachy program
> 
> Outreachy offers three-month internships to work remotely in Free and Open 
> Source Software (FOSS), coding, and non-coding projects with experienced 
> mentors. These internships run twice a year–from May to August and December 
> to March. Interns are paid a stipend of USD 7000 for the three months of 
> work. Interns often find employment after their internship with Outreachy 
> sponsors or jobs that use the skills they learned during their internship. 
> This program is open to both students and non-students. Outreachy expressly 
> invites the following people to apply:
> * Women (both cis and trans), trans men, and genderqueer people.
> * Anyone who faces under-representation, systematic bias, or discrimination 
> in the technology industry in their country of residence.
> * Residents and nationals of the United States of any gender who are 
> Black/African American, Hispanic/Latinx, Native American/American Indian, 
> Alaska Native, Native Hawaiian, or Pacific Islander.
> 
> See a blog post highlighting the experiences and outcomes of interns who 
> participated in a previous round of Outreachy with Wikimedia [3].
> 
> Tips for mentors for proposing projects
> 
> * Follow this task description template when you propose a project in 
> Phabricator: [4]. Add the #Outreachy-Round-25 tag.
> * Project should require an experienced developer ~15 days and a newcomer ~3 
> months to complete.
> * Each project should have at least two mentors, with one of them holding a 
> technical background.
> * Ideally, the project has no tight deadlines, a moderate learning curve, and 
> fewer dependencies on Wikimedia's core infrastructure. Projects addressing 
> the needs of a language community are most welcome.
> * If you don't have an idea in mind and would like to pick one from an 
> existing list, check out these projects: [4]
> * To learn more about the roles and responsibilities of mentors, visit our 
> resources on MediaWiki.org: [5].
> 
> We look forward to 

[Wikitech-l] Re: [breaking change] Upgrading to Elasticsearch 7.10 — breaking changes for Cloudelastic, API Feature Usage

2022-07-18 Thread Roy Smith
What are the user-visible changes in 7.10?  More specifically, does this 
address the lack of access control I discussed in 
https://wikitech.wikimedia.org/wiki/Help_talk:Toolforge/Elasticsearch?


> On Jul 18, 2022, at 3:23 PM, Mike Pham  wrote:
> 
> Hi all,
> 
> I originally sent this email out on 20 May, 2022, but it seems like it didn’t 
> go out to everybody, unfortunately. 
> 
> We are beginning the process of undeploying API Feature Usage this week, and 
> apologize for the confusion, and what is now short notice for those of you 
> who were not aware of this announcement previously.
> 
> Best,
> 
> —
> 
> Mike Pham (he/him)
> Sr Product Manager, Search
> Wikimedia Foundation 
> On 20 May, 2022 at 10:18:44, Mike Pham (mp...@wikimedia.org) wrote:
> 
>> Hi all,
>> 
>> The Wikimedia Foundation Search team is currently working on updating our 
>> Elasticsearch version to 7.10  
>> from version 6.8.20. Our goal is to finish this work in the next couple of 
>> months. Being on the latest (license-compatible) version will help ensure 
>> the stability and reliability of our search infrastructure.
>> 
>> This update will include 2 breaking changes:
>> 
>> Cloudelastic will be affected by breaking changes, with more details below. 
>> The interface might also change slightly.
>> https://www.elastic.co/guide/en/elasticsearch/reference/7.17/breaking-changes-7.0.html
>> https://www.elastic.co/guide/en/elasticsearch/reference/7.10/migrating-7.10.html#breaking-changes-7.10
>> 
>> The API Feature Usage extension, whose functionality is unrelated to 
>> search, will no longer be supported. This 
>> low-usage extension is currently implemented in a complicated and brittle 
>> way that depends on Elasticsearch, which creates development drag on the 
>> Search team’s work in order to continuously maintain and upkeep.
>> 
>> While in the short term API Feature Usage will be sunsetted, we recognize 
>> that it’s probably useful for some users, and we encourage others to 
>> continue to support and develop this extension in the longer term, without 
>> its brittle dependency on Elasticsearch.
>> 
>> Though the Elasticsearch upgrade will provide overall net benefits, we 
>> recognize that these breaking changes will unfortunately affect some users, 
>> and appreciate your understanding as we improve our search infrastructure.
>> 
>> Best,
>> Search Platform
>> 
>> —
>> 
>> Mike Pham (he/him)
>> Sr Product Manager, Search
>> Wikimedia Foundation 

[Wikitech-l] Re: How to get Top 1000 contributors list?

2022-06-25 Thread Roy Smith
This should do it:

https://quarry.wmcloud.org/query/65641 

In general, this is an inefficient query because user_editcount isn't indexed.  
Fortunately, tawiki has a relatively small number of users, so it works.  
Running the same query on a much larger wiki, say en, would probably time out.
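
For readers who can't open the Quarry link, here is a sketch of the kind of query involved.  It is not the actual Quarry query (which isn't reproduced here); it's a self-contained stand-in that runs against an in-memory SQLite table with made-up rows.  The user_name and user_editcount columns are the ones from MediaWiki's user table; the function name and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_name TEXT, user_editcount INTEGER)")
# Made-up rows standing in for a wiki's user table.
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [("A", 120), ("B", 4500), ("C", 37), ("D", 980)],
)


def top_contributors(conn, limit):
    # With no index on user_editcount, the database has to look at every
    # row and sort them all before it can return the first `limit` rows.
    return conn.execute(
        "SELECT user_name, user_editcount FROM user "
        "ORDER BY user_editcount DESC LIMIT ?",
        (limit,),
    ).fetchall()


print(top_contributors(conn, 3))
```

On a table with a few hundred thousand rows that full sort is tolerable; on one with tens of millions it is the part that would be expected to blow the query time limit.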

> On Jun 25, 2022, at 12:07 PM, Shrinivasan T  wrote:
> 
> Hello all,
> 
> I would like to get the top 1000 contributors for ta.wikipedia.org based on 
> their user contribution metric.
> 
> Is there any code or API for this?
> 
> Please share.
> 
> Thanks.
> 
> -- 
> Regards,
> T.Shrinivasan
> 
> 
> My Life with GNU/Linux : http://goinggnu.wordpress.com 
> 
> Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com 
> 
> 
> Get Free Tamil Ebooks for Android, iOS, Kindle, Computer : 
> http://FreeTamilEbooks.com 

[Wikitech-l] Re: Wikimedia Hackathon: Call for Sessions

2022-04-28 Thread Roy Smith
The last time I tried to set a phab ticket's priority, I was basically told 
that volunteers shouldn't be doing that, i.e. changing priorities of phab 
tickets was reserved to WMF staff.  Some clarity around this would be 
appreciated :-)

On Apr 28, 2022, at 8:47 AM, Derk-Jan Hartman  
wrote:
> 
> Would anyone be interested in a session about triaging tickets?
> ...
> * Set a priority




[Wikitech-l] Re: [Cloud] [Cloud-announce] [IMPORTANT] Announcing Toolforge Debian Stretch Grid Engine deprecation

2022-02-16 Thread Roy Smith
From my perspective as a Toolforge user, one of the issues I see is that it's 
often not clear how to map the "friendly command line interface" into concepts 
I already understand about the lower level tools.

For example, the webservice script does some useful stuff.  But, it wasn't 
clear exactly what it was doing, i.e. there was a lot of magic happening.  
While the magic is certainly an integral part of hiding the low-level details, 
it also obfuscates things.  Reading the webservice script wasn't much help; 
it's long and complicated, and mixes grid and k8s functionality in a way that 
further hides what's actually going on.

Anyway, all I'm really asking is that as the docs get written for the "friendly 
command line interface", you also include some explanation of what's happening 
behind the scenes.  For example, maybe have a --verbose option to all the tools 
which makes it print all the back end commands it's executing, so

> webservice --backend=kubernetes python3.7 restart

might print:

> kubectl exec -i -t shell-1645020371 --container main-app -- /bin/bash

And then somebody who already understands kubectl would instantly understand 
what's happening.  It's not hard to guess the basic gist of what it must be 
doing, but having the details confessed eliminates any doubt, enhancing 
comprehension.
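
As a rough sketch of what such a --verbose option could look like inside a wrapper script: the helper below echoes the exact back-end command before running it.  The function names, the "+ " prefix and the dry_run flag are all hypothetical; this is not how the webservice script is actually written, and the kubectl command is just the one quoted above.

```python
import shlex
import subprocess


def format_command(argv):
    # Render argv the way a user could paste it back into a shell.
    return "+ " + shlex.join(argv)


def run_backend(argv, verbose=False, dry_run=False):
    """Run a back-end command, optionally echoing it first."""
    if verbose:
        print(format_command(argv))
    if dry_run:
        return None  # show the command without executing it
    return subprocess.run(argv, check=True)


# dry_run=True so the sketch runs anywhere, with or without kubectl.
run_backend(
    ["kubectl", "exec", "-i", "-t", "shell-1645020371",
     "--container", "main-app", "--", "/bin/bash"],
    verbose=True,
    dry_run=True,
)
```

Routing every back-end call through one function like this means the verbose output can't drift out of sync with what is really executed.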

As another example, it took me a little bit to figure out that the "become" 
command doesn't do anything more magic than run sudo with a little sanity 
checking wrapped around it.  Fortunately, that script is simple enough that 
once I looked at it, it was obvious what it was doing.  But other parts of the 
"friendly command line interface" are rather more opaque.


> On Feb 15, 2022, at 11:42 AM, Seyram Komla Sapaty  
> wrote:
> 
> One of the most prominent missing features on Kubernetes was a friendly
> command line interface to schedule jobs (like jsub). We've been working
> on that, and have a beta-level interface that you can try today: the
> Toolforge jobs framework [4].


[Wikitech-l] Re: Limit to the number of images in a page?

2021-11-29 Thread Roy Smith
I don't know if this is your issue, but there's been problems in the past with 
pages that had lots of small png or jpg images of national flags.  Replacing 
them with the svg and/or emoji versions solved the problem.  Described in 
https://phabricator.wikimedia.org/T267804




> On Nov 29, 2021, at 8:53 PM, Brian Wolff  wrote:
> 
> This isn't a per page limit but a number of thumbnails per unit time. Wait a 
> little bit and revisit the page and more pictures should load. Eventually all 
> should. As long as nobody purges the image pages, once the image loads once 
> it should always load again in the future.
> 
> The current limits are 70 per 30 seconds, unless it's a certain "common" size, 
> in which case it's 700 per 30 seconds. This is defined in: 
> https://noc.wikimedia.org/conf/InitialiseSettings.php.txt 
> 
> 'renderfile' => [
>     // 1400 new thumbnails per minute
>     'ip' => [ 700, 30 ],
>     'user' => [ 700, 30 ],
> ],
> 'renderfile-nonstandard' => [
>     // 140 new thumbnails per minute
>     'ip' => [ 70, 30 ],
>     'user' => [ 70, 30 ],
> ],
> 
> 
> In practice I don't think people use those "standard" sizes all that much more 
> often than other sizes.
> 
> On Monday, November 29, 2021, Strainu wrote:
> Hi,
> 
> I have some wikipages with a large number of images (1000+). Those
> pages never load completely, as upload.wikimedia.org starts returning
> 429 Too many requests after a while.
> 
> This limit does not seem to be documented on mediawiki.org, so I would
> like to know what the exact value is and if there is a way to work
> around it (except for splitting the pages).
> 
> Thanks,
>Strainu

[Wikitech-l] Re: ListFiles special page

2021-10-14 Thread Roy Smith
It's been years since I discovered that MySQL's utf8 is broken in this way, but 
I can still feel the pain.  What part of "universal" did they not understand?  
The MySQL docs more or less say that "utf8" is deprecated, certainly not 
future-proof, and suggest you use utf8mb4.  (See 
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html)
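
A small illustration of why the 3-byte limit bites: any character outside the Basic Multilingual Plane takes 4 bytes in UTF-8, so it can't be stored in a 3-byte-per-character column.  The helper name needs_utf8mb4 and the sample strings are made up for this sketch; the byte widths themselves are just what UTF-8 defines.

```python
def needs_utf8mb4(text):
    """True if any character takes 4 bytes in UTF-8 (i.e. is outside the BMP)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


for sample in ["Łódź", "日本語", "😀"]:
    # Per-character encoded width, in bytes.
    widths = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, widths, needs_utf8mb4(sample))
```

Latin and CJK text fits in 1 to 3 bytes per character, which is why the problem tends to stay hidden until someone types an emoji or a rarer script.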

> On Oct 14, 2021, at 1:17 PM, Sergey Dorofeev  wrote:
> 
> Thank you, did not know about it. Real UTF-8 in mysql is utf8mb4, I think it 
> should be used here.
> 
> ---
> Sergey
> 
> 
> Jaime Crespo wrote on 2021-10-14 18:32:
> 
>> I agree that LOWER doesn't make much sense in binary collation.
>>  
>> Sadly, a utf8 (3-byte UTF-8) conversion may fail for 4-byte characters, so 
>> at the very least it should be utf8mb4 (4-byte UTF-8). I am not so familiar 
>> with ListPager to say if there could be other issues arising from that- 
>> sending a code review would be easier for better context.
>> 
>> On Thu, Oct 14, 2021 at 5:16 PM Sergey Dorofeev wrote:
>> Hello,
>> 
>> I have got issue with ListFiles page in mediawiki 1.35.1
>> Filtering worked not very good, was case-sensitive and not always got 
>> text in middle of file name.
>> I looked in DB and saw that img_name column is varbinary, but 
>> pagers/ImageListPager.php tries to do case-insensitive select with 
>> LOWERing both sides of strings. But LOWER does not work for varbinary
>> So I think that following change will be reasonable:
>> 
>> --- ImageListPager.php.orig 2021-10-14 16:31:52.0 +0300
>> +++ ImageListPager.php  2021-10-14 16:00:10.127694733 +0300
>> @@ -90,9 +90,10 @@
>> 
>>  if ( $nt ) {
>>  $dbr = wfGetDB( DB_REPLICA );
>> -   $this->mQueryConds[] = 'LOWER(img_name)' .
>> +   $this->mQueryConds[] = 'LOWER(CONVERT(img_name USING utf8))' .
>>  $dbr->buildLike( $dbr->anyString(),
>> -   strtolower( $nt->getDBkey() ), $dbr->anyString() );
>> +   mb_strtolower( $nt->getDBkey() ), $dbr->anyString() );
>> +
>>  }
>>  }
>> 
>> @@ -161,9 +162,9 @@
>>  $nt = Title::newFromText( $this->mSearch );
>>  if ( $nt ) {
>>  $dbr = wfGetDB( DB_REPLICA );
>> -   $conds[] = 'LOWER(' . $prefix . '_name)' .
>> +   $conds[] = 'LOWER(CONVERT(' . $prefix . '_name USING utf8))' .
>>  $dbr->buildLike( $dbr->anyString(),
>> -   strtolower( $nt->getDBkey() ), $dbr->anyString() );
>> +   mb_strtolower( $nt->getDBkey() ), $dbr->anyString() );
>>  }
>>  }
>> 
>> 
>> 
>> -- 
>> Sergey
>> 
>> -- 
>> Jaime Crespo

[Wikitech-l] Re: Goto for microoptimisation

2021-07-31 Thread Roy Smith
https://xkcd.com/292/ 


[Wikitech-l] Re: Stream of recent changes diffs

2021-07-08 Thread Roy Smith
On that topic, I'll share some of my experience.

First, parsing wikitext is way more difficult than you probably imagine.  
People are often tempted to do a poor-man's job of it with regular expressions 
and the like.  Down that path lies madness.  Don't go there.

There's only two rational ways I know of to parse wikitext.

Parsoid is one.  It's complicated to get your head around, but it is the one 
true officially supported way.

The other is mwparserfromhell.  It has the advantage of being much simpler to 
use.  It has the disadvantage of not getting every possible edge case correct.  
It also is only usable in Python, which is fine if you're using Python and a 
problem otherwise.

In either case, once you've got parsed versions of two revisions, you'll then 
be faced with the problem of diffing them.  That's going to be non-trivial.


> On Jul 8, 2021, at 7:01 PM, David Lynch  wrote:
> 
> The best I can say about this for your purposes is that using the parsoid 
> HTML would relieve you of having to parse wikitext to work out whether the 
> contents of a math tag were what changed.




[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Roy Smith
I'm not sure diffs are going to be useful here.  For example, this diff 
ostensibly introduces an error in the math markup, but due to the way I've 
formatted the wikisource, it's not obvious from the diff that this is within 
<math>...</math> tags.

You might end up having to do this using the database dumps, which is going to 
entail looking at a lot more data (extreme understatement) than the recent 
changes stream.
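
The problem with relying on the diff alone can be reproduced with a small sketch.  It uses Python's difflib on two made-up revisions in which a formula is spread over many lines; the change lands in the middle, so the three lines of context on either side never reach the enclosing math tags.  The revision text is invented for illustration and is not the diff referred to above.

```python
import difflib

# A made-up formula spread over many source lines.
body = [f"x_{i} &= {i} \\\\" for i in range(1, 10)]
old = ["<math>", "\\begin{align}"] + body + ["\\end{align}", "</math>"]

# The edit drops in an unbalanced brace in the middle of the formula.
new = list(old)
new[6] = "x_5 &= {5 \\\\"

diff = list(difflib.unified_diff(old, new, "old", "new", lineterm="", n=3))
print("\n".join(diff))

# Nothing in the diff says the changed line is math markup.
print(any("<math>" in line for line in diff))
```

A tool working from the diff would need the full revision text anyway to know which tag a changed line sits inside.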

> On Jul 1, 2021, at 9:18 AM, Robin Hood  wrote:
> 
> I’m no expert, but I believe the only way to get a diff via the API is 
> through https://www.mediawiki.org/wiki/API:Compare. I haven’t worked with it 
> to any great degree, though, so I’m afraid I can’t help beyond pointing you 
> in that direction.
>  
> From: Physikerwelt <w...@physikerwelt.de>
> Sent: July 1, 2021 8:17 AM
> To: Wikimedia developers
> Cc: andre.greiner-petter; Aaron Halfaker <ahalfa...@wikimedia.org>
> Subject: [Wikitech-l] Stream of recent changes diffs
>  
> Dear all,
>  
> we have developed a tool that is (in some cases) capable of checking if 
> formulae in <math>-tags in the context of a wikitext fragment are likely to 
> be correct or not. We would like to test the tool on the recent changes. From
>  
> https://www.mediawiki.org/wiki/API:Recent_changes_stream 
> 
>  
> we can get the stream of recent changes. However, I did not find a way to get 
> the diff (either in HTML or Wikitext) to figure out how the content was 
> changed. The only option I see is to request the revision text manually 
> additionally. This would be a few unnecessary requests since most of the 
> changes do not change <math>-tags. I assume that others, i.e., ORES
>  
> https://www.mediawiki.org/wiki/ORES ,
>  
> compute the diffs anyhow and wonder if there is an easier way to get the 
> diffs from the recent changes stream without additional requests.
>  
> All the best
> Physikerwelt (Moritz Schubotz)
>  
>  

[Wikitech-l] Re: How to get python 3.6+ at toolforge?

2021-06-19 Thread Roy Smith
When I enquired about this a while ago, I was told that WMF would not install a 
newer Python on the existing Debian Stretch hosts because there was no 
pre-built package for it.  I ended up building my own Python 3.7 binary from 
source for use on the Debian Stretch hosts.  It's actually pretty 
straightforward to do if you've done that sort of thing before, but it does 
mean you're taking on more responsibility for maintaining it yourself.

The better news is that they do have some Debian Buster bastion hosts up and 
running, which have Python 3.7 installed.  I'm using those now.  I think 
they're not yet released for public use, so I'll let somebody from WMF chime in 
on whether it's OK for others to use, and how to access them.

And, yeah, f-strings are the killer feature of the post-3.5 releases.
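
For anyone who hasn't met them yet, a quick side-by-side of the older str.format() style and the f-string equivalent available from Python 3.6.  The variable names and values are made up.

```python
user = "Example"
edits = 1234

# Python 3.5 and earlier: placeholders here, values over there.
old_style = "{} has {:,} edits".format(user, edits)

# Python 3.6+: the expression sits right where its value will appear.
new_style = f"{user} has {edits:,} edits"

print(old_style)
print(new_style)
```

Both lines produce the same string; the f-string is just easier to read and harder to get out of order.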



> On Jun 19, 2021, at 2:58 PM, QEDK  wrote:
> 
> I don't think so, unless it's upgraded anytime soon (not sure about 
> timelines) or sudo access is provided to you so you can build from source. 
> Getting a hang of k8s in toolforge takes a while but it's easy once you have 
> it set up. Here's a guide on how you can do cronjobs in k8s: 
> https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Kubernetes_cronjobs
>  
> 
> 
> Best,
> QEDK
> 
> On Sun, Jun 20, 2021, 00:02 Shrinivasan T wrote:
> 
> You can use the Debian buster images with Kubernetes which have Python 3.7.3 
> pre-installed.
> 
> I need to run the bot as a cronjob in toolforge.
> 
> I think I can setup the cron on toolforge server.
> 
> This is plain python script.
> 
> I don't need docker or kubernetes for this.
> 
> Is there any way to get higher version of python at toolforge servers?
> 
> Thanks.
> Shrini
> 

[Wikitech-l] Re: the parts of a template

2021-06-14 Thread Roy Smith
> On Jun 14, 2021, at 10:13 AM, Amir E. Aharoni  
> wrote:
> 
> I'm not talking just about the technical parts, i.e. how the parser sees it, 
> but also about the functional parts—how the template maintainers and users 
> perceive it.

I know this is a bit of a tangent, but since you mentioned parsing, I'd like to 
go off in that direction.  The way Parsoid represents a page is a mix of HTML 
and JSON (RDFa), with the template details being in the JSON parts.  There are 
good tools for processing HTML documents and searching for specific nodes based 
on the tree structure.  While there are tools for working with RDFa, it's a 
much sparser ecosystem (see https://rdfa.info/tools).  As far as I know, there 
are no tools that let you do queries like:

Find all the xyz templates with a foo=bar attribute that exist inside a 

because that crosses between the HTML and RDFa domains.  Writing such a query 
against plain HTML is easy with many existing tools that use either XPath or 
CSS selector syntax.
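
Absent such a tool, the query can be hand-rolled.  The sketch below walks the HTML with the standard library's html.parser, tracks which elements are open, and decodes the JSON in each transclusion's data-mw attribute.  The sample markup and the shape of the data-mw JSON (parts, template, target.wt, params) are written from memory of Parsoid's output and should be treated as assumptions; the class name TemplateFinder and the choice of a table as the enclosing element are hypothetical.

```python
import json
from html.parser import HTMLParser


class TemplateFinder(HTMLParser):
    """Collect transclusions of `template` with param=value inside `container`."""

    def __init__(self, container, template, param, value):
        super().__init__()
        self.container, self.template = container, template
        self.param, self.value = param, value
        self.open_tags = []  # stack of currently open elements (HTML side)
        self.hits = []       # "about" ids of matching transclusions

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside = self.container in self.open_tags
        self.open_tags.append(tag)
        if inside and "mw:Transclusion" in (attrs.get("typeof") or "").split():
            data = json.loads(attrs.get("data-mw") or "{}")  # JSON side
            for part in data.get("parts", []):
                if not isinstance(part, dict):
                    continue  # plain-string parts are literal text
                tpl = part.get("template", {})
                name = tpl.get("target", {}).get("wt", "").strip()
                arg = tpl.get("params", {}).get(self.param, {}).get("wt")
                if name == self.template and arg == self.value:
                    self.hits.append(attrs.get("about"))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed children.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


SAMPLE = """
<p><span about="#mwt1" typeof="mw:Transclusion"
 data-mw='{"parts":[{"template":{"target":{"wt":"xyz"},"params":{"foo":{"wt":"bar"}},"i":0}}]}'>outside</span></p>
<table><tr><td><span about="#mwt2" typeof="mw:Transclusion"
 data-mw='{"parts":[{"template":{"target":{"wt":"xyz"},"params":{"foo":{"wt":"bar"}},"i":0}}]}'>inside</span></td></tr></table>
"""

finder = TemplateFinder("table", "xyz", "foo", "bar")
finder.feed(SAMPLE)
print(finder.hits)
```

It works, but it's forty lines of bookkeeping for what would be a one-line selector if the template data were queryable the same way as the tree.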

[Wikitech-l] Re: [Ops] Mailman2 is now shut down (T52864)

2021-06-02 Thread Roy Smith
+1.  This is a really nice improvement, especially when it comes to searching 
the archives.  Thank you to everybody who helped make this happen.

> On Jun 2, 2021, at 3:37 PM, Martin Urbanec wrote:
> 
> Amir, Kunal and everyone who helped: Thank you for getting Mailman 3 into 
> production, and removing Mailman 2. Your effort is much appreciated.
> 


Re: [Wikitech-l] Image search

2020-11-16 Thread Roy Smith
> On Nov 15, 2020, at 7:18 PM, Lars Aronsson wrote:
> 
> This is where I want to find more images like these,
> not in the same category, just similar photos.

Perhaps not exactly what was asked, but Google Images can do this.  Go to 
images.google.com, click the camera icon, and then you can paste a URL for the 
image.  Note, you want the raw image, not the Commons page, so in your case:

https://upload.wikimedia.org/wikipedia/commons/8/89/V%C3%A5rdinge_f%C3%B6rsamlingshem.JPG

Not surprisingly, in this particular example, it gives you back photos of other 
red farm houses, and doesn't understand that it's the corner lantern that 
you're really interested in.  But still a useful tool to be aware of.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] What is JSON (in JavaScript code)?

2020-10-30 Thread Roy Smith
JSON is JavaScript Object Notation.  It's a way of encoding structured data as 
text strings which originated (as the name implies) in JavaScript, but is now 
widely used as a data exchange format, with support in nearly every programming 
language.  https://www.w3schools.com/js/js_json.asp


But, in the context you're using it, JSON is a built-in object holding the JSON 
parsing and encoding functions, provided by the JavaScript implementation in 
most browsers.  https://www.w3schools.com/Js/js_json_parse.asp


If you've opened your browser's console, you should be able to type JSON at it 
and get back something like:

> JSON
> JSON {Symbol(Symbol.toStringTag): "JSON", parse: ƒ, stringify: ƒ}
>   parse: ƒ parse()
>     arguments: (...)
>     caller: (...)
>     length: 2
>     name: "parse"
>     __proto__: ƒ ()
>     [[Scopes]]: Scopes[0]
>   stringify: ƒ stringify()
>   Symbol(Symbol.toStringTag): "JSON"
>   __proto__: Object

If you get something like "JSON is not defined", you're probably running an 
ancient browser.
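
For what it's worth, here is a minimal sketch of using it directly, which is what the deprecation warning is asking for in place of jQuery.parseJSON.  The sample string and the safeParse helper are made up for illustration.

```javascript
// JSON is a built-in global object; nothing needs to be loaded first.
const text = '{"tool": "spi-tools", "tags": ["a", "b"], "count": 2}';

// Text -> value (the native replacement for jQuery.parseJSON).
const data = JSON.parse(text);

// Value -> text.
const roundTrip = JSON.stringify(data);

// JSON.parse throws a SyntaxError on malformed input, so wrap it when the
// input is not trusted.  safeParse is a hypothetical helper name.
function safeParse(s, fallback) {
  try {
    return JSON.parse(s);
  } catch (e) {
    return fallback;
  }
}
```

If JSON.parse works in the console but not in your script, something in the loaded code is probably shadowing the global name JSON.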




> On Oct 30, 2020, at 11:05 AM, Strainu wrote:
> 
> Hi,
> 
> I'm looking at solving the following console warning on ro.wp:
> "JQMIGRATE: jQuery.parseJSON is deprecated; use JSON.parse" which
> appears due to outdated Twinkle code. Just making the replacement does
> not work, since JSON is not defined. As a matter of fact, I cannot
> find it anywhere else in the code loading on a normal Romanian
> Wikipedia page.
> 
> Alas, the generic name of that object makes searching on mw.org or
> Google rather useless. I can see some similar changes in Phabricator,
> but they seem to work.
> 
> So, what is JSON and how can I use it in my code?
> 
> Thanks,
>   Strainu
> 
> P.S. Please don't suggest updating Twinkle...
> 



[Wikitech-l] Workflow for updating javascript tools on wiki?

2020-10-26 Thread Roy Smith
I maintain spi-tools.js.  The source is in GitHub.  At the moment, my "release 
process" (if you could call it that) is to edit User:RoySmith/spi-tools.js and 
copy-paste the new version.  This works, but it's clunky.  Is there some 
pre-existing tool for this?

I could build some little tool to do this, but if something already exists, 
no need to reinvent the wheel.



Re: [Wikitech-l] Allow HTML email

2020-09-23 Thread Roy Smith
As long as you guys keep the B-News bi-directional gateway functioning, I'm 
fine.  And, please, make sure you strip any leading whitespace so as not to 
tickle the line-eater.

> On Sep 23, 2020, at 9:21 AM, Faidon Liambotis wrote:
> 
> On Wed, Sep 23, 2020 at 02:45:37PM +1000, Tim Starling wrote:
>> We still haven't heard from Faidon who, last I heard, still reads his
>> emails by piping telnet into less or something. But I think he can
>> make sense of multipart/alternative as long as it's not base-64
>> encoded. You should send the plain text as the first part so he
>> doesn't have to page down too far  ;)
> 
> On behalf of the Mutt & other console email clients user club, we
> approve of this change. We haven't formed consensus on it yet, but I
> suspect we'd even be willing to go one step further and negotiate the
> use of emojis as well (perhaps even emojis in subject lines). No
> promises for responding in HTML, though; that's probably going to have
> to wait another century.
> 
> Faidon
> 
