[Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Ashar Voultoiz
Neilk wrote:
 At the risk of being impolite -- our code review tool is not that nice.
 (I don't expect that anyone who worked on it would even disagree with me
 here.)

On 24/03/11 06:47, MZMcBride wrote:
  It's only impolite if you criticize the code review tool without being
  constructive. What specifically do you not like about the current code
  review tool? And have you filed bugs about getting these issues
  addressed?

Neilk is a realist. Either we bring more developers into the system or we 
drop it and reuse another system that already has some developers. For 
example, we are not developing our own bug tracker or webmail 
interface. We reuse code from others, just as others reuse our wiki code.

I would name a few issues with our CR system:
- does not know about branches
- lacks a manual merging system
- lacks an automatic merging system (something like: if a rev and its 
follow-ups get four sign-offs, merge them all into the release branch)
- a rev + its follow-ups could be grouped; we would then review the group 
as a whole instead of individual revisions
- I still have not figured out how to filter by author AND path
- the comment system should be LiquidThreads-based
- the diff is useless (I use a local tool)
- still have to rely on local tools for merging, reverting, blaming
- not integrated with bugzilla

There are a lot of good points though!

-- 
Ashar Voultoiz


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Converting to Git?

2011-03-24 Thread Ashar Voultoiz
On 24/03/11 06:47, MZMcBride wrote:

 It's only impolite if you criticize the code review tool without being
 constructive. What specifically do you not like about the current code
 review tool? And have you filed bugs about getting these issues addressed?

I have answered this message in a new one, to create a new thread.

-- 
Ashar Voultoiz


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread K. Peachey
On Thu, Mar 24, 2011 at 5:44 PM, Ashar Voultoiz hashar+...@free.fr wrote:
 Neilk is realist. Either we bring more developers in the system or we
 drop it and reuse another system already having some developers.
It's sitting there in SVN; nothing is stopping people from working on
it. In fact Sam and Chad might like the help. But as for your argument:
having more developers (/man power) != better working systems.

 I would name a few issues with our CR system:
 
 - I still have not figured out how to filter by author AND path
Have you asked anyone for help? Although I think it may be broken
based on [[Bugzilla:26195]]

 - comment system should be liquid thread based.
There is a bug and plans for this (Pending the LQT backend rewrite)

 - the diff is useless (I use a local tool)
How so? Have you submitted a bug so people know about this?

 - still have to rely on local tools for merging, reverting, blaming
Because those are SVN actions that need to be done as an SVN user, and
our SVN -> wiki user system is kinda lacking, from my understanding.

 - not integrated with bugzilla
What parts could be improved by having it more integrated?
-Peachey

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Joseph Roberts
Hey all,

I've been scanning the source and I can't find where the players are kept.
Can anyone add any insight on this?  Is it done as a hook or direct code?

TIA - Joseph Roberts

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Bryan Tong Minh
On Thu, Mar 24, 2011 at 12:05 PM, Joseph Roberts
roberts.jos...@ntlworld.com wrote:
 Hey all,

 I've been scanning the source and I can't find where the players are kept.
 Can anyone add any insight on this?  Is it done as a hook or direct code?
Extension:OggHandler. The video player is Cortado, which is bundled
with OggHandler iirc

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Joseph Roberts
Thanks, would it be preferable to add HTML5 to that or make another
extension purely for using video/audio?

On 24 March 2011 11:16, Bryan Tong Minh bryan.tongm...@gmail.com wrote:
 On Thu, Mar 24, 2011 at 12:05 PM, Joseph Roberts
 roberts.jos...@ntlworld.com wrote:
 Hey all,

 I've been scanning the source and I can't find where the players are kept.
 Can anyone add any insight on this?  Is it done as a hook or direct code?
 Extension:OggHandler. The video player is Cortado, which is bundled
 with OggHandler iirc

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Joseph Roberts
Actually, looking through OggHandler, I do think that developing a
separate entity may work well.
I'm not quite sure what is wanted by the general public and would like
to do what is wanted by the majority, not just what would be easiest or
even the best.
What would be the best way to implement an HTML5 player in MediaWiki?

TIA - Joseph Roberts

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Enable WikiTrust spanish support

2011-03-24 Thread Wilfredor
Yours sincerely,

I have long been trying to start a Wikipedia 1.0 project in Spanish
(http://es.wikipedia.org/wiki/Wikipedia:Wikipedia_en_CD), a project
similar to the English version.

The problem is that I have been unable to contact the WikiTrust team
(http://www.wikitrust.net/authors). We need Spanish support in the
system, which does not exist yet.

I apologize in advance if this is not the right place.

Thank you very much.

-- 
User:Wilfredor

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Joseph Roberts
On 24 March 2011 12:18, Bryan Tong Minh bryan.tongm...@gmail.com wrote:
 On Thu, Mar 24, 2011 at 12:27 PM, Joseph Roberts
 roberts.jos...@ntlworld.com wrote:
 Thanks, would it be preferable to add HTML5 to that or make another
 extension purely for using video/audio?

 OggHandler already implements video. It tries to select an
 appropriate player (<video>, Cortado or VLC) depending on the user's
 browser.
Ah, cool.  If no one minds, shouldn't [[mw:HTML5]] be edited to
reflect what is in the mainstream?

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Selenium] Structured description of tests?

2011-03-24 Thread Benedikt Kaempgen
Hi Markus,

That sounds good. I have added some common tasks/assertions. 

How do you think one could use those in test plans/descriptions?

If you are interested in how we plan/do use WMF Selenium framework, I have
updated SMW Selenium tests documentation [1].

Best,

Benedikt

[1] http://www.semantic-mediawiki.org/wiki/SMW_System_Testing_with_Selenium 
 

--
AIFB, Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-47946 
Email: benedikt.kaemp...@kit.edu
Web: http://www.aifb.kit.edu/web/Hauptseite/en 



-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Markus Glaser
Sent: Tuesday, March 22, 2011 11:06 AM
To: Wikimedia developers
Subject: Re: [Wikitech-l] [Selenium] Structured description of tests?

Hi Benedict,

one way to make tests more structured and easier to maintain would be to
provide a standard set of operations within the Selenium Framework. A list
of suggestions can already be found at
http://www.mediawiki.org/wiki/SeleniumFramework#Notes_and_further_improvements.
However, this does not seem to be very exhaustive... If you like, we
could join forces in order to create a usable set of standards, since this
would be the next item on my todo list for the framework anyway :)
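
To make this a bit more concrete, here is a rough, untested sketch of one
such shared operation: a login helper written against the Selenium RC PHP
client (Testing_Selenium). The wiki URL, credentials and timeout are
placeholders, and the form field ids are the ones core's login page uses,
if I remember correctly.

<?php
// Shared "common task" for Selenium tests: log a user in.
function seleniumLogin( Testing_Selenium $selenium, $wikiUrl, $user, $pass ) {
    $selenium->open( $wikiUrl . '/index.php?title=Special:UserLogin' );
    $selenium->type( 'wpName1', $user );
    $selenium->type( 'wpPassword1', $pass );
    $selenium->click( 'wpLoginAttempt' );
    $selenium->waitForPageToLoad( '10000' ); // timeout in milliseconds
}

A test plan could then reference the named operation ("log in as user X")
instead of spelling out the individual steps every time.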

Cheers,
Markus 

-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Benedikt
Kaempgen
Sent: Monday, March 14, 2011 19:03
To: Wikimedia developers
Subject: [Wikitech-l] [Selenium] Structured description of tests?

Hello,

As I see from [1-4], test descriptions for MW with the Selenium Framework
are not very structured at the moment. I think this will make it difficult
to maintain these tests.

Any suggestions how we could improve this?

For a start, a bachelor student of mine will be looking into how to describe
system tests for Semantic MediaWiki (and extensions) using categories and
properties of Semantic MediaWiki. We are planning for tests to be derived
from, and link to, content in the user/admin manual.

Regards,

Benedikt 

[1] http://www.mediawiki.org/wiki/Cite_Extension_Test_Plan
[2] http://www.mediawiki.org/wiki/ConfirmEdit_Test_Plan
[3] http://www.mediawiki.org/wiki/New_installer/Test_plan
[4]
http://www.mediawiki.org/wiki/Selenium/Deployment#Automation_work_done_by_the_Calcey_team

--
AIFB, Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-47946
Email: benedikt.kaemp...@kit.edu
Web: http://www.aifb.kit.edu/web/Hauptseite/en 




___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Chad
On Thu, Mar 24, 2011 at 3:44 AM, Ashar Voultoiz hashar+...@free.fr wrote:
 - I still have not figured out how to filter by author AND path

Special:Code/MediaWiki/author/hashar?path=/trunk/phase3

or if you only want unreviewed revs:

Special:Code/MediaWiki/status/new?author=hashar&path=/trunk/phase3

The UI still sucks for it, but support *is* there.

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Neil Kandalgaonkar
On 03/24/2011 12:44 AM, Ashar Voultoiz wrote:

 On 24/03/11 06:47, MZMcBride wrote:
   It's only impolite if you criticize the code review tool without being
   constructive. What specifically do you not like about the current code
   review tool? 

I agree with most of what Ashar said. Lack of branching, merging, blame,
only semi-integrated with bug tracking.

 And have you filed bugs about getting these issues
   addressed?

My guess, which could be wrong, is that it would be cleaner to move to a
new tool, or new combination of tools. I'm not sure which yet.

As for Special:Code et al, I disagree with the whole paradigm on
which it is based -- effectively it just annotates revisions in SVN.
This means that you need this outer circle of checkins that don't
really count. And then there's the inner circle, where some esteemed
developer who's done a lot of cool things for MediaWiki gets, as their
reward, to constantly deal with other people's patches that they may not
care about.

I believe that paradigm is broken because the incentives are backwards.
Average developers are frustrated because no matter HOW much energy and
commitment they have, they can't make code review go faster. The inner
circle of production-branch committers are also frustrated because it's
up to them to deal with all the pain of merging. Suddenly getting a
feature production ready is THEIR problem, and sometimes
super-productive people like Roan have found it easier to just rewrite
it from scratch. Otherwise it's much easier to just ignore the backlog
for long periods.

My ideal is a code review tool that:

- Allows us to deploy trunk. At any time. Eliminate the production
branch. Any developer in the world should be able to work on the code we
actually have in production without having to decide between trunk and a
production branch.

- Allows the outer circle developer to take things into their own
hands. They can check out the deployed code and develop a changelist, or
set of changes, that is carefully crafted to be applied to trunk. If they
are screwing up, they should get instant feedback, not six months later.

- Does not unduly penalize the inner circle developer. Give them a
constant stream of light duties, not a soul-crushing marathon of heavy
duties once a year.

I admire the code review paradigm at Google, which does all that, but
which is regrettably based on tools that are not all available freely.
So I don't have a 100% solution for you yet. I've talked informally with
RobLa about this but I didn't have anything really solid to bring to the
community. (In early 2010 I started looking at ReviewBoard but then I
realized that the MediaWiki community had their own tool and I figured I
should understand that first.)

There's been some confusion, so perhaps I have not been clear that I'm
referring to a totally different paradigm of code review, where the code
to be reviewed isn't even in subversion. Essentially the developers
would pass around something like patches. Some systems make this work
over email, but it's easier if it's tracked in a web based system. As
developers work on the patch, the change log message or other metadata
about the patch is annotated with developer comments. Sometimes you
bring in more developers -- maybe there's some aspect that someone else
is better situated to understand.

This process goes on until the patch is deemed worthy. Then, and only
then, does it get committed to an authoritative repository, authored by
some developer and annotated as reviewed by one or more other
developers. (Emergency patches get a "review this later" tag.)

Now, when code review happens before committing, you get the benefit
that all non-emergency checkins have been at least looked at by
somebody. Personally I believe this should be happening for everybody,
even experienced developers. Even they make mistakes sometimes, and if
not, then other developers learn how to code like them by reading their
changes more intently.

But then your code review system has to reinvent several wheels and you
annoy the developer by making them do fresh checkouts all the time.

Git can do a lot of that in a much cleaner way, so I expect Gerrit might
be an even better solution.

Anyway, this is all vague and clearly I'm talking about radical changes
to the entire MediaWiki community. But I believe it would help quite a bit.

Maybe I should work on it a bit more and present it on a wiki page
somewhere, as well as in person in Berlin in May?



 
 Neilk is realist. Either we bring more developers in the system or we 
 drop it and reuse another system already having some developers. For 
 example, we are not developing our own bug tracker or webmail 
 interfaces. We reuse code from others just like other reuse our Wiki code.
 
 I would name a few issues with our CR system:
 - does not known about branches
 - lacks a manual merging system
 - lacks an automatic merging system (something like: if rev and follow 
 up got 4 sign up, merge them all 

Re: [Wikitech-l] Improving code review: Mentors/maintainers?

2011-03-24 Thread Jack Phoenix
On Wed, Mar 23, 2011 at 4:14 PM, Daniel Friesen
li...@nadir-seen-fire.comwrote:

 On 11-03-23 06:36 AM, Marcin Cieslak wrote:
  [...]
  Just to give a not-so-hypothetical example, since I don't like discussing
  in vain, what about this:
 
Is this okay to fix
 https://bugzilla.wikimedia.org/show_bug.cgi?id=16260
by adding a new [[Message:qsidebar]] that is the same as
 [[Message:Sidebar]]
only accepts  EDIT, THISPAGE, CONTEXT, MYPAGES, SPECIALPAGES, TOOLBOX
 boxes?
 
  I see that hartman and dartman did some work there recently, and ashley
  did one cleanup about a year ago.
 
  //Marcin
 I'd actually like to eliminate legacy skins altogether. They show up and
 throw a thorn into skin improvements repeatedly.


+1


Thanks and regards,
--
Jack Phoenix
MediaWiki developer
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] parse vs. recursiveTagParse

2011-03-24 Thread Stephan Gambke
Am 23.03.2011 23:33, Tim Starling wrote:
 recursiveTagParse() is the function to use from a tag hook or other
 parser hook, to parse text when a parse operation is already in
 progress on the same Parser object. It should not be used when a parse
 operation is not in progress. Its output is actually half-parsed, with
 placeholders for tag hooks and links.
 
 parse() is the function to use when a parse operation is not in
 progress, such as in a special page. It should not be used from a hook
 into a parse operation, unless a separate Parser object is
 constructed. This is because it destroys the state of the Parser
 object on which it is called.
 
 Includable special pages have an execute() function which can be
 called from either context, so to parse text within them, it's
 necessary to check $this-mIncluding to determine the correct function
 to use. I don't recommend using includable special pages in new
 extensions.
 
 Hope that helps.
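
Just to make sure I got it, here is a minimal (untested) sketch of the two
contexts you describe -- the tag name, hook function and special page are
invented for illustration:

<?php
// 1. From a tag hook, a parse is already in progress on $parser, so use
//    recursiveTagParse(); its output is the half-parsed text you mention.
function wfDemoTagRender( $input, array $args, Parser $parser, PPFrame $frame ) {
    return $parser->recursiveTagParse( $input, $frame );
}
// registered elsewhere with: $parser->setHook( 'demo', 'wfDemoTagRender' );

// 2. From a special page, no parse is in progress, so a full parse()
//    (or just $wgOut->addWikiText(), which wraps it) is appropriate.
class SpecialParseDemo extends SpecialPage {
    function __construct() {
        parent::__construct( 'ParseDemo' );
    }
    function execute( $par ) {
        global $wgParser, $wgOut, $wgUser;
        $output = $wgParser->parse( "Some ''wikitext'' here", $this->getTitle(),
            ParserOptions::newFromUser( $wgUser ) );
        $wgOut->addHTML( $output->getText() );
    }
}

(And an includable special page would check $this->mIncluding to pick
between the two, as you describe.)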

It does, thanks! And thanks to Platonides, too.

Cheers,
Stephan

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Sumana Harihareswara
On 03/24/2011 10:13 AM, Neil Kandalgaonkar wrote:
 Anyway, this is all vague and clearly I'm talking about radical changes
 to the entire MediaWiki community. But I believe it would help quite a bit.

 Maybe I should work on it a bit more and present it on a wiki page
 somewhere, as well as in person in Berlin in May?

I've added your idea to the list of possible topics to talk about/work 
on in Berlin in May:

http://www.mediawiki.org/wiki/Berlin_Hackathon_2011#Topics

Yeah, maybe in Berlin you could briefly summarize your proposal and we 
could hash out some next steps?

best,
Sumana

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Enable WikiTrust spanish support

2011-03-24 Thread Luca de Alfaro
Hi All,

yes, I think we could bring up support for WikiTrust on the Spanish
Wikipedia for this purpose.
The way we worked with Martin Walker for the English project is that he gave
us a list of page_ids, and we gave back a CSV file with, for each page_id,
the recommended revision_id, each with a quality indication and other
information (timestamps, other useful metadata...), and I think Martin
basically just followed the recommendation.

How far away are you from having a list of page_ids?
If we could support this on our existing server, it should not be too much
work for us to set it up.
Let us know.

I apologize for the delay in answering!

Luca

On Thu, Mar 24, 2011 at 5:29 AM, Wilfredor wilfre...@gmail.com wrote:

 Yours sincerely,

 I have long been trying to start a Wikipedia 1.0 project in Spanish
 (http://es.wikipedia.org/wiki/Wikipedia:Wikipedia_en_CD), a project
 similar to the English version.

 The problem is that I have been unable to contact the WikiTrust team
 (http://www.wikitrust.net/authors). We need Spanish support in the
 system, which does not exist yet.

 I apologize in advance if this is not the right place.

 Thank you very much.

 --
 User:Wilfredor

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Roan Kattouw
2011/3/24 Neil Kandalgaonkar ne...@wikimedia.org:
 - Allows us to deploy trunk. At any time. Eliminate the production
 branch. Any developer in the world should be able to work on the code we
 actually have in production without having to decide between trunk and a
 production branch.

You're basically arguing for Linux-style pre-commit code review where
people e-mail patches back and forth. However, as long as we still
have SVN, that means that these pre-committed patches ARE NOT
VERSIONED, let alone necessarily public.

I believe this is bad because:
1) Keeping track of patches, collaborating on a larger feature, etc.
become harder (no benefits of a VCS)
2) Resolving conflicts between patches is done by reviewers when they
apply them instead of being conveniently outsourced to the
author-committers
3) If review capacity is low, patches don't get committed, their
authors bug reviewers a few times, give up, get demotivated and leave
the project

I think this workflow could work with a DVCS with Git, but I strongly
oppose implementing it while we're still using a centralized VCS like
Subversion.

Instead, let me outline my recollection of how code review and
deployment worked back when I joined this project, and explain how I
think this process can be resurrected. This was all a long time ago
and I was fairly new to MW, so please correct me where I'm wrong.

* Someone commits something
* A notification is sent to the mediawiki-cvs list. This is still the
case, except back then more than a few people were subscribed to it,
and traffic wasn't as high
* Optionally, a mediawiki-cvs reader (the usual suspects being Brion,
Tim and Rob Church) reads the diff and notices something is wrong with
it. They reply to the commit notification, citing parts of the diff
inline and raising their objections. This reply is automatically sent
to wikitech-l (we didn't have the CodeReview extension yet), which
committers are expected to be subscribed to. A discussion about the
commit takes place, possibly leading to followup commits
* The next Monday, Brion smoketests HEAD. If he finds breakage, he
tracks down the offending revision(s) and reverts things until
everything seems to work. ("Keep trunk runnable" was taken really
seriously, and we mostly had a revert-reapply cycle instead of a
fixme-followup cycle: it was perfectly acceptable to revert broken
things if they couldn't be fixed in 5 minutes, especially if you were
as busy as Brion.)
* In addition to smoketesting, Brion also reviews all revisions to
phase3 and WMF-run extensions (with the level of commit activity we
had back then, this wasn't an unreasonable job for one person to do on
one day) and reverts things as appropriate.
* trunk is now in a state where it seems to run fine on Brion's
laptop. Brion deploys trunk to testwiki, tests a bit more, then
deploys to the cluster

As you know, our workflow has become a bit different over the years.
At some point, CodeReview was written to make revision discussions
nicer and to provide status fields so Brion could outsource some
review work. Later, the WMF branch was introduced to not only track
live hacks and WMF-specific changes, but also to remove the dependency
on a runnable trunk.

The reason this workflow resulted in frequent deployments of trunk was
that review was always close to HEAD (never behind by
more than about 2 weeks). The reason it broke down in the end was that
Brion kept having less time to review more things, but that doesn't
mean we can't make it work again by having more than one reviewer. I
think the following conditions are necessary for this to happen:
* We need to have multiple reviewers (duh)
* Every reviewer needs to budget time for code review, and they need
to not get tangled up in other obligations to a degree where they
can't spend enough time on review. This is largely a management thing
* It needs to be clear who is responsible for reviewing what. This
doesn't need to be set in stone, but we have to avoid a situation
where revisions aren't reviewed because no one feels responsible. This
can be accomplished by agreeing on review assignments based on e.g.
path/subsystem, and having some manager-like person (possibly an EPM)
monitor the process and make sure nothing gets left by the wayside. If
conventional review assignments leave too much ambiguity, additional
criteria can be introduced, e.g. the day of the week something was
committed. More on this in a minute
* There needs to be a clear expectation that commits are generally
reviewed within a certain time (say, a week) after having been
committed. The same manager-like person should also be keeping an eye
on this and making sure overdue revs are reviewed pronto
* We need to set a clear policy for reverting problematic revisions
(fixme's) if they aren't addressed quickly enough (again, let's say
within a week). Currently we largely leave them be, but I think we
should go back to something more decisive and closer to the keep
trunk 

Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Ashar Voultoiz
On 24/03/11 09:41, K. Peachey wrote:
snip
 It's sitting there in SVN, nothing is stopping people from working on
 it, In fact Sam and Chad might like the help, But your arugment that
 having more developers(/man power) != better working systems.

I am a dev with commit access and could probably sync the patches on the 
live site (the day I figure out if I have commit access to the 
production branch).  My personal issue is that I am lacking the time to 
think about the problem, design a solution, implement it and test it. 
Since I have workarounds, I focus on small tasks or things that really 
matter to me and to my wife (she is my first tester / user).

Anyway, I was answering MZMcBride in the context of things I do not 
like in our code review software. Those issues highlight the reviewing 
paradigm behind the tool and Neil Kandalgaonkar explained this way 
better than I would ever be able to do.

(Still, I like our code review software since it fits our actual needs.)

snip issues

-- 
Ashar Voultoiz


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Moving the Dump Process to another language

2011-03-24 Thread Yuvi Panda
Hi, I'm Yuvi, a student looking forward to working with MediaWiki via
this year's GSoC.

I want to work on something dump related, and have been bugging
apergos (Ariel) for a while now. One of the things that popped into
my head is moving the dump process to another language (say, C#, or
Java, or be very macho and do C++ or C). This would give the dump
process quite a bit of a speed boost (the profiling I did[1] seems to
indicate that the DB is not the bottleneck; I might be wrong, though),
and it can also be done in a way that makes running distributed dumps
easier/more elegant.

So, thoughts on this? Is 'Move Dumping Process to another language' a
good idea at all?

P.S. I'm just looking out for ideas, so if you have specific
improvements to the dumping process in mind, please respond with those
too. I already have DistributedBZip2 and Incremental Dumps in mind too
:)

[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=5303

Thanks :)

-- 
Yuvi Panda T
http://yuvi.in/

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Michael Dale
On 03/24/2011 04:45 AM, Joseph Roberts wrote:
 Actually, looking through OggHandler, I do think that developing a
 seperate entity may work well.
 I'm not quite sure what is wanted by the general public and would like
 to do what is wanted by the majority, not just wat would be easiest or
 even the best.
 What would be the best way to implement a HTML5 player in MediaWiki?

 TIA - Joseph Roberts


There is Extension:TimedMediaHandler, which implements multi-format,
multi-bitrate transcoding with automatic source selection, an HTML5 player
interface, timed text, temporal media fragments, gallery and search
pop-up players, viral iframe sharing / embedding, etc.

Demo page here:
http://prototype.wikimedia.org/timedmedia/Main_Page

peace,
michael

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Deleting bad ideas off Google Summer of Code suggestion page

2011-03-24 Thread Sumana Harihareswara
There's no point in having our GSoC applicants wasting time working on 
proposals that we aren't really interested in, that another developer is 
already half-done implementing, or that are just plain bad ideas.  So 
please take a look at the project ideas page

http://www.mediawiki.org/wiki/Summer_of_Code_2011#Project_ideas

and help me get rid of anything that wouldn't be a good student project.

If you could take a moment to skim the page and do this tomorrow or over 
the weekend, that would be great -- I've already had to redirect a few 
students once #mediawiki folks let me know that their first-choice 
project ideas were bad.

Thanks,
Sumana

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-24 Thread Brion Vibber
On Thu, Mar 24, 2011 at 1:05 PM, Yuvi Panda yuvipa...@gmail.com wrote:

 Hi, I'm Yuvi, a student looking forward to working with MediaWiki via
 this year's GSoC.

 I want to work on something dump related, and have been bugging
 apergos (Ariel) for a while now. One of the things that popped up into
 my head is moving the dump process to another language (say, C#, or
 Java, or be very macho and do C++ or C). This would give the dump
 process quite a bit of a speed bump (The profiling I did[1] seems to
 indicate that the DB is not the bottleneck. Might be wrong though),
 and can also be done in a way that makes running distributed dumps
 easier/more elegant.

 So, thoughts on this? Is 'Move Dumping Process to another language' a
 good idea at all?


I'd worry a lot less about what languages are used than whether the process
itself is scalable.

The current dump process (which I created in 2004-2005 when we had a LOT
less data, and a LOT fewer computers) is very linear, which makes it awkward
to scale up:

* pull a list of all page revisions, in page/rev order
  * as they go through, pump page/rev data to a linear XML stream
* pull that linear XML stream back in again, as well as the last time's
completed linear XML stream
  * while going through those, combine the original page text from the last
XML dump, or from the current database, and spit out a linear XML stream
containing both page/rev data and rev text
  * and also stick compression on the end

About the only way we can scale it beyond a couple of CPUs
(compression/decompression as separate processes from the main PHP stream
handler) is to break it into smaller linear pieces and either reassemble
them, or require users to reassemble the pieces for linear processing.

Within each of those linear processes, any bottleneck will slow everything
down whether that's bzip2 or 7zip compression/decompression, fetching
revisions from the wiki's complex storage systems, the XML parsing, or
something in the middle.

What I'd recommend looking at is ways to actually rearrange the data so a)
there's less work that needs to be done to create a new dump and b) most of
that work can be done independently of other work that's going on, so it's
highly scalable.

Ideally, anything that hasn't changed since the last dump shouldn't need
*any* new data processing (right now it'll go through several stages of
slurping from a DB, decompression and recompression, XML parsing and
re-structuring, etc). A new dump should consist basically of running through
appending new data and removing deleted data, without touching the things
that haven't changed.

This may actually need a fancier structured data file format, or perhaps a
sensible directory structure and subfile structure -- ideally one that's
friendly to being updated via simple things like rsync.
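
Very roughly, the first half of that idea looks something like the sketch
below (the helper names are made up; the current dump's prefetch pass
already reuses text from the previous dump in this way, but everything
still gets re-parsed and re-compressed):

<?php
// Sketch: reuse revision text from the previous dump where possible, and
// only hit the database for revisions that are new or were missing.
// prevDumpLookup(), fetchTextFromDb() and writeRevisionXml() are placeholders,
// and $pages stands for the pages coming out of the stub dump.
foreach ( $pages as $page ) {
    foreach ( $page->revisions as $rev ) {
        $text = prevDumpLookup( $rev->id );      // false if not in the last dump
        if ( $text === false ) {
            $text = fetchTextFromDb( $rev->id ); // new or changed revision
        }
        writeRevisionXml( $page, $rev, $text );
    }
}

The harder part is making the storage format let us skip the re-compression
of those untouched spans as well, which is where the smarter file/directory
layout comes in.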

-- brion
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Deleting bad ideas off Google Summer of Code suggestion page

2011-03-24 Thread K. Peachey
On Fri, Mar 25, 2011 at 7:40 AM, Sumana Harihareswara suma...@panix.com wrote:
 There's no point in having our GSoC applicants wasting time working on
 proposals that we aren't really interested in
Who is "we"? The Wikimedia Foundation? The MediaWiki developers? Someone else?

If anything, they should be struck through (<del>whatever</del>) so
people can still see that they were there, with a note left as to why
they were bad.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-24 Thread Platonides
Yuvi Panda wrote:
 Hi, I'm Yuvi, a student looking forward to working with MediaWiki via
 this year's GSoC.
 
 I want to work on something dump related, and have been bugging
 apergos (Ariel) for a while now. One of the things that popped up into
 my head is moving the dump process to another language (say, C#, or
 Java, or be very macho and do C++ or C). This would give the dump
 process quite a bit of a speed bump (The profiling I did[1] seems to
 indicate that the DB is not the bottleneck. Might be wrong though),
 and can also be done in a way that makes running distributed dumps
 easier/more elegant.
 
 So, thoughts on this? Is 'Move Dumping Process to another language' a
 good idea at all?
 
 P.S. I'm just looking out for ideas, so if you have specific
 improvements to the dumping process in mind, please respond with those
 too. I already have DistributedBZip2 and Incremental Dumps in mind too
 :)
 
 [1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=5303
 
 Thanks :)
 

An idea I have been pondering is to pass the offset of the previous
revision to the compressor, so it would need much less work in the
compression window to do its job. You would need something like
7z/xz so that the window can be big enough to contain at least the
latest revision (its compression factor is quite impressive, too: 1 TB
down to 2.31 GB). Note that I haven't checked how feasible such a
modification to the compressor would be.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-24 Thread James Linden
 So, thoughts on this? Is 'Move Dumping Process to another language' a
 good idea at all?


 I'd worry a lot less about what languages are used than whether the process
 itself is scalable.

I'm not a mediawiki / wikipedia developer, but as a developer / sys
admin, I'd think that adding another environment stack requirement (in
the case of C# or Java) to the overall architecture would be a bad
idea in general.

 The current dump process (which I created in 2004-2005 when we had a LOT
 less data, and a LOT fewer computers) is very linear, which makes it awkward
 to scale up:

 * pull a list of all page revisions, in page/rev order
  * as they go through, pump page/rev data to a linear XML stream
 * pull that linear XML stream back in again, as well as the last time's
 completed linear XML stream
  * while going through those, combine the original page text from the last
 XML dump, or from the current database, and spit out a linear XML stream
 containing both page/rev data and rev text
  * and also stick compression on the end

 About the only way we can scale it beyond a couple of CPUs
 (compression/decompression as separate processes from the main PHP stream
 handler) is to break it into smaller linear pieces and either reassemble
 them, or require users to reassemble the pieces for linear processing.

 Within each of those linear processes, any bottleneck will slow everything
 down whether that's bzip2 or 7zip compression/decompression, fetching
 revisions from the wiki's complex storage systems, the XML parsing, or
 something in the middle.

 What I'd recommend looking at is ways to actually rearrange the data so a)
 there's less work that needs to be done to create a new dump and b) most of
 that work can be done independently of other work that's going on, so it's
 highly scalable.

 Ideally, anything that hasn't changed since the last dump shouldn't need
 *any* new data processing (right now it'll go through several stages of
 slurping from a DB, decompression and recompression, XML parsing and
 re-structuring, etc). A new dump should consist basically of running through
 appending new data and removing deleted data, without touching the things
 that haven't changed.

 This may actually need a fancier structured data file format, or perhaps a
 sensible directory structure and subfile structure -- ideally one that's
 friendly to beed updated via simple things like rsync.

I'm probably stating the obvious here...

Breaking the dump up by article namespace might be a starting point --
have 1 controller process for each namespace. That leaves 85% of the
work in the default namespace, which could then be segmented by any
combination of factors, maybe as simple as block batches of X number
of articles.

When I'm importing the XML dump to MySQL, I have one process that
reads the XML file, and X processes (10 usually) working in parallel
to parse each article block on a first-available queue system. My
current implementation is a bit cumbersome, but maybe the idea could
be used for building the dump as well?
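
A stripped-down sketch of that idea (round-robin partitioning rather than
my actual first-available queue; readBlocks() and parseBlock() stand in
for the real reader and the MySQL insert step):

<?php
// Bare-bones variant: N workers, each scans the dump stream but only
// parses every N-th article block.
$workers = 10;
for ( $i = 0; $i < $workers; $i++ ) {
    $pid = pcntl_fork();
    if ( $pid === 0 ) {
        $n = 0;
        foreach ( readBlocks( 'pages-articles.xml' ) as $block ) {
            if ( $n++ % $workers === $i ) {
                parseBlock( $block ); // e.g. insert into MySQL
            }
        }
        exit( 0 );
    }
}
// Parent: wait for all children to finish.
while ( pcntl_wait( $status ) > 0 ) {
    // nothing to do, just reap
}

With a shared queue instead of round-robin the workers balance better when
block sizes vary a lot, but the overall structure is the same.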

In general, I'm interested in pitching in some effort on anything
related to the dump/import processes.

--
James Linden
kodekr...@gmail.com
--

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Converting to Git?

2011-03-24 Thread Aryeh Gregor
On Tue, Mar 22, 2011 at 10:46 PM, Tim Starling tstarl...@wikimedia.org wrote:
 The tone is quite different to one of the first things I read about
 Mercurial:

 Oops! Mercurial cut off your arm!

 Don't randomly try stuff to see if it'll magically fix it. Remember
 what you stand to lose, and set down the chainsaw while you still have
 one good arm.

 https://developer.mozilla.org/en/Mercurial_basics

My experience with Mercurial is that if you type the wrong commands,
it likes to destroy data.  For instance, when doing an hg up with
conflicts once, it opened up some kind of three-way diff in vim that I
had no idea how to use, and so I exited.  This resulted in my working
copy (or parts of it) being lost, since apparently it defaulted to
assuming that I was okay with whatever default merging it had done, so
it threw out the rest.  I also once lost commits under similar
circumstances when doing hg rebase.  I'm pretty sure you can configure
it to be safer, but it's one of the major reasons I dislike Mercurial.
 (I was able to recover my lost data from filesystem backups.)

git, on the other hand, never destroys committed data.  Barring bugs
(which I don't recall ever running into), the only command that
destroys data is git gc, and that normally only destroys things that
have been disconnected for a number of days.  If you do a rebase, for
instance, the old commits are no longer accessible from normal
commands like git log, but they'll stick around for some period of
time, so you can recover them if needed (although the process is a bit
arcane if you don't know the commit id's).  There are also no git
commands I've run into that will do anything nasty to your working
copy without asking you, except obvious ones like git reset --hard.
In the event of update conflicts, for instance, git adds conflict
markers just like Subversion.

 The main argument is that merging is easy so you can branch without
 the slightest worry. I think this is an exaggeration. Interfaces
 change, and when they change, developers change all the references to
 those interfaces in the code which they can see in their working copy.
 The greater the time difference in the branch points, the more likely
 it is that your new code will stop working. As the branch point gap
 grows, merging becomes more a task of understanding the interface
 changes and rewriting the code, than just repeating the edits and
 copying in the new code.

 I'm not talking about the interfaces between core and extensions,
 which are reasonably stable. I'm mainly talking mainly about the
 interfaces which operate within and between core modules. These change
 all the time. The problem of changing interfaces is most severe when
 developers are working on different features within the same region of
 core code.

 Doing regular reintegration merges from trunk to development branches
 doesn't help, it just means that you get the interface changes one at
 a time, instead of in batches.

 Having a short path to trunk means that the maximum amount of code is
 visible to the developers who are doing the interface changes, so it
 avoids the duplication of effort that occurs when branch maintainers
 have to understand and account for every interface change that comes
 through.

In practice, this is generally not true.  Realistically, most patches
change a relatively small amount of code and don't cause merge
conflicts even if you keep them out of trunk for quite a long time.
For instance, I maintain dozens of patches to the proprietary forum
software vBulletin for the website I run.  I store them all in git,
and to upgrade I do a git rebase.  Even on a major version upgrade, I
only have to update a few of the patches, and the updates are small
and can be done mindlessly.  It's really very little effort.  Even a
commit that touches a huge amount of code (like my conversion of named
entity references to numeric) will only conflict with a small
percentage of patches.

Of course, you have to be more careful with changing interfaces around
when people use branches a lot.  But in practice, you spend very
little of your time resolving merge conflicts, relative to doing
actual development work.  It's not a significant disadvantage in
practice.  Experienced Subversion users just expect it to be, since
merging in Subversion is horrible and they assume that's how it has to
be.  (Disclaimer: merges in Subversion are evidently so horrible that
I never actually learned how to do them, so I can't give a good
breakdown of why exactly DVCS merging is so much better.  I can just
say that I've never found it to be a problem at all while using a
DVCS, but everyone complains about it with Subversion.)

I mean, the DVCS model was popularized by the Linux kernel.  It's hard
to think of individual codebases that large, or with that much
developer activity.  In recent years it's over 9,000 commits per
release changing several hundred thousand lines of code, which works
out to several thousand LOC 

Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Aryeh Gregor
On Thu, Mar 24, 2011 at 2:00 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 2) Resolving conflicts between patches is done by reviewers when they
 apply them instead of being conveniently outsourced to the
 author-committers

If there's a conflict, the reviewer can ask the patch submitter to
submit a new version with the conflict resolved.  I had this happen to
me for one of the patches I submitted to Mozilla.  I was then asked to
submit an interdiff to highlight what had changed in my new version,
so that the reviewer didn't have to re-review the parts of the patch
that didn't change.  Review-then-commit systems tend to place much
more of a burden on the submitter and less on the reviewer.

 3) If review capacity is low, patches don't get committed, their
 authors bug reviewers a few times, give up, get demotivated and leave
 the project

This is the major issue.  We need to get review sorted out on a
organizational basis before we start considering shaking anything up.
At Mozilla, the way it works (in my experience) is you ask a suitable
person for review, and they reliably respond to you within a few days.
 I'm sure that for large patchsets it's harder than for the trivial
patches I submit, but the system clearly works.  We need to have a
pool of reviewers who are responsible for setting aside their other
responsibilities to whatever extent is necessary to get new code
adequately reviewed (which could just mean reverting it if it has too
many problems).

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Happy-melon
I think Roan hits it on the nose.  Most of the problems Ashar and Neil raise 
are flaws in our code review process, not flaws in the tools we use *to do* 
code review.  I actually think that CodeReview works quite well, **for the 
system we currently use**.  I think many of us agree that, one way or 
another, *that system* has major flaws.

The fact that one discussion has quickly fragmented into fresh threads on 
*all* of the 'big three' (code review workflow, VCS, and release cycle) 
illustrates how intimately connected all these things are.  It makes no 
sense to choose a VCS which doesn't support our code review workflow; our 
code review is worthless if it does not support a coherent release cycle; 
and the release workflow (and the to-freeze-or-not-to-freeze question) has a 
dependency on the VCS infrastructure.

Ultimately, though, it's a mistake to think of any of these issues as 
technical questions: they are **social** problems.  We have to choose the 
*mindset* which works for us as individuals, as a group and as a charitable 
Foundation.  Currently our development mindset is of the Wild West: pretty 
much everyone works alone, on things which either interest them or which 
they are being paid to be interested in, and while everyone is responsible 
enough to fix their own bugs, our focus is on whatever we, individually, are 
doing rather than the finished product, because the product only *becomes* 
finished once every 6 months or so.  The only reasons now that we keep trunk 
broadly runnable are a) it makes it easier for us to continue our own 
development, and b) the TWN people shout at us whenever we break it.

I'm not, let me be clear, saying that said 'Wild West' mindset is at all a 
bad thing, it is very open and inclusive and it keeps us from the endless 
trivial discussions which lead to cynicism and then flames in more 
close-knit communities.  But as Roan says, it is *not* the only mindset, and 
the alternative is one which is more focussed at every stage on how changes 
affect a continuously-finished product.  We know the regime which is at the 
other end of the scale: the Linux kernel's universal pre-commit review, 
which I'm going to suggest we call the 'Burnt Offering' approach to coding 
as patches are worked, reworked, and inevitably reduced in number before 
being presented for divine approval.  That has clear advantages, in ensuring 
very high code quality and probably improving *everyone's* coding skills, 
but also the disadvantages Roan mentions.

The smoketest-trunk-every-week development model, which defies being given a 
crass analogy, is somewhere in the middle, and I think that's closer to 
where we need to be.  If we made an absolute policy of scapping to the WMF 
cluster once a week, every week, it would force a shift in our mindset 
(arguably a shift *back*), but not one that's seen as an artificial 
limitation.  No one will begrudge a release manager reverting changes on 
Tuesday afternoon which people agree will not be fixed in time for a 
Wednesday scap, while the same release manager spending Tuesday *not* 
merging changes for the exact same reason is seen in a much more negative 
light.  We retain people's ability to make rapid and immediate changes to a 
bleeding-edge trunk, but still ensure that we do not get carried away, as we 
did for 1.17 and are still merrily doing for 1.18, on a tide of editing 
which is not particularly focussed or managed (witness the fact that out of 
the 15,000 revisions in 1.17, we can point out only about three 'headline' 
features).

There are implementation questions to follow on from whichever workflow 
regime we move towards: for the weekly-scap process we need to find a 
replacement for Brion and his cluebat which is as reliable and efficient as 
he was; for a Linux-style system we need to sort out how to ensure that 
patches get the review that they need and that it doesn't just kill our 
development stone dead; and even to continue in the Wild West we need to 
sort out how to stop tracing out the Himalayas with the graph of unreviewed 
commits and actually get our damn releases out to prove that the system can 
work.  My main point is that *any* technical discussion, about SVN/Git, 
about CodeReview or its alternatives, even about Bugzilla/Redmine, is 
premature unless we have reached an adequate conclusion about the social 
aspects of this combined issue.  Because Git does not write code, nor does 
CodeReview or Bugzilla.  *We* write MediaWiki, and we could in principle do 
it in notepad or pico if we wanted (some of us probably do :-D).  The most 
important question is what will make us, as a group, more effective at 
writing cool software.  Answers on a postcard.

--HM
 



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Where exactly are the video and audio players at?

2011-03-24 Thread Aryeh Gregor
On Thu, Mar 24, 2011 at 9:22 AM, Joseph Roberts
roberts.jos...@ntlworld.com wrote:
 Ah, cool.  If no one minds, shouldn't [[mw:HTML5]] be editted to
 reflect what h in the mainstream?

[[mw:HTML5]] only really covers the use of HTML5 markup, not other
HTML5 features.  The idea was to discuss the benefits of changing our
doctype from XHTML 1.0 Transitional to HTML5, from the standpoint of
what features we want that would break validation in XHTML 1.0
Transitional.  Cortado doesn't use HTML5 markup, it's JavaScript that
inserts video elements into the DOM at runtime.  The fact that the
feature it uses happens to be in the HTML5 standard doesn't make much
of a difference to anything, any more than does the fact that
getElementsByClassName() happens to be in HTML5.

[[mw:HTML5]] discusses the possibility of using <video> or <audio>
tags directly in the markup.  This would allow our video or audio to
play even in browsers that have JavaScript disabled, which would be
nice to have.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] code review criticism (Re: Converting to Git?)

2011-03-24 Thread Aryeh Gregor
On Thu, Mar 24, 2011 at 9:27 PM, Happy-melon happy-me...@live.com wrote:
 I think Roan hits it on the nose.  Most of the problems Ashar and Neil raise
 are flaws in our code review process, not flaws in the tools we use *to do*
 code review.  I actually think that CodeReview works quite well, **for the
 system we currently use**.

I think it works very poorly, independent of our code review process:

* The discussion system is buggy and unpredictably mangles markup (it
probably shouldn't have attempted to support wikitext in the first
place . . .)
* It doesn't allow line-by-line review of patches
* It doesn't allow sensible configuration of e-mail notification
* It doesn't integrate with version control beyond just reading it
(e.g., it could support merging to a branch from within the web UI)
* It doesn't integrate with bug tracking at all

These are only the things I can think of off the top of my head
without having even *used* decent code review software.  I'm pretty
sure that if I had used something like Gerrit a lot, I'd recognize a
lot more drawbacks.  The bottom line is, trying to write our own code
review software is just a bad idea in the long run.

 Ultimately, though, it's a mistake to think of any of these issues as
 technical questions: they are **social** problems.  We have to choose the
 *mindset* which works for us as individuals, as a group and as a charitable
 Foundation.

There are technical problems here as well.  The technical advantages
of moving to a DVCS can be separated completely from the social
advantages of the new code review paradigms we could adopt after doing
so.  Moving from SVN to git would be a step forward in a lot of ways
even if we kept the code review and deployment process basically
unchanged.  But it's also a prerequisite for adopting certain types of
code review, or at least it would make adopting those types of code
review much easier, so we should really talk about switching to git
before we consider a review-then-commit system.

 We know the regime which is at the
 other end of the scale: the Linux kernel's universal pre-commit review,
 which I'm going to suggest we call the 'Burnt Offering' approach to coding
 as patches are worked, reworked, and inevitably reduced in number before
 being presented for divine approval.  That has clear advantages, in ensuring
 very high code quality and probably improving *everyone's* coding skills,
 but also the disadvantages Roan mentions.

The real distinguishing feature of Linux development isn't pre-commit
review, it's that it's pull-only and thus completely
individual-oriented.  Linus Torvalds personally decides what gets into
the official Linux kernel, and no one else has any actual say (beyond
trying to persuade him).  He mostly delegates to maintainers, who in
turn mostly run their parts of the kernel as fiefdoms as well.  This
approach is idiosyncratic, and closely related to the fact that Linux
is produced by dozens of independent organizations with no central
organizational oversight.

I don't think we should be seriously contemplating the Linux model for
MediaWiki.  The overwhelming majority of MediaWiki development is done
by either Wikimedia employees, or volunteers who are closely connected
to Wikimedia employees.  MediaWiki isn't an individual's project, it's
Wikimedia's project, so a totally decentralized version control
process wouldn't match the reality of how development works.

I continue to suggest that we look at a process more like Mozilla's.
Like MediaWiki, Mozilla's projects are developed under the central
control of a not-for-profit organization (modulo wholly-owned
for-profit subsidiaries that exist for tax reasons) committed to
openness and community participation.  It's much more accessible to
new contributors than either MediaWiki or Linux development, and I can
speak to that personally as someone who's submitted code to both
MediaWiki and Mozilla.  (Not to Linux, but the reputation of Linux
development is consistently scary enough that I don't think I need
personal experience . . .)

 The smoketest-trunk-every-week development model, which defies being given a
 crass analogy, is somewhere in the middle, and I think that's closer to
 where we need to be.  If we made an absolute policy of scapping to the WMF
 cluster once a week, every week, it would force a shift in our mindset
 (arguably a shift *back*), but not one that's seen as an artificial
 limitation.  No one will begrudge a release manager reverting changes on
 Tuesday afternoon which people agree will not be fixed in time for a
 Wednesday scap, while the same release manager spending Tuesday *not*
 merging changes for the exact same reason is seen in a much more negative
 light.  We retain people's ability to make rapid and immediate changes to a
 bleeding-edge trunk, but still ensure that we do not get carried away, as we
 did for 1.17 and are still merrily doing for 1.18, on a tide of editing
 which is not particularly focussed or 

Re: [Wikitech-l] Converting to Git?

2011-03-24 Thread Daniel Friesen
On 11-03-24 06:12 PM, Aryeh Gregor wrote:
 On Tue, Mar 22, 2011 at 10:46 PM, Tim Starlingtstarl...@wikimedia.org  
 wrote:
 If we split up the extensions directory, each extension having its own
 repository, then this will discourage developers from updating the
 extensions in bulk. This affects both interface changes and general
 code maintenance. I'm sure translatewiki.net can set up a script to do
 the necessary 400 commits per day, but I'm not sure if every developer
 who wants to fix unused variables or change a core/extension interface
 will want to do the same.
 I've thought about this a bit.  We want bulk code changes to
 extensions to be easy, but it would also be nice if it were easier to
 host extensions officially to get translations, distribution, and
 help from established developers.  We also don't want anyone to have
 to check out all extensions just to get at trunk.  Localization, on
 the other hand, is entirely separate from development, and has very
 different needs -- it doesn't need code review, and someone looking at
 the revision history for the whole repository doesn't want to see
 localization updates.  (Especially in extensions, where often you have
 to scroll through pages of l10n updates to get to the code changes.)

 Unfortunately, git's submodule feature is pretty crippled.  It
 basically works like SVN externals, as I understand it: the larger
 repository just has markers saying where the submodules are, but their
 actual history is entirely separate.  We could probably write a script
 to commit changes to all extensions at once, but it's certainly a less
 ideal solution.
git's submodule feature is something like svn externals, but with one 
fundamental difference.
svn externals tracks only a repo, so when you update you get the 
latest version of that repo.
git submodules track a repo and a commit id, always. So when you 
update you always get the same commit id. Changing that commit id 
requires making a commit to the outer git repo. You can also check 
out an old commit, and submodule update will check out the commit id 
of the submodule that was recorded at that point in time.
But yes, for both of them these are merely references; they do not 
store the actual history. They're essentially glorified helper 
scripts: they don't alleviate the task of downloading each repo 
separately, they just make the vcs do it for you instead of you 
running a script in some other language to do it.
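
A minimal sketch of that difference in practice (repository URLs, 
paths and the extension name below are made up):

  # add an extension as a submodule; git records its URL *and* the
  # commit id currently checked out
  git submodule add https://git.example.org/SomeExtension.git extensions/SomeExtension
  git commit -m "Track SomeExtension as a submodule"

  # a fresh clone checks out exactly the pinned commit, not the latest
  git clone https://git.example.org/core.git
  cd core
  git submodule update --init extensions/SomeExtension

  # moving the pin forward is itself a commit in the outer repo
  cd extensions/SomeExtension
  git pull origin master
  cd ../..
  git add extensions/SomeExtension
  git commit -m "Update SomeExtension to latest master"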

In my honest opinion, submodules were not designed for what we are 
trying to shove into them. And given that one of their key features 
(tracking a specific commit id to ensure the same version is always 
checked out) is actually the opposite of what we want, I believe the 
actual functionality of git submodules in this situation is no better 
than what we could build ourselves with a few simple custom scripts. 
In fact, I believe we could build something better for our purposes 
without too much effort, and we could check it into a git repo in 
place of the repo that the submodules would be put in. If you dig 
through the git discussions, I believe I listed a number of features 
we could add that would make it even more useful. Instead of a second 
repo, we could just put the tool itself inside mw's repo, so that by 
checking out phase3 you get the tools needed to work with extensions.

 If we moved to git, I'd tentatively say something like

 * Separate out the version control of localization entirely.
 Translations are already coordinated centrally on translatewiki.net,
 where the wiki itself maintains all the actual history and
 permissions, so the SVN checkin right now is really a needless
 formality that keeps translations less up-to-date and spams revision
 logs.  Keep the English messages with the code in git, and have the
 other messages available for checkout in a different format via our
 own script.  This checkout should always grab the latest
 translatewiki.net messages, without the need for periodic commits.  (I
 assume translatewiki.net already does automatic syntax checks and so
 on.)  Of course, the tarballs would package all languages.
+1
 * Keep the core code in one repository, each extension in a separate
 repository, and have an additional repository with all of them as
 submodules.  Or maybe have extensions all be submodules of core (you
 can check out only a subset of submodules if you want).
 * Developers who want to make mass changes to extensions are probably
 already doing them by script (at least I always do), so something like
 for EXTENSION in extensions/*; do (cd "$EXTENSION" && git commit -a -m
 'Boilerplate message'); done shouldn't be an exceptional
 burden.  If it comes up often enough, we can write a script to help
 out.

 * We should take the opportunity to liberalize our policies for
 extension hosting.  Anyone should be able to add an extension, and get
 commit access only to that extension.  MediaWiki developers would get
 commit access to all hosted 

[Wikitech-l] Ehcache on Wikimedia

2011-03-24 Thread Tim Starling
Our parser cache hit ratio is very low, around 30%.

http://tstarling.com/stuff/hit-rate-2011-03-25.png

This seems to be mostly due to insufficient parser cache size. My
theory is that if we increased the parser cache size by a factor of
10-100, then most of the yellow area on that graph should go away.
This would reduce our apache CPU usage substantially.

The parser cache does not have particularly stringent latency
requirements, since most requests only do a single parser cache fetch.

So I researched the available options for disk-backed object caches.
Ehcache stood out, since it has a suitable feature set out of box and
was easy to use from PHP. I whipped up a MediaWiki client for it and
committed it in r83208.
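
For reference, a rough sketch of the network-accessible get/set that 
the standalone Ehcache cache server exposes through its RESTful 
interface; the host, port, cache name and key below are made up, and 
the exact URL layout depends on how the server is deployed:

  # store a value under a key (PUT), fetch it back (GET, a miss is a 404),
  # and evict it (DELETE)
  curl -X PUT -H 'Content-Type: application/octet-stream' \
       --data-binary @parser-output.bin \
       http://cache-host:8080/ehcache/rest/parsercache/enwiki:pcache:idhash:12345
  curl http://cache-host:8080/ehcache/rest/parsercache/enwiki:pcache:idhash:12345
  curl -X DELETE http://cache-host:8080/ehcache/rest/parsercache/enwiki:pcache:idhash:12345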

My plan is to do a test deployment of it, starting on Monday my time
(i.e. Sunday night US time), and continuing until the cache fills up
somewhat, say 2 weeks. This deployment should have no user-visible
consequences, except perhaps for an improvement in speed.

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Ehcache on Wikimedia

2011-03-24 Thread Daniel Friesen
On 11-03-24 07:43 PM, Tim Starling wrote:
 Our parser cache hit ratio is very low, around 30%.

 http://tstarling.com/stuff/hit-rate-2011-03-25.png

 This seems to be mostly due to insufficient parser cache size. My
 theory is that if we increased the parser cache size by a factor of
 10-100, then most of the yellow area on that graph should go away.
 This would reduce our apache CPU usage substantially.

 The parser cache does not have particularly stringent latency
 requirements, since most requests only do a single parser cache fetch.

 So I researched the available options for disk-backed object caches.
 Ehcache stood out, since it has a suitable feature set out of box and
 was easy to use from PHP. I whipped up a MediaWiki client for it and
 committed it in r83208.

 My plan is to do a test deployment of it, starting on Monday my time
 (i.e. Sunday night US time), and continuing until the cache fills up
 somewhat, say 2 weeks. This deployment should have no user-visible
 consequences, except perhaps for an improvement in speed.

 -- Tim Starling
Interesting.
I've been debating mem vs. disk caches myself for a while.
I work with cloud servers a lot, and while I may one day get something 
to the point where scaling caches out becomes important, I probably 
won't be at a 'colocate the servers' scale by then. So I've been 
thinking about things within cloud limitations.
On the cloud, RAM is relatively expensive, there's a limit to the 
server size you can get, and high RAM usually means really expensive 
cloud machines that border on "hey, this is insane, I might as well go 
dedicated", while disk is readily available. And while low latency is 
nice, I don't believe it's what we're aiming for when we're caching. 
Most of the stuff we cache in MW is not cached because we want it in a 
really high-access, low-latency way, but because the mysql queries and 
the parsing that build it are so slow and expensive that we want to 
cache the result temporarily. In that situation it doesn't really 
matter whether the cache is on disk or in memory, and larger caches 
can be useful.

For a while I was thinking, 'What if I give memcached, on a machine of 
its own, a really large size and let it swap?'.
But if we're looking at support for disk caches, beautiful. Especially 
if they have hybrid models where they keep the most highly accessed 
parts of the cache in memory and expand to disk.


What others did you look at?
 From a quick look I see redis, Ehcache, JCS, and OSCache.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Ehcache on Wikimedia

2011-03-24 Thread Tim Starling
On 25/03/11 14:41, Daniel Friesen wrote:
 For a while I was thinking, 'What if I give memcached, on a machine of
 its own, a really large size and let it swap?'.

One problem you would likely run into is that the metadata is not
localised at all, so you would end up loading a lot of pages to do a
simple thing like serving a cache miss.

Another is that apps that aren't designed to be swapped out tend to do
silly things like iterate through linked lists that snake their way
all over the whole address space.

 What others did you look at?
  From a quick look I see redis, Ehcache, JCS, and OSCache.

Redis is in-memory only. Membase, MemcacheDB, MySQL, Riak and HBase
lacked basic caching features, like a limit on storage space and an
eviction feature which removes items when the storage limit is exceeded.

I didn't look at JCS. It seems suspiciously similar to Ehcache,
sharing its major pros and cons. The disk size limit is specified as
an object count instead of in bytes, and you only get persistence when
the cache is properly shut down. We really want a large proportion of
the objects to be preserved even if the power goes off.

I didn't look at OSCache. It seems to be aimed at small local
installations. It lacks a network-accessible get/set interface. The
disk cache size can't be configured properly:

cache.unlimited.disk

Indicates whether the disk cache should be treated as unlimited or
not. The default value is false. In this case, the disk cache capacity
will be equal to the memory cache capacity set by cache.capacity.

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l