Re: [Wikitech-l] [Multimedia] Audio/video updates: TimedMediaHandler, ogv.js, and mobile
Very impressive, amazing progress for a part-time project! That's interesting that iOS supports M-JPEG, I had not heard that before. Per M-JPEG in the Wikipedia app ... they have the WKWebView web view in iOS 8 and above, no? So in theory we could run the JS engine against the same subset of iOS devices at similar performance as a stopgap there. But of course the native player would be ideal ;) —michael

On Jun 11, 2015, at 7:05 PM, Fabrice Florin fflo...@wikimedia.org wrote: Nicely done, Brion! We’re very grateful for all the mobile multimedia work you’ve been doing on your ‘spare time’ … Much appreciated, Fabrice

On Jun 11, 2015, at 7:34 AM, Brion Vibber bvib...@wikimedia.org wrote: I've been spending the last few days feverishly working on audio/video stuff, because it's been driving me nuts that it's not quite in working shape. TL;DR: Major fixes in the works for Android, Safari (iOS and Mac), and IE/Edge (Windows). Need testers and patch reviewers.

== ogv.js for Safari/IE/Edge ==

In recent versions of Safari, Internet Explorer, and Microsoft's upcoming Edge browser, there's still no default Ogg or WebM support, but JavaScript has gotten fast enough to run an Ogg Theora/Vorbis decoder with CPU to spare for drawing and outputting sound in real time. The ogv.js decoder/player has been one of my fun projects for some time, and I think I'm finally happy with my TimedMediaHandler/MwEmbedPlayer integration patch https://gerrit.wikimedia.org/r/#/c/165478/ for the desktop MediaWiki interface. I'll want to update it to work with Video.js later, but I'd love to get this version reviewed and deployed in the meantime. Please head over to https://ogvjs-testing.wmflabs.org/ in Safari 6.1+ or IE 10+ (or 'Project Spartan' on Windows 10 preview) and try it out! Particularly interested in cases where it doesn't work or messes up.

== Non-JavaScript fallback for iOS ==

I've found that Safari on iOS supports QuickTime movies with Motion-JPEG video and mu-law PCM audio https://gerrit.wikimedia.org/r/#/c/217295/. JPEG and PCM are, as it happens, old and not so much patented. \o/ As such this should work as a fallback for basic audio and video on older iPhones and iPads that can't run ogv.js well, or in web views in apps that use Apple's older web embedding APIs where JavaScript is slow (for example, Chrome for iOS). However these get really bad compression ratios, so to keep bandwidth down similar to the 360p Ogg and WebM versions I had to reduce quality and resolution significantly. Hold an iPhone at arm's length and it's maybe ok, but zoom full-screen on your iPad and you'll hate the giant blurry pixels! This should also provide a working basic audio/video experience in our Wikipedia iOS app, until such time as we integrate Ogg or WebM decoding natively into the app. Note that it seems tricky to bulk-run new transcodes on old files with TimedMediaHandler. I assume there's a convenient way to do it that I just haven't found in the extension maint scripts...

== In progress: mobile video fixes ==

Audio has worked on Android for a while -- the .ogg files show up in native audio elements and Just Work. But video has often been broken, with TimedMediaHandler's popup transforms reducing most video embeds into a thumbnail and a link to the original file -- which might play if WebM (not if Ogg Theora) but it might also be a 1080p original which you don't want to pull down on 3G! And neither audio nor video has worked on iOS.
This patch https://gerrit.wikimedia.org/r/#/c/217485/ adds a simple mobile target for TMH, which fixes the popup transforms to look better and actually work by loading up an embedded-size player with the appropriately playable transcodes (WebM, Ogg, and the MJPEG last-ditch fallback). ogv.js is used if available and necessary, for instance in iOS Safari when the CPU is fast enough. (Known to work only on 64-bit models.)

== Future: codec.js and WebM and OGVKit ==

For the future, I'm also working on extending ogv.js to support WebM https://brionv.com/log/2015/06/07/im-in-ur-javascript-decoding-ur-webm/ for better quality (especially in high-motion scenes) -- once that stabilizes I'll rename the combined package codec.js. Performance of WebM is not yet good enough to deploy, and some features like seeking are still missing, but breaking out the codec modules means I can develop the codecs in parallel and keep the high-level player logic in common. Browser infrastructure improvements like SIMD, threading, and more GPU access should continue to make WebM decoding faster in the future as well. I'd also like to finish up my OGVKit package https://github.com/brion/OGVKit for iOS, so we can embed a basic audio/video player at full quality into the Wikipedia iOS app. This needs some
Re: [Wikitech-l] [Multimedia] What to do with TimedMediaHandler
This is a fair assessment of the challenges / divergent code bases. In terms of a path forward, I think it’s worth highlighting how the Kaltura player normally integrates with other standalone entity providers nowadays. We normally integrate via a media proxy library that basically normalizes the representation of stream sources, media identifiers, structured and unstructured metadata, captions, cuePoints, and content security protocols to a “Kaltura like” representation; then the CMSs consume the Kaltura player iframe services in a way that is almost identical to our Kaltura Platform style embeds, just overriding identifiers. This makes the iframe player service easy to consume from our native components, Twitter embeds, etc. See the architecture overview here. [1] You can see what this looks like with a mediaProxy override sample [2]

Is this useful for the Wikimedia use case? … not so sure ... since the review scope would grow a lot if we had the player serving its own iframe independently of the rest of the code infrastructure; it would otherwise duplicate many components and reduce the incentive to align versions of things. Some significant brainstorming and alignment would need to take place, which has awaited a focus from the multimedia team, since we would want to focus efforts towards something that would be sustainable for both organizations going forward, both from community and organization contributions, so the projects could better benefit each other.

* Kaltura will move quickly to review and integrate the code styling / js-hint updates, something we have intended to do for a while. Other low-hanging-fruit alignments have already been integrated by some early work on this by paladox2015 (GitHub id).
* Kaltura would be interested in working to make things as easy as possible to use the library in both contexts; but we need “a plan”. While things have drifted significantly there are paths towards upgrading things, and a goal to align code conventions [3] means the projects share a lot more than, say, some arbitrary other project out there that would do everything its own way ;)
* That being said, the possibility for WMF to use something new should be evaluated, but again should involve the multimedia team within WMF so that the cost-benefit analysis can be mapped out per organization infrastructure support; or a similar situation will crop up after a sprint of effort produced something usable, but was not maintained going forward.

[1] http://knowledge.kaltura.com/sites/default/files/styles/large/public/kaltura-player-toolkit.png
[2] http://kgit.html5video.org/pulls/1194/modules/KalturaSupport/tests/StandAlonePlayerMediaProxyOverride.html
[3] https://github.com/kaltura/mwEmbed/#hacking-on-mwembed

On Dec 11, 2014, at 7:55 AM, Derk-Jan Hartman d.j.hartman+wmf...@gmail.com wrote: So for a while now, I have been toying a bit with TimedMediaHandler/MwEmbed/TimedText, with the long term goal of wanting it to be compatible with VE, live preview, flow etc.
There is a significant challenge here that we are sort of conveniently ignoring because stuff 'mostly works' currently and the MM team has its plate full with plenty of other stuff:
1: There are many patches in our modules that have not been merged upstream
2: There are many patches upstream that were not merged in our tree
3: Upstream re-uses RL and much infrastructure of MW, but is also significantly behind. They still use php i18n, and their RL classes themselves are also out of date (1.19 style?). This makes it difficult to get 'our' changes merged upstream, because we need to bring any RL changes etc with it as well.
4: No linting and code style checks are in place, making it difficult to assess and maintain quality.
5: Old jQuery version used upstream
6: Lots of what we consider deprecated methodologies are still used upstream.
7: Upstream has a new skin ??
8: It uses loader scripts on every page, which really aren't necessary anymore now that we can add modules to ParserOutput, but since I don't fully understand upstream, I'm not sure what is needed to not break upstream in this regard.
9: The JS modules arbitrarily add stuff to the mw. variables, no namespacing there.
10: The RL modules are badly defined, overlap each other, and some script files contain what should be in separate modules
11: We have 5 'mwembed' modules, but upstream has about 20, so they have quite a bit more code to maintain and migrate.
12: Brion is working on his ogvjs player which at some point needs to integrate with this as well (Brion already has some patches for this [1]).
13: Kaltura itself seems very busy and doesn't seem to have too
Re: [Wikitech-l] [Multimedia] ogv.js - JavaScript video decoding proof of concept
Amazing work. Added bug to integrate into TMH player. https://bugzilla.wikimedia.org/show_bug.cgi?id=61823 I can’t imagine anyone being against flash to deliver free formats! —michael On Feb 23, 2014, at 5:45 PM, Brion Vibber bvib...@wikimedia.org wrote: In case anybody's interested but not on wikitech-l; looking for some feedback on possible directions for fallback in-browser video players. -- brion -- Forwarded message -- From: Brion Vibber bvib...@wikimedia.org Date: Sun, Feb 23, 2014 at 6:43 AM Subject: Re: ogv.js - JavaScript video decoding proof of concept To: Wikimedia-tech list wikitech-l@lists.wikimedia.org Just an update on this weekend project, see the current demo in your browser[1] or watch a video of Theora video playing on an iPhone 5s![2] [1] https://brionv.com/misc/ogv.js/demo/ [2] http://www.youtube.com/watch?v=U_qSfHPhGcA * Got some fixes and testing from one of the old Cortado maintainers -- thanks Maik! * Audio/video sync is still flaky, but everything pretty much decodes and plays properly now. * IE 10/11 work, using a Flash shim for audio. * OS X Safari 6.1+ works, including native audio. * iOS 7 Safari works, including native audio. Audio-only files run great on iOS 7 devices. The 160p video transcodes we experimentally enabled recently run *great* on a shiny 64-bit iPhone 5s, but are still slightly too slow on older models. The Flash audio shim for IE is a very simple ActionScript3 program which accepts audio samples from the host page and outputs them -- no proprietary or patented codecs are in use. It builds to a .swf with the open-source Apache Flex SDK, so no proprietary software is needed to create or update it. I'm also doing some preliminary research on a fully Flash version, using the Crossbridge compiler[3] for the C codec libraries. Assuming it performs about as well as the JS does on modern browsers, this should give us a fallback for old versions of IE to supplement or replace the Cortado Java player... Before I go too far down that rabbit hole though I'd like to get peoples' opinions on using Flash fallbacks to serve browsers with open formats. As long as the scripts are open source and we're building them with an open source toolchain, and the entire purpose is to be a shim for missing browser feature support, does anyone have an objection? [3] https://github.com/adobe-flash/crossbridge -- brion On Mon, Oct 7, 2013 at 9:01 AM, Brion Vibber bvib...@wikimedia.org wrote: TL;DR SUMMARY: check out this short, silent, black white video: https://brionv.com/misc/ogv.js/demo/ -- anybody interested in a side project on in-browser audio/video decoding fallback? One of my pet peeves is that we don't have audio/video playback on many systems, including default Windows and Mac desktops and non-Android mobile devices, which don't ship with Theora or WebM video decoding. The technically simplest way to handle this is to transcode videos into H.264 (.mp4 files) which is well supported by the troublesome browsers. Unfortunately there are concerns about the patent licensing, which has held us up from deploying any H.264 output options though all the software is ready to go... While I still hope we'll get that resolved eventually, there is an alternative -- client-side software decoding. 
We have used the 'Cortado' Java applet to do fallback software decoding in the browser for a few years, but Java applets are aggressively being deprecated on today's web: * no Java applets at all on major mobile browsers * Java usually requires a manual install on desktop * Java applets disabled by default for security on major desktop browsers Luckily, JavaScript engines have gotten *really fast* in the last few years, and performance is getting well in line with what Java applets can do. As an experiment, I've built Xiph's ogg, vorbis, and theora C libraries cross-compiled to JavaScript using emscripten and written a wrapper that decodes Theora video from an .ogv stream and draws the frames into a canvas element: * demo: https://brionv.com/misc/ogv.js/demo/ * code: https://github.com/brion/ogv.js * blog some details: https://brionv.com/log/2013/10/06/ogv-js-proof-of-concept/ It's just a proof of concept -- the colorspace conversion is incomplete so it's grayscale, there's no audio or proper framerate sync, and it doesn't really stream data properly. But I'm pleased it works so far! (Currently it breaks in IE, but I think I can fix that at least for 10/11, possibly for 9. Probably not for 6/7/8.) Performance on iOS devices isn't great, but is better with lower resolution files :) On desktop it's screaming fast for moderate resolutions, and could probably supplement or replace Cortado with further development. Is anyone interested in helping out or picking up the project to move it towards proper playback?
Re: [Wikitech-l] showing videos and images in modal viewers within articles
On 05/30/2013 06:28 PM, Ryan Kaldari wrote: OK, I decided to be slightly bold. I changed the modal video threshold on en.wiki from 200px to 800px. This means all video thumbnails that are 800px or smaller will open a modal player when you click on the thumbnail. If there are no complaints from people, we can switch the modal behavior to just be the default everywhere. Try it out and let me know what you think: https://en.wikipedia.org/wiki/Congenital_insensitivity_to_pain#Presentation Ryan Kaldari

I would lean towards more like 400px. There are probably pages that already have large videos that maybe don't need to be re-modal-ized? I agree with Erik that we should autoplay after you click on the play button on a modal. https://gerrit.wikimedia.org/r/66551 Note that in iOS, modal popups require an additional click to play if loading anything asynchronously. We have done work in the Kaltura library to be smart about capturing the click gesture in thumbnail embeds [1]. In MediaWiki we may need to do something similar if we async-load the player library. But the extra click is the least of our iOS video issues, for the time being :( --michael

[1] http://player.kaltura.com/docs/thumb
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
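(For reference, the knob involved here should be TMH's minimum inline player size; the exact setting name below is an assumption from memory, so check TimedMediaHandler's defaults before relying on it:

    // Sketch of the relevant config (setting name assumed from TMH's defaults):
    // video thumbnails at or below this width open the modal pop-up player
    // instead of playing inline. en.wiki is at 800 after Ryan's change; 400
    // would be the more conservative value suggested above.
    $wgMinimumVideoPlayerSize = 400;

)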
Re: [Wikitech-l] wav support to Commons (was Re:Advice Needed)
On 04/11/2013 10:48 AM, Quim Gil wrote: I'm just trying to be consistent: a GSOC project can't force the agenda of a Wikimedia project. Also conservative when it comes to managing GSOC students' expectations. These bug reports have been open for years, and I don't want to guarantee to a GSOC student that they can count on seeing them fixed now. Bug 20252 - Support for WAV and AIFF by converting files to FLAC automatically https://bugzilla.wikimedia.org/show_bug.cgi?id=20252 Bug 32135 - WAV audio support via TimedMediaHandler https://bugzilla.wikimedia.org/show_bug.cgi?id=32135

Adding WAV to TMH is a pretty small technical addition. Audio transcoding was already added by Jan. Adding .wav support on top of that is probably one of the easiest parts of the project. I don't think the project would be forcing an agenda on Commons; it's analogous to the work to add TIFF support a while back. Also, this is mostly an intermediate solution while browsers can only capture and upload PCM WAV data. Once browsers ship the full recording API, we will be able to 'export out' the captured Opus audio and upload that, then transcode from that Opus oga to additional formats that can be played in (other) browsers and devices. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
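(For the curious, the conversion step itself is tiny; inside one of TMH's transcode jobs it would be roughly the shell-out below. The ffmpeg flags are from memory and the variable names are illustrative, so treat it as a sketch rather than the actual patch:

    // Rough sketch: convert the uploaded WAV/AIFF into FLAC via ffmpeg.
    $cmd = 'ffmpeg -y -i ' . wfEscapeShellArg( $srcWavPath ) .
        ' -acodec flac ' . wfEscapeShellArg( $dstFlacPath );
    wfShellExec( $cmd, $retval );
    if ( $retval !== 0 ) {
        wfDebugLog( 'TimedMediaHandler', "WAV to FLAC conversion failed: $cmd" );
    }

)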
Re: [Wikitech-l] Audio derivatives, turning on MP3/AAC mobile app feature request.
Yes, all that changed is we added support for audio derivatives. We have not enabled MP3 or AAC. The same code can be used for FLAC, Ogg, or whatever we configure. On Feb 3, 2013 2:33 AM, Yuvi Panda yuvipa...@gmail.com wrote: Just to be sure that I'm reading this right - nothing actually changed yet. We still are a free-formats-only shop for A/V. Right? -- Yuvi Panda T http://yuvi.in/blog
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Audio derivatives, turning on MP3/AAC mobile app feature request.
+correct content-type this time ;) Note this has already been merged, but still worth mentioning for visibility. On 2/1/13 12:10 PM, Michael Dale wrote:

We are about to merge in support for audio derivatives to Timed Media Handler (TMH). The big value here, I think, is encoding to AAC or MP3 and adding a /listen to this article/ feature to the mobile app. https://gerrit.wikimedia.org/r/#/c/39363/ This can really help with improving accessibility of Wiktionary pronunciation media files as well. Also, AAC / m4v ingestion could make audio recordings a lot easier to import into the site, i.e. a 'record a reading of this article' mobile app feature #2 ;) There are already thousands of spoken articles, and with some promotion there could probably be a lot more: http://en.wikipedia.org/wiki/Category:Spoken_articles

The software patent situation for MP3 is sad, considering how long the MP3 format has been around: http://www.tunequest.org/a-big-list-of-mp3-patents/20070226/ I think AAC is a similar situation, encoder-wise: http://en.wikipedia.org/wiki/Advanced_Audio_Coding#Licensing_and_patents But fundamentally Wikimedia is not distributing these encoders and there are no royalties for media distribution. Likewise we are not shipping decoders (the decoders are in the browser or the mobile OS). I don't know why Wikimedia's commitment to being accessible in royalty-free formats somehow also precludes making content accessible for folks on platforms that ~don't~ decode royalty-free formats. But hopefully we can change that over time. Not sure if this is the right forum for this, but I hope we could come out of this thread with rough consensus to enable these formats to help increase the reach of audio works. peace, --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
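(For anyone who wants to experiment locally now that the patch is merged: enabling a given audio derivative should just be a config toggle, roughly along these lines. The setting name and keys are assumed from TMH's defaults, so verify against TimedMediaHandler.php, and nothing here implies MP3/AAC are turned on for Wikimedia sites -- that's exactly the decision this thread is about:

    // Sketch only -- names assumed, check TimedMediaHandler.php for the real keys.
    $wgEnabledAudioTranscodeSet = array(
        'ogg',   // Vorbis: the free-format baseline
        // 'mp3', // proprietary outputs stay commented out until we decide
        // 'aac',
    );

)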
Re: [Wikitech-l] Video on mobile: Firefox works, way is paved for more browser support
On 12/13/2012 12:38 PM, Brion Vibber wrote: It's much, MUCH easier for us to flip the H.264 switch... there are ideological reasons we might not want to, but we're going to have to put the effort into making those player apps if we want all our data accessible to everyone.

+1, it's a non-trivial amount of effort to integrate native players across at least 3 major platforms (iOS, Android, Win8). And as pointed out in the thread, low-power Android / Firefox OS devices include h.264 hardware decoders but will fail for medium-resolution WebM. I think the Wikimedia mobile product team needs to come up with some recommendations for the Board / community to evaluate. There are trade-offs in effort and resource allocation. Is integrating software video decoders with native apps the best use of resources, or are there other higher-priority efforts? Or, more realistically, the ideological hard line means kicking the proverbial video-on-Wikipedia bucket further downstream, which is also a trade-off of sorts. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Video on mobile: Firefox works, way is paved for more browser support
On 12/13/2012 04:56 PM, Brion Vibber wrote: On Thu, Dec 13, 2012 at 10:38 AM, Brion Vibber bvib...@wikimedia.org wrote: On Wed, Dec 12, 2012 at 2:50 PM, Rob Lanphier ro...@wikimedia.org wrote: I was able to play the WebM file of the locomotive on the front page of https://commons.wikimedia.org just now on my Nexus 7 using Chrome, so at least on very new stock Android devices, all is well. My much older Galaxy S didn't fare so well, though, so I would be willing to believe that Android devices with proper WebM support are still relatively rare. That said, the replacement rate for this hardware is frequent enough that it won't be long before my Nexus 7 is much older. I can play the current media on the front page of Commons in Chrome on my Nexus 7, but it won't play in position on either desktop http://en.wikipedia.org/wiki/Serge_Haroche or mobile http://en.m.wikipedia.org/wiki/Serge_Haroche ... Sigh. :)

I think this relates to the page not being purged after the transcodes are updated. If you purge the page, it will probably give the Nexus a more playable flavour. http://en.wikipedia.org/wiki/Serge_Haroche should work on your Nexus now ;) TMH should add the page purge to the job queue, but I'm not sure why that page had not been purged yet.

Still some work to be done on compatibility... I also notice that the source elements in the video seem to start with the original, and aren't labeled with types or codecs. This means that without the extra Kaltura player JS -- for instance as we see it on the mobile site right now -- the browser may not be able to determine which file is playable or best-playable.

For correctness we should include type, but I don't know if that will help with the situation you describe. https://gerrit.wikimedia.org/r/#/c/38665/ But it certainly will help in the other ways you outline in bug 43101. AFAIK there are no standard source tag attributes to represent device-specific playback targets (other than type), so we set a few in data-* tags and read them within the Kaltura HTML5 lib to do flavour selection. We of course use the Kaltura HTML5 lib on lots of mobile devices, so if you want to explore usage in the mobile app, I'm happy to support that. For example, including the payload in the application itself (so it's not a page-view-time cost). peace, --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
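(Concretely, the markup the patch should end up emitting looks roughly like the sketch below -- built here with core's Html helper purely for illustration; the derivative URLs and the particular data-* names are placeholders, but the type/codecs strings are the standard ones, which is what lets a bare <video> element pick a playable flavour without any Kaltura JS:

    // Sketch: <source> entries labeled with type/codecs for native selection.
    $sources = array(
        Html::element( 'source', array(
            'src' => $webm360Url, // placeholder derivative URL
            'type' => 'video/webm; codecs="vp8, vorbis"',
            'data-width' => 640,  // illustrative device-targeting hints
            'data-height' => 360,
        ) ),
        Html::element( 'source', array(
            'src' => $ogg360Url,
            'type' => 'video/ogg; codecs="theora, vorbis"',
        ) ),
    );

)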
Re: [Wikitech-l] Video on mobile: Firefox works, way is paved for more browser support
As Brion points out, we get much better coverage. I enabled h.264 locally and ran through a set of Android, iOS and desktop browsers I had available at the time: http://www.mediawiki.org/wiki/Extension:TimedMediaHandler/Platform_testing

Pro h.264:
* No one is proposing turning off WebM; an ideological commitment to support free access with free platforms in royalty-free formats does not necessarily require excluding derivation to proprietary formats.
* We already are not ideologically pure:
** We submit to the Apple App Store terms of service, we build outputs with the non-freedom iOS toolchain, etc.
** We write custom code / workarounds to support proprietary non-web-standard browsers.
* There is little to no chance of Apple adding Google's codec support to their platform.
* We could ingest h.264, letting Commons store source material in its originally captured format. This is important so that years down the road we have the highest quality possible.
* Chicken and egg: for companies like Apple to care about Wikimedia's WebM-only support, Wikimedia would need lots of video, and as long as we don't support h.264 our platform discourages wide use of video on articles.

Pro WebM:
* Royalty-free purity in /most/ of what Wikimedia distributes.
* We could in theory add software playback of WebM to our iOS and Android apps.
* Reduced storage costs (marginal, vs. the public good of access)
* Reduced licence costs for an h.264 encoder on our two transcoding boxes (very marginal)
* Risk that MPEG-LA adds distribution costs for free online distribution in the future. Low risk, and we could always turn it off.

--michael

On 12/12/2012 11:26 AM, Luke Welling wrote: FirefoxOS/Boot2Gecko phones presumably also support Ogg Theora and WebM formats, but they're not really a market share yet and may never be in the developed world. Without trying to downplay the importance of ideological purity, keep in mind that Mozilla, who have largely the same ideology on the matter, have conceded defeat on the practical side of it after investing significant effort. Eg http://appleinsider.com/articles/12/03/14/mozilla_considers_h264_video_support_after_googles_vp8_fails_to_gain_traction With Google unwilling to commit, the battle was not winnable. There is not an ideologically pure answer that is compatible with the goal of taking video content and disseminating it effectively and globally. The conversation needs to be framed as what shade of grey is an acceptable compromise. Luke Welling

On Wed, Dec 12, 2012 at 6:44 AM, Antoine Musso hashar+...@free.fr wrote: Le 12/12/12 00:15, Erik Moeller a écrit : Since there are multiple potential paths for changing the policy (keeping things ideologically pure, allowing conversion on ingestion, allowing h.264 but only for mobile, allowing h.264 for all devices, etc.), and since these issues are pretty contentious, it seems like a good candidate for an RFC which'll help determine if there's an obvious consensus path forward. Could we host h.264 videos and related transcoders in a country that does not recognize software patents? Hints: - I am not a lawyer - WMF has servers in the Netherlands, EU. -- Antoine hashar Musso
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Switching to Timed Media Handler on Commons
On 10/28/12 11:41 PM, Rob Lanphier wrote: Hi everyone, Assuming we get the last blocking bugs fixed tomorrow, then we should be able to go onto Commons on Wednesday, so that's our current plan. Let us know if there are issues with this. Thanks! Rob

Thanks for the update, Rob. I did not see details on the configuration phases in this deployment. On test2 the configuration supports uploading WebM. Is the plan to enable WebM on Commons prior to other wikis supporting the embedding of WebM files? I guess this would not be completely disastrous; it should fail over to links back to Commons for unknown media types? Also we have not yet connected test2 to the video scalers / transcoding, so I imagine we will want to test that in the next few days? And we should know ahead of time if the derivatives are planned to be enabled on Commons as well. Again, the other wikis won’t be able to embed the derivatives until TMH is in use on them as well. I know we conducted tests for TMH “playing” well with an OggHandler provider (i.e. the test2.wikipedia.org pages are embedding Commons videos but played back with TMH) ... I am not sure if we have conducted tests for OggHandler playing well with a TMH provider with the possible configuration phases mentioned above. Happy to see progress, I will try and be available if anything comes up. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
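(For reference, the upload half of a phased rollout is just configuration; sketching from memory, and the TMH setting name in particular should be verified against the extension's defaults:

    // Phase sketch: allow WebM uploads on Commons before other wikis run TMH.
    $wgFileExtensions[] = 'webm';
    // Keep derivative generation off until the video scalers are connected
    // (setting name assumed -- check TimedMediaHandler's defaults):
    $wgEnableTranscode = false;

)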
Re: [Wikitech-l] So what's up with video ingestion?
On 06/18/2012 04:52 PM, Brion Vibber wrote: On Mon, Jun 18, 2012 at 4:44 PM, David Gerard dger...@gmail.com wrote: On 19 June 2012 00:30, Brion Vibber br...@pobox.com wrote: warning: patent politics question may lead to offtopic bikeshedding Additionally there's the question of adding H.264 transcode *output*, which would let us serve video to mobile devices and to Safari and IE 9 without any custom codec or Java installations. As far as I know that's not a huge technical difficulty but still needs to be decided politically, either before or after an initial rollout of TMH. /warning It's entirely unclear to me that this is intrinsically related. They're intrinsically related because one depends on the other to be possible. This is a one-way dependency: H.264 output depends on TimedMediaHandler support. TimedMediaHandler in general doesn't depend on H.264 output, and should not be confused with it. -- brion

I think what we should do here is go ahead and add support for ingestion and output, and then we can just adjust the settings file based on what we /decide politically/ going forward. Since both the deployment review pipeline as well as the political decision pipeline can be ~quite long~, it's probably best to have it all supported so we can just adjust a configuration file once we decide one way or another. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] What's the best place to do post-upload processing on a file? Etc.
You will want to put it into a jobQueue; you can take a look at the Timed Media Handler extension for how post-upload, processor-intensive transformations can be handled. --michael

On 05/04/2012 04:58 AM, emw wrote: Hi all, For a MediaWiki extension I'm working on (see http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/060254.html), an effectively plain-text file will need to be converted into a static image. I've got a set of scripts that does that, but it takes my medium-grade consumer laptop about 30 seconds to convert the plain-text file into a ray-traced static image. Since ray-tracing the images being created here substantially improves their visual quality, my impression is that it's worth a moderately expensive transformation operation like this, but only if the operation is done once. Given that, I assume it'd be best to do this transformation immediately after the plain-text file has completed uploading. Is that right? If not, what's a better time/way to do that processing? I've looked into MediaWiki's 'UploadComplete' event hook to accomplish this. That handler gives a way to access information about the upload and the local file. However, I haven't been able to find a way to get the uploaded file's path on the local file system, which I would need to do the transformation. Looking around related files I see references to $srcPath, which seems like what I would need. Am I just missing some getter method for file system path data in UploadBase.php or LocalFile.php? How can I get the information about an uploaded file's location on the file system while in an onUploadComplete-like object method in my extension? Thanks, Eric
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
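(To make that suggestion concrete, the shape of it is roughly as below. The hook name and the job plumbing are core MediaWiki; the ray-trace job class itself is hypothetical, and the exact way to get a local filesystem path and to enqueue a job varies a bit by MediaWiki version, so treat this as a sketch rather than drop-in code:

    $wgHooks['UploadComplete'][] = 'RayTraceHooks::onUploadComplete';
    $wgJobClasses['rayTraceRender'] = 'RayTraceRenderJob';

    class RayTraceHooks {
        public static function onUploadComplete( $uploadBase ) {
            $file = $uploadBase->getLocalFile();
            // Queue the expensive render instead of doing ~30s of work in the request:
            $job = new RayTraceRenderJob( $file->getTitle(),
                array( 'filename' => $file->getName() ) );
            JobQueueGroup::singleton()->push( $job ); // older MW: $job->insert()
            return true;
        }
    }

    class RayTraceRenderJob extends Job {
        public function __construct( $title, $params ) {
            parent::__construct( 'rayTraceRender', $title, $params );
        }
        public function run() {
            $file = wfLocalFile( $this->params['filename'] );
            $srcPath = $file->getLocalRefPath(); // a readable filesystem copy
            // ... shell out to the ray tracer here, then publish the result ...
            return true;
        }
    }

)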
Re: [Wikitech-l] Chunked uploading API documentation; help wanted
Thanks, Brion (and Erik), for bringing chunked uploading closer to fruition. +Jan, can you help out with documenting the API? I will take a pass at it as well when I get a chance ;) --michael

On 04/17/2012 03:37 PM, Brion Vibber wrote: I've started adding some documentation on chunked uploading via the API on mediawiki.org: https://www.mediawiki.org/wiki/API:Upload#Chunked_uploading This info is based on watching UploadWizard at work, and may be incomplete or misleading. :) So please feel free to hop in and help clean it up, thanks! There also probably needs to be better information about stashed uploads, which has some intersection with chunked (for instance -- is it possible to do chunked upload without using the stash? Or are they required together?) -- brion
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
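(As a starting point for the docs, here is the request sequence as I understand it from watching UploadWizard, per API:Upload. The $api() helper below is hypothetical -- it just stands in for "POST these params plus the edit token to api.php as multipart form data and decode the JSON" -- and the flow may be incomplete, so verify against a live wiki:

    $chunkSize = 1024 * 1024;
    $size = filesize( $path );
    $handle = fopen( $path, 'rb' );
    $filekey = null;
    for ( $offset = 0; $offset < $size; $offset += $chunkSize ) {
        $params = array(
            'action' => 'upload',
            'stash' => 1,
            'filename' => $name,
            'filesize' => $size,
            'offset' => $offset,
            'chunk' => fread( $handle, $chunkSize ), // sent as a file part
        );
        if ( $filekey !== null ) {
            $params['filekey'] = $filekey; // returned by the first chunk request
        }
        $result = $api( $params );
        $filekey = $result['upload']['filekey'];
    }
    fclose( $handle );
    // Final request publishes the stashed upload under its real name:
    $api( array( 'action' => 'upload', 'filename' => $name,
        'filekey' => $filekey, 'comment' => 'Chunked upload test' ) );

)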
Re: [Wikitech-l] Video codecs and mobile
On 03/20/2012 03:15 AM, David Gerard wrote: We should definitely be able to ingest H.264. (This has been on the wishlist forever and is a much harder problem than it sounds.)

Once TMH is deployed, practically speaking, 'upload to YouTube -> import to Commons' will probably be the easiest path for a while, especially given the tight integration YouTube has with every phone and any capture device with web access. But yes, the feature should be developed, and it is more difficult than it sounds when you want to carefully consider things like making the source file available. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Video codecs and mobile
On 03/19/2012 06:24 PM, Brion Vibber wrote: In theory we can produce a configuration with TimedMediaHandler to produce both H.264 and Theora/WebM transcodes, bringing Commons media to life for mobile users and Apple and Microsoft browser users. What do we think about this? What are the pros and cons? -- brion

The point about mobile is very true, and it's very, very difficult to displace entrenched formats, especially when they're tied up in hardware support. And of course the Kaltura HTML5 library used in TMH has a lot of iPad and Android H.264 support code in there for all the commercial usage of the library, so it would not be a technical challenge to support it. But I think we should get our existing TMH out the door exclusively supporting WebM and Ogg. We can revisit adding support for other formats after that. High on that list is also MP3 support, which would have similar benefits for audio versions of articles and mobile hardware-supported audio playback.

If people felt it was important, by the end of the year we could have JavaScript-based WebM decoders for supporting WebM in IE10 (in case people never saw this: https://github.com/bemasc/Broadway ). But of course this could be seen as 'insert your favourite misguided good-efforts analogy here', i.e. maybe efforts are better focused on tools streamlining the video contribution process. Maybe we focus on a way to upload h.264 videos from mobile. Of course, doing mobile h.264 uploads correctly would ideally include making source content available, for maximising re-usability of content without the quality loss in multiple encoding passes, so in effect running up against the very principle that governs the Wikimedia projects: to make content a freely reusable resource.

I think Mozilla adding /desktop/ h.264 support may hurt free formats. On desktop they already have strong market share, and right now many companies actually request including WebM in their encoding profiles (on Kaltura), but that of course would not be true if Mozilla supports h.264 on desktop, and it would make it harder for Google Chrome to follow through on their promise to only support WebM (if they still plan on doing that). For mobile it makes sense: Mozilla has no market share there and they have to be attractive to device manufacturers, create a solid mobile user experience, fit within device battery life expectations, etc. And on mobile there is no fallback to Flash if the site can't afford to encode all their content into free formats and multiple h.264 profiles. And they can't afford that on a browser / platform that people generally have to /choose/ to install and use. If they support h.264 on desktop it will be a big setback for free formats, because there won't be any incentive for the vast majority of pragmatic sites to support WebM. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Image uploading social workflow
There was the Add Media Wizard project from a while back ( that sounds similar to what you describe ) http://www.mediawiki.org/wiki/Extension:Add_Media_Wizard I wanted to take a look at integrating upload wizard into it post TMH deployment, and or something new could be built as a gadget as well. --michael On 12/18/2011 07:22 AM, Gregor Hagedorn wrote: The improvements to the Upload Wizard are very welcome, but socially, I think it is still broken. Please correct me if I overlook something or overlook another extension. Socially, I believe many mediawiki extensions need a way to ask for images on a topic page, provide an upload wizard AND display the results on the topic page. Presently, even if the image is added to the wiki or a commons repository, it simply disappears in a black hole from the perspective of the contributing image author. I believe it is possible to have a wizard option which does the following: * store the page context from which it was called. * upload images to local wiki or a repository * open the page context in edit mode * search for some form of new-images-section ** a possible implementation of this could be a div with id=newimages containing a gallery tag * if new-images-section exists: add images, if not create with new images. * Save context page. Presently, WMF is possibly the biggest driver of open content (CC BY/CC BY-SA) but is able to collect images only from the small population that is the intersection of the population or people able to edit mediawiki and the huge population able to provide quality images. The new-images-section solution would probably not directly work for wikipedia itself; here some more complex review mechanism (new images gallery would be shown only to some users, including image uploader, or so) would be needed, perhaps in combination with flagged rev. I view this feature however potentially as a two step process: implement with direct addition to page, modify to optimize for flagged revs. However, I think something the described feature would be needed; presently all these crowdsourcing images are mostly collected by projects that either use no open content license at all, or the NC license at best. WMF is not able to exert its potential pull towards open content in this area. Also reported as https://bugzilla.wikimedia.org/show_bug.cgi?id=33234 Gregor ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Minor API change, ApiUpload
In terms of a DB schema friendly to chunks, we would probably want another table and associate it with a stashed file. Russ was discussing adding support for appending files within the Swift object store application logic, so we may not have to be concerned with storing chunk references in the DB? Another potential usage for a media stash is transcoding non-free formats. Here we could use the media stash as a temporary location to put the media files while we transcode them. Once transcoded, we could then move them into the published space. But I would not be too worried about incorporating that into the DB schema until we get to implementation. --michael

On 07/13/2011 12:30 AM, Bryan Tong Minh wrote: Great work people! I've really been waiting for this, and I'm glad that it has been finally implemented. A remark about extensibility: in the future we might want to use the upload stash for more advanced features like asynchronous uploads and chunked uploads. I think the database schema should already be prepared for this, even if we're not using it. For this purpose I would at least add us_status. Perhaps Michael has some ideas what such a database schema should further incorporate. Cheers, Bryan
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] thumbnail generation for Extension:OggHandler
Thanks for this thread. Please do commit fixes for 1.17; if it's not obvious already, I have really only been targeting trunk. While the extension has been around since before 1.16, it would be very complicated to restore the custom resource loader it was using before there was a resource loader in core. In terms of Superpurge, that is essentially what we do with the transcode status table on the image page, which lets users with the given permission purge the transcodes. We did not want to make it too easy to purge transcodes, because once purged a video could be inaccessible for devices / browsers that only had WebM or only had Ogg support, until the file was retranscoded. --michael

On 07/11/2011 04:48 AM, Dmitriy Sintsov wrote: Yes, locally patched both issues, now runs fine. $wgExcludeFromThumbnailPurge is not defined in 1.17, made a check. BTW, it is a bit evil to exclude some extensions from Purge. Maybe there should be another action, Superpurge? Linker::link() was called statically in the TranscodeStatusTable class. Created new Linker() and now it works. I haven't committed into svn, don't know if anyone cares about backwards compatibility. BTW, if I was a leading developer (who makes decisions), I'd probably make MW 1.16 an LTS (long-term support) version for legacy setups and extensions.. Though maybe that is unneeded (too much of a burden for a non-profit organization). Dmitriy
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] thumbnail generation for Extension:OggHandler
I recommend using the static binaries hosted on firefogg or if you want to compile it your self using the build tools provided there: http://firefogg.org/nightly/ Also I would suggest you take a look at TimedMediahandler as an alternative to oggHandler it has a lot more features such as WebM, timed text, and transcoding support. http://www.mediawiki.org/wiki/Extension:TimedMediaHandler A live install is on prototype if you want to play around with it: http://prototype.wikimedia.org/tmh/ If you run into any issue, please report them on the bug tracker or directly to me. https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensionscomponent=TimedMediaHandler peace, --michael On 07/08/2011 04:18 AM, Dmitriy Sintsov wrote: Hi! What's the proper way of thumbnail generation for Ogg media handler, so it will work like at commons? First, I've downloaded and compiled latest ffmpeg version (from git://git.videolan.org/ffmpeg.git) using the following configure options: ./configure --prefix=/usr --disable-ffserver --disable-encoder=vorbis --enable-libvorbis The prefix is usual for CentOS layout (which I have at hosting) and best options for vorbis were suggested in this article: http://xiphmont.livejournal.com/51160.html I've downloaded Apollo_15_launch.ogg from commons then uploaded to my wiki to check Ogg handler. The file was uploaded fine, however the thumbnail is broken - there are few squares at gray field displayed instead of rocket still image. In Extension:OggHandler folder I found ffmpeg-bugfix.diff. However there is no libavformat/ogg2.c in current version of ffmpeg. Even, I found the function ogg_get_length () in another source file, however the code was changed and I am not sure that manual comparsion and applying is right way. It seems that the patch is suitable for ffmpeg version developed back in 2007 but I was unable to find original sources to successfully apply the patch. I was unable to find ffmpeg in Wikimedia svn repository. Is it there? Then, I've tried svn co https://oggvideotools.svn.sourceforge.net/svnroot/oggvideotools oggvideotools but I am upable to compile neither trunk nor branches/dev/timstarling version, it bails out with the following error: -- ERROR: Theora encoder library NOT found -- ERROR: Theora decoder library NOT found -- ERROR: Vorbis library NOT found -- ERROR: Vorbis encoder library NOT found -- ogg library found -- GD library and header found CMake Error at CMakeLists.txt:113 (MESSAGE): I have the following packages installed: libvorbis-1.1.2-3.el5_4.4 libvorbis-devel-1.1.2-3.el5_4.4 libogg-1.1.3-3.el5 libogg-devel-1.1.3-3.el5 libtheora-devel-1.0alpha7-1 libtheora-1.0alpha7-1 ffmpeg compiles just fine (with yasm from alternate repo, of course). But there is no libtheoradec, libtheoraenc, libvorbisenc neither in main CentOS repository nor in aliernative http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/ However it seems these is libtheoraenc.c in ffmpeg; what is the best source of these libraries? It seems that there is no chance to find proper rpm's for CentOS and one need to compile these from sources? Dmitriy ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] ResourceLoader JavaScript validation on trunk (bug 28626)
On 07/06/2011 03:04 PM, Brion Vibber wrote: Some of you may have found that ResourceLoader's bundled minified JavaScript loads can be a bit frustrating when syntax errors creep into your JavaScript code -- not only are the line numbers reported in your browser of limited help, but a broken file can cause *all* JS modules loaded in the same request to fail[1]. This can manifest as for instance a jquery-using Gadget breaking the initial load of jquery itself because it gets bundled together into the same request.

Long term, I wonder if we should not be looking at Closure Compiler [1]; we could gain an additional 10% or so compression with simple optimisations, and it has tools for inspecting compiled output [2]. Longer term we could work toward making code compatible with advanced optimisations; as a side effect we could get improved jsDoc docs, and even better compression and optimisations would be possible. [1] http://code.google.com/closure/compiler/ [2] http://code.google.com/closure/compiler/docs/inspector.html
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] background shell process control and exit status
For the TimedMediaHandler I was adding more fine-grained control over background processes [1] and ran into a Unix issue around getting both a pid and exit status for a given background shell command. Essentially with a background task I can get the pid or the exit status but can't seem to get both:

to get the pid:
    $pid = wfShellExec( "nohup nice -n 19 $cmd > /tmp/stdout.log & echo $!" );
put the exit status into a file:
    $pid = wfShellExec( "nohup nice -n 19 $cmd > /tmp/stdout.log; echo $? > /tmp/exit.status" );

But if I try to get both, either my exit status is for the echo pid command or my pid is for the echo exit status command. It seems like there should be some shell trick to back-reference background tasks or something. If nothing else I think this could be done with a shell script, passing in a lot of path targets and using the wait $pid command at the end to grab the exit code of the background process. I did a quick guess at what this would look like in that same commit [1], but would rather just do some command line magic instead of putting a .sh script in the extension. [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/90068 peace, --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
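(One bit of command-line magic that might do it: wrap the whole thing in a backgrounded subshell, write the encoder's real pid out immediately, and let the subshell wait for the encoder and record its exit status. Sketched below via wfShellExec, untested, and the interaction with MediaWiki's shell limit wrapper would need checking; $encodeCmd, $pidFile, $statusFile and $stdoutLog are illustrative names:

    $wrapper = '( nice -n 19 ' . $encodeCmd .
        ' > ' . wfEscapeShellArg( $stdoutLog ) . ' 2>&1 &' .
        ' echo $! > ' . wfEscapeShellArg( $pidFile ) . ';' .
        ' wait $!;' .
        ' echo $? > ' . wfEscapeShellArg( $statusFile ) .
        ' ) > /dev/null 2>&1 &';
    wfShellExec( $wrapper );
    // Returns right away; poll $pidFile for progress/kill decisions and
    // $statusFile (it appears when the encode finishes) for the exit code.

)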
Re: [Wikitech-l] background shell process control and exit status
On 06/14/2011 05:41 PM, Platonides wrote: Do you want the command to be run asynchronously or not? If you expect the status code to be returned by wfShellExec(), then the process will obviously have finished and there's no need for the PID. OTOH if you launch it as a background task, you will want to get the PID, and then call pcntl_waitpid* on it to get the status code. *pcntl_waitpid() may not work, because $cmd is unlikely to be a direct child of PHP. You could also be expecting to check it from a different request. So you would enter into the world of killing -0 the process to check if it's still alive.

Yes, the idea is to run the command asynchronously so we can monitor the transcode progress and kill it if it stops making progress. Calling pcntl_waitpid with pcntl_fork, as Tim mentions, may be the way to get it done, with the child including the pcntl_waitpid call and the parent monitoring progress and killing the child if need be. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
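(Roughly what that fork approach could look like -- it assumes the pcntl/posix extensions are available in the CLI context, and the stall check is a hypothetical placeholder for whatever progress heuristic the transcode job uses:

    $pid = pcntl_fork();
    if ( $pid === 0 ) {
        // Child: run the encode; wfShellExec() fills in the exit status.
        wfShellExec( $cmd, $status );
        exit( $status );
    } elseif ( $pid > 0 ) {
        // Parent: poll until the child exits, killing it if progress stalls.
        while ( pcntl_waitpid( $pid, $childStatus, WNOHANG ) === 0 ) {
            if ( transcodeHasStalled( $outputFile ) ) { // hypothetical check
                posix_kill( $pid, SIGTERM );
            }
            sleep( 5 );
        }
    }

)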
Re: [Wikitech-l] [Foundation-l] YouTube and Creative Commons
On 06/04/2011 06:43 PM, David Gerard wrote: A question that wasn't clear from reading the bug: why is reading a file format (WebM) blocked on the entire Timed Media Handler?

It would be complicated to support WebM without an improved player and transcoding support. All the IE users, for example, can only decode Ogg with Cortado; if we don't use TMH, WebM files embedded in articles would not play for those users. Likewise, older versions of Firefox only play back Ogg. Additionally, HD files embedded into articles are already an issue with users uploading variable-bitrate HD Oggs, giving a far from ideal experience on most Internet connections and most in-browser playback engines. This would be an issue for variable-bitrate WebM files as well (without the transcoding support of TMH). Other features that have been living in the mwEmbed gadget for a long time, like timed text, remote embedding / video sharing, and temporal media references / embeds, are all better supported in TMH as an extension, so it would be good to move those features over. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Code review process (was: Status of more regular code deployments)
On 06/01/2011 08:28 AM, Chad wrote: I don't think revert in 72 hours if it's unreviewed is a good idea. It just discourages people from contributing to areas in which we only have one reviewer looking at code. I *do* think we should enforce a 48hr revert if broken rule. If you can't be bothered to clean up your breakages within 48 hours of putting your original patch in, it must not have been very important. -Chad

I think a revert on sight, if broken, is fair ... you can always re-add it after you fix it ... if it's a 'works differently than expected' type issue / not perfectly following coding conventions, a 48hr window to make progress (during the work week) sounds reasonable. --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Media embedding: oEmbed feedback?
Sorry for the re-post (having trouble with the wikitech-l list post email migration :( ). I would also be interested in discussing this in Berlin or otherwise ;) I can offer some notes about video embedding inline:

On 04/29/2011 03:30 PM, Brion Vibber wrote: Enhanced media player goodies like embedding have been slowly coming along, with a handy embedding option now available in the fancy version of the media player running on Commons. This lets you copy a bit of HTML you can paste into your blog or other web site to drop in a video and make it playable -- nice! Some third-party sites will also likely be interested in standardish ways of embedding offsite videos from Youtube, Vimeo, and other providers.

It appears the iframe embed method is becoming a somewhat standardised way to share videos, with YouTube, Vimeo, and others providing it as an option to deliver both Flash and HTML5 players. The bit of HTML that you copy from the Commons share video function is just an iframe (similar to those other sites). Timed Media Handler works the same way, using the same URL parameter (embedplayer=yes), so that we can seamlessly replace the 'fancy media player' rewrite with a similar embed player page delivered by the TMH extension [1]. The iframe player lets you sandbox the player when you embed it in foreign domain contexts, and enables you to deliver the interface that includes things like the credits screen that parses our description template page on Commons to present credit information and a link back to the description page. As iframe embed is relatively standard, we simply have to request that our domain be whitelisted for it to be shared on Facebook, WordPress, etc. In addition to working as a pure iframe without XSS JavaScript, to support mashups like Google's player [2], if you include a bit of JS where you embed the iframe, the mwEmbed player also has an iframe API that lets you use the HTML5 video API on the iframe as if it was a video tag in the page. [3]

oEmbed is a nice way to consistently 'discover' embed code and media properties. Its implementation within MediaWiki would be akin to supporting RSS or OpenSearch, so I think it's something we should try and do. As the spec currently stands, it's an API for the embed code rather than an API for mashups. I think more interesting things could be done in addition to the iframe, object tag and basic metadata ... like giving the URLs to all the media files, and URLs to all the associated timed text of a given player ... Something like the ROE standard [4] that we (Xiph, Annodex) folks were talking about a while back might be a good direction to extend oEmbed into. (Although commercial video service sites are not likely to be interested in mash-ups outside of their player, hence oEmbed leaning toward 'html' to embed the players... direct links to associated media is one of those standard ideas that in theory is good, but does not play well with video service business models ... but that does not have to stop us / oEmbed from promoting it :) I would also add that TMH adds a separate API entry point to deliver some of this info, such as the URLs for all the derivatives related to a particular media title [5]. I would like to add associated timed text listing to that videoinfo prop, and from there it should not be hard to adapt that to a ROE or oEmbed v2 type representation.
[1] http://prototype.wikimedia.org/timedmedia/Main_Page#Iframe_embed_and_viral_sharing [2] http://code.google.com/apis/youtube/iframe_api_reference.html [3] http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/TimedMediaHandler/MwEmbedModules/EmbedPlayer/resources/iframeApi/ [4] http://wiki.xiph.org/index.php/ROE [5] http://prototype.wikimedia.org/tmh/api.php?action=querytitles=File:Shuttle-flip.webmprop=videoinfoviprop=derivativesformat=jsonfm ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
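(To make the oEmbed idea a bit more concrete: a minimal 'video' type response for a Commons file might look roughly like the sketch below. The field names come from the oEmbed spec; the URL form and dimensions are purely illustrative, with the embedplayer=yes iframe standing in for the same markup the share dialog hands out today:

    $response = array(
        'version' => '1.0',
        'type' => 'video',
        'title' => 'Shuttle-flip.webm',
        'provider_name' => 'Wikimedia Commons',
        'width' => 640,
        'height' => 360,
        'html' => '<iframe src="https://commons.wikimedia.org/w/index.php' .
            '?title=File:Shuttle-flip.webm&embedplayer=yes" ' .
            'width="640" height="360" frameborder="0"></iframe>',
    );
    echo json_encode( $response );

)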
Re: [Wikitech-l] Status of 1.17 1.18
On 04/15/2011 12:07 PM, Brion Vibber wrote: Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.

Translatewiki deserves major props for running a real-world wiki on trunk. It's hard to count all the bugs that get caught that way. Maybe once the heterogeneous deployment situation gets figured out we could do something similar with a particular project... peace, --michael
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Syntax-highlighting JS CSS code editor gadget embedding Ace
Very cool. Especially given the development trajectory of Ace to become the Eclipse of web IDEs, there will be a lot of interesting possibilities, as we could develop our own MediaWiki-centric plugins for the platform. I can't help but think about where this is ideally headed ;) A Gitorious-type system for easy branching, with mediawiki.org code-review-style tools and in-browser editing. With seamless workflows for going from per-user development and testing on the live site, to commits to your personal repository, to being reviewed and tested by other developers, to being enabled by interested users, to being enabled by default if so desired. A lot of these workflows could be prototyped without many complicated infrastructure improvements, since this basic process is already happening in a roundabout way ... (sometimes in a roundabout broken way). A developer gadget could include a simple system for switching between a local checkout of the scripts and the live versions, and support pushing a particular local copy live or, in the case of the online Ace editor, bootstrapping a particular page with the state of your script (using the Draft extension concept) so we don't have to save every edit when you want to test your code. We could specify a path structure within our existing svn to keep in sync with all gadgets and site scripts, then have our 'developer gadget' understand that path structure so you could seamlessly switch between the local and live gadget. (I was manually doing something similar in my own gadget development.) This could also help encourage gadget centralisation. We could then also link into the code review system for every site script and gadget, with one-click import of a particular version of the script (ideally once the script has been reviewed by other developers). Svn commits would not necessarily be automatically pushed to the wiki, but edits to the wiki page would always be pushed to the svn. Or maybe a sign-off in code review results in the push from svn to wiki, though we would not want to slow down fixes getting pushed out. We would have to see what workflows work best for the community. mmm ... this would probably work better with git :P ... but that is certainly not a show stopper to experimenting with improving these workflows. peace, --michael On 04/12/2011 07:40 PM, Brion Vibber wrote: While pondering some directions for rapid prototyping of new UI stuff, I found myself lamenting the difficulty of editing JS and CSS code for user/site scripts and gadgets: * lots of little things to separately click and edit for gadgets * no syntax highlighting in the edit box * no indication of obvious syntax errors, leading to frequent edit-preview cycles (especially if you have to turn the gadget back off to edit successfully!) * no automatic indentation! * can't use the tab key Naturally, I thought it might be wise to start doing something about it. I've made a small gadget script which hooks into editing of JS and CSS pages, and embeds the ACE code editor (http://ace.ajax.org -- a component of the Cloud9 IDE, formerly Skywriter formerly Mozilla Bespin). This doesn't fix the usability issues in Special:Gadgets, but it's a heck of a lot more pleasant to edit the gadget's JS and CSS once you get there. :) The gadget is available on www.mediawiki.org on the 'Gadgets' tab of preferences. Note that I'm currently loading the ACE JavaScript from toolserver.org, so you may see a mixed-mode content warning if you're editing via secure.wikimedia.org. (Probably an easy fix.) Go try it out!
http://www.mediawiki.org/wiki/MediaWiki:Gadget-CodeEditor.js IE 8 kind of explodes and I haven't had a chance to test IE9 yet, but it seems pretty consistently nice on current Firefox and Chrome and (barring some cut-n-paste troubles) Opera. I'd really love to be able to use more content-specific editing tools like this, and using Gadgets is a good way to make this sort of tool available for testing in a real environment -- especially once we devise some ways to share gadgets across all sites more easily. I'll be similarly Gadget-izing the SVG-Edit widget that I've previously done as an extension so folks can play with it while it's still experimental, but we'll want to integrate them better as time goes on. -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
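For anyone curious what the gadget's hook-in amounts to, here is a minimal sketch assuming the Ace build is already loaded; wpTextbox1 and editform are the standard MediaWiki edit-form ids, but the mode-setting call follows current Ace conventions and the real gadget's wiring differs.

if ( /\.js$/.test( wgPageName ) && wgAction === 'edit' ) {
    // Hide the plain textarea and put an Ace editor in its place.
    var textarea = document.getElementById( 'wpTextbox1' );
    var shell = document.createElement( 'div' );
    shell.id = 'ace-shell';
    shell.style.height = '30em';
    shell.textContent = textarea.value;
    textarea.style.display = 'none';
    textarea.parentNode.insertBefore( shell, textarea );

    var editor = ace.edit( 'ace-shell' );
    editor.getSession().setMode( 'ace/mode/javascript' );

    // Copy the edited buffer back into the real textarea on save.
    document.getElementById( 'editform' ).onsubmit = function () {
        textarea.value = editor.getSession().getValue();
    };
}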
[Wikitech-l] writing phpunit tests for extensions
I had a bit of a documentation challenge approaching the problem of writing phpunit tests for extensions, mostly because many of the extensions do this very differently and the manual did not have any recommendations. It appears many extensions have custom bootstrapping code (somewhat hacky path discovery and manual loading of core MediaWiki files), and they don't necessarily register their tests in a consistent way. I wrote up a short paragraph of what I would recommend here: http://www.mediawiki.org/wiki/Manual:Unit_testing#Writing_Unit_Test_for_Extensions If that makes sense, I will try to open up some bugs on the extensions with custom bootstrapping code, and I would recommend we commit an example tests/phpunit/suite.extension.xml file for exclusively running extension tests. Eventually it would be ideal to be able to 'just test your extension' from the core bootstrapper (ie dynamically generate our suite.xml and namespace the registration of extension tests) ... but for now, at least not having to wait for all the core tests as you write your extension tests, plus some basic documentation on how to do so, seems like a step forward. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] writing phpunit tests for extensions
On 04/04/2011 02:20 PM, Platonides wrote: Michael Dale wrote: Eventually it would be ideal to be able to 'just test your extension' from the core bootstraper (ie dynamically generate our suite.xml and namespace the registration of extension tests) ... but for now at least not having to wait for all the core tests as you write you extension tests and some basic documentation on how to do seems like a step forward. --michael If your tests are in just one file, you can simply pass it as a parameter to tests/phpunit/phpunit.php that's cool. We should add that info to the phpunit.php --help output, and to the unit testing wiki page. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Focus on sister projects
On 04/02/2011 04:08 PM, Ryan Kaldari wrote: 2. Creating The Complete Idiot's Guide to Writing MediaWiki Extensions and The Complete Idiot's Guide to Writing MediaWiki Gadgets (in jQuery) +1 ... Beyond the guide we could win a lot by centralising some of the scripts and libraries on mediawiki.org and establishing best practices for things like gadget localisation. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Future: Love for the sister projects!
On 04/03/2011 10:56 AM, Brion Vibber wrote: Harder, but very interesting in the medium to long-term: We would do well to survey and analyse other gadget, widget and add-on systems and communities that exist on web platforms. Not to say that Wikipedia's needs are the same, just that there are probably a lot of ideas to borrow from, and a good gadgets plan will have a few phases of implementation. Some anecdotal notes: * Gadgets or widgets have to get per-user permission confirmation before they can take certain actions on your behalf. If we had an iframe postMessage api proxy bridge that managed permissions for an open, sandboxed wiki gadget site, we could potentially lower the criteria for entry and be a bit better off than including random JS in your userpage .js page (a rough sketch follows below). * There are very fluid search and browsing interfaces to find, 'install' and share gadgets / add-ons. This includes things like ratings, usage statistics, 'share this', author information etc. ** Visibility. Many editors and viewers probably have no idea gadgets exist. With the exception of projects globally enabling a gadget, many features are pretty much hidden from users. It's sort of a chicken-and-egg issue, but in addition to highlighting content, the sites' main pages could also highlight good usage of in-site tools and features. Like Commons featuring a densely annotated image to highlight the image annotator, or the community portal of Wikipedia directly linking to an interactive JS gadget that enables a particular patrol workflow or article assessment task. The withJS system is kind of a hack for a direct link into a gadget feature, which I have used a lot, but a more formal, easy opt-in mechanism would be nice... * Check out https://addons.mozilla.org/en-US/developers/ We have decent documentation for extensions and core MediaWiki development, but the gadget effort is somewhat ad hoc, not very centralised, and best practices are not very well documented (although recent efforts are a step in the right direction :) peace, michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
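A toy sketch of the postMessage permission bridge idea mentioned in the first bullet; the approved-origins list, action names and message format are all illustrative assumptions, not an existing MediaWiki API.

// The wiki-side frame only honours requests from origins the user has
// approved, and only for the approved actions. How the approval list is
// stored and edited is hand-waved here.
var approvedOrigins = { 'http://tools.example.org': [ 'query', 'parse' ] };

window.addEventListener( 'message', function ( event ) {
    var allowed = approvedOrigins[ event.origin ];
    var request = JSON.parse( event.data );
    if ( !allowed || allowed.indexOf( request.action ) === -1 ) {
        return; // unapproved origin or action: ignore
    }
    // ... perform the local api call here, then report back ...
    event.source.postMessage(
        JSON.stringify( { id: request.id, ok: true } ), event.origin );
} );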
Re: [Wikitech-l] Where exactly are the video and audio players at?
On 03/24/2011 04:45 AM, Joseph Roberts wrote: Actually, looking through OggHandler, I do think that developing a separate entity may work well. I'm not quite sure what is wanted by the general public and would like to do what is wanted by the majority, not just what would be easiest or even the best. What would be the best way to implement an HTML5 player in MediaWiki? TIA - Joseph Roberts There is the Extension:TimedMediaHandler, which implements multi-format, multi-bitrate transcoding with automatic source selection, an HTML5 player interface, timed text, temporal media fragments, gallery and search pop-up players, viral iframe sharing / embedding, etc. Demo page here: http://prototype.wikimedia.org/timedmedia/Main_Page peace, michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] ie9 webm components
http://blogs.msdn.com/b/ie/archive/2011/03/16/html5-video-update-webm-for-ie9.aspx It appears to be a little rough around the edges, but it should bode well for Wikimedia video support as IE9 starts to be pushed out to Windows machines, and hopefully we won't have IE7 and 8 around for as long as we have had IE6 ;) If you have not already seen it, the TimedMediaHandler extension supports transcoding to both WebM and Ogg for MediaWiki video assets: http://prototype.wikimedia.org/timedmedia/Main_Page I will integrate links to http://tools.google.com/dlpage/webmmf/ for IE9 users in the mwEmbed player once the components are working a bit more smoothly. peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] use jQuery.ajax in mw.loader.load when load script
On 02/18/2011 01:01 PM, Roan Kattouw wrote: 2011/2/18 Philip Tzou philip@gmail.com: jQuery's ajax method provides a better way to load a javascript, and it can detect when the script has been loaded and execute the callback function. I think we can implement it in our mw.loader.load. jQuery.ajax provides two ways (ajax or inject) to load a javascript; you should set cache=true to use the inject one. I guess we could use this when loading stuff from arbitrary URLs in the future, but for normal module loads the mediaWiki.loader.implement() call in the server output works fine. Client side, there is the mediaWiki.loader.using call, which allows you to supply a callback; unfortunately there are some bugs in debug mode output where implement gets called before the scripts are actually ready, but it should work in production mode. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
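For reference, the callback pattern being described looks roughly like this; jquery.ui.dialog is just an example module name, and the error callback is optional.

// Load a module, then run dependent code once it has actually arrived.
mediaWiki.loader.using( 'jquery.ui.dialog', function () {
    // ready callback: the module's code is guaranteed to be present here
    $( '#my-dialog' ).dialog( { modal: true } );
}, function ( error ) {
    // optional error callback
    if ( window.console ) {
        console.log( 'module failed to load', error );
    }
} );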
Re: [Wikitech-l] File licensing information support
On 01/22/2011 01:15 PM, Bryan Tong Minh wrote: Handling metadata separately from wikitext provides two main advantages: it is much more user friendly, and it allows us to properly validate and parse data. This assumes wikitext is simply a formatting language; really it's a data storage, structure and presentation language. You can already see this in the evolution of templates as both data and presentation containers. It seems like a bad idea to move away from leveraging flexible data properties used in presentation. On Commons we have Template:Information, which links out into numerous data triples for asset presentation (ie Template:Artwork, Template:Creator, Template:Book, with sub-data relationships like Artwork.Location referencing the Institution template). If tied to an SMW backend you could say: give me artwork in room Pavillon de Beauvais at the Louvre that is missing a created-on date. We should focus on APIs for template editing; Extension:Page_Object_Model seemed like a step in the right direction. Something that lets you edit structured data across nested template objects, with validation stacked on top of that, would let us leverage everything that has been done and keep things wide open for what's done in the future. Most importantly we need clean high-level APIs that we can build GUIs on, so that the flexibility of the system does not hurt usability and functionality. Having a clear separate input text field Author: is much more user friendly than {{#fileauthor:}}, which is, so to say, a type of obscure MediaWiki jargon. I know that we could probably hide it behind a template, but that is still not as friendly as a separate field. I keep on hearing that, especially for newbies, a big blob of wikitext is plain scary. We regulars may be able to quickly parse the structure in {{Information}}, but for newbies this is certainly not so clear. We actually see from the community that there is a demand for separating the metadata from the wikitext -- this is after all why they implemented the uselang= hacked upload form with a separate text box for every meta field. I don't know... see all the templates mentioned above... To be sure, I think we need better interfaces for interacting with templates. Also, a separate field allows MediaWiki to understand what a certain input really means. {{#fileauthor:[[User:Bryan]]}} means nothing to MediaWiki or re-users, but Author: Bryan___ [checkbox] This is a Commons username can be parsed by MediaWiki to mean something. It also allows us to mass change, for example, the author. If I want to change my attribution from Bryan to Bryan Tong Minh, I would need to edit the wikitext of every single upload, whereas in the new system I go to Special:AuthorManager and change the attribution. A Semantic MediaWiki-like system retains this meaning for MediaWiki to interact with at any stage of data [re]presentation, and of course supports flexible meaning types. Similar to categories, and all other user-edited metadata. Categories are a good example of why metadata does not belong in the wikitext. If you have ever tried renaming a category... you need to edit every page in the category and rename it in the wikitext. Commons is running multiple bots to handle category rename requests. All these advantages outweigh the pain of migration (which could presumably be handled by bots) in my opinion.
Unless your category was template-driven, in which case you just update the template ;) If your category was instead magically associated with the page, outside of template-built wiki page text, how do you procedurally build data associations? --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Farewell JSMin, Hello JavaScriptDistiller!
On 01/21/2011 08:21 AM, Chad wrote: While I happen to think the licensing issue is rather bogus and doesn't really affect us, I'm glad to see it resolved. It outperforms our current solution and keeps the same behavior. Plus as a bonus, the vertical line smushing is configurable so if we want to argue about \n a year from now, we can :) Ideally we will be using Closure Compiler by then, and since it rewrites functions and variable names and sometimes collapses multi-line functionality, newline preservation will be a moot point. Furthermore, Google even has a nice add-on to Firebug [1] for source code mapping, making the dead horse even more dead. I feel like we are stuck back in time, arguing about optimising code that came out eons ago in net time (more than 7 years ago). There are more modern solutions that take these concerns into consideration and do a better job at it (ie not just a readable line, but a pointer back to the line of source code that is of concern). [1] http://code.google.com/closure/compiler/docs/inspector.html peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] File licensing information support
On 01/21/2011 02:45 AM, Alex Brollo wrote: The interest of the Wikisource project in a formal and standardized set of book metadata (I presume from Dublin Core) in a database table is obvious. Some preliminary tests on it.source suggest that templates and the Labeled Section Transclusion extension could have a role as existing wikitext containers for semantized variables; the latter perhaps more interesting than the former, since their content can be accessed directly from any page. I'd like book metadata to be considered from the beginning of this interesting project. Alex This quickly dovetails into the Semantic MediaWiki discussion... for which there are other threads on this list to reference. There is a wiki data summit / meeting coming up where these issues will likely be discussed. Maybe we could start eliciting requirements and needs of projects -- like what you describe for Wikisource, and others that have been listed elsewhere -- on a pre-meeting project page; this way we can be sure to hit all these items during the meeting. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] File licensing information support
On 01/20/2011 05:00 PM, Platonides wrote: I would have probably gone by the page_props route, passing the metadata from the wikitext to the tables via a parser function. I would also say it's probably best to pass metadata from the wikitext to the tables via a parser function, similar to categories and all other user-edited metadata. This has the disadvantage that it's not 'as easy' to edit via a structured api entry point, but has the advantage of working well with all the existing tools, templates and versioning. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Farewell JSMin, Hello JavaScriptDistiller!
As mentioned in the bug, it would be nice to have configurable support for the closure-compiler as well ;) ( I assume Apache licence is compatible? ) Has anyone done any tests to see if there are any compatibility issues with SIMPLE_OPTIMIZATIONS with a google closure minification hook? --michael On 01/20/2011 04:13 PM, Trevor Parscal wrote: For those of you who didn't see bug 26791, our use of JSMin has been found to conflict with our GPL license. After assessing other options ( https://bugzilla.wikimedia.org/show_bug.cgi?id=26791#c8 ) Roan and I decided to try and use the minification from JavaScriptPacker, but not its overly clever but generally useless packing techniques. The result is a minifier that outperforms our current minifier in both how quickly it can minify data and how small the minified output is. JavaScriptDistiller, as I sort of randomly named it, minifies JavaScript code at about 2x the speed of Tim's optimized version of JSMin, and 4x the speed of the next fastest PHP port of JSMin (which is generally considered the standard distribution). Similar to Tim's modified version of JSMin, we chose to retain vertical whitespace by default. However we chose not to retain multiple consecutive empty new lines, which are primarily seen where a large comment block has been removed. We feel there is merit to the argument that appx. 1% bloat is a reasonable price to pay for making it easier to read production code, since leaving each statement on a line by itself improves readability and users will be more likely to be able to report problems that are actionable. We do not however find the preservation of line numbers of any value, since in production mode most requests are for many modules which are concatenated, making line numbers for most of the code useless anyways. This is a breakdown based on ext.vector.simpleSearch * 3217 bytes (1300 compressed) * 2178 bytes (944) after running it through the version of JSMin that was in our repository. Tim modified JSMin to be faster and preserve line numbers by leaving behind all vertical whitespace. * 2160 bytes (938 compressed) after running it through JavaScriptDistiller, which applies aggressive horizontal minification plus collapsing multiple consecutive new lines into a single new line. * 2077 bytes (923 compressed) after running it through JavaScriptDistiller with the vertical space option set to true, which applies aggressive horizontal minification as well as some basic vertical minification. This option is activated through $wgResourceLoaderMinifyJSVerticalSpace, which is false by default. The code was committed in r80656. - Trevor (and Roan) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] JavaScript access to uploaded file contents: SVGEdit gadget needs ApiSVGProxy or CORS
On 01/03/2011 02:22 PM, Brion Vibber wrote: Since ApiSVGProxy serves SVG files directly out on the local domain as their regular content type, it potentially has some of the same safety concerns as img_auth.php and local hosting of upload files. If that's a concern preventing rollout, would alternatives such as wrapping the file data & metadata into a JSON structure be acceptable? hmm... Is img_auth widely used? Can we just disable svg api data access if $wgUploadPath includes img_auth ... or add a configuration variable that states whether img_auth is an active entry point? Why don't we think about the problem differently and support serving images through the api instead of maintaining a separate img_auth entry point? Is the idea that our asset scrubbing for malicious scripts or embedded html tags in images (to protect against IE's lovely 'auto mime' content-type detection) is buggy? I think the majority of MediaWiki installations are serving assets on the same domain as the content, so we would do well to address that security concern as our own (afaik we already address this pretty well). Furthermore we don't want people to have to re-scrub once they do access that svg data on the local domain... It would be nice to serve up different content-type data over the api in a number of use cases. For example we could have a more structured thumb.php entry point, or serve up video thumbnails at requested times and resolutions. This could also clean up Neil's upload wizard per-user temporary image store by requesting these assets through the api instead of relying on obfuscation of the url. Likewise the add media wizard presently does two requests once it opens the larger version of the image. Eventually it would be nice to make more services available, like svg localisation / variable substitution and rasterization (ie give me engine_figure2.svg in Spanish at 600px wide as a png). It may hurt caching to serve everything over jsonp, since we can't set smaxage with callback=randomString urls. If it's just for editing it's not a big deal, until some IE svg viewer hack starts getting all svg over jsonp ;) ... It would be best if we could access this data without varying urls. Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHTTPRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly: https://developer.mozilla.org/En/HTTP_Access_Control I vote yes! ... This would also untaint video canvas data that I am making more and more use of in the sequencer ... Likewise we could add a crossdomain.xml file so IE flash svg viewers can access the data. In the meantime I'll probably work around it with an SVG-to-JSONP proxy on toolserver for the gadget, which should get things working while we sort it out. Sounds reasonable :) We should be able to upload the result via the api on the same domain as the editor, so it would be very fun to enable this for quick svg edits :) peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
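For illustration, this is roughly what the CORS route would let a gadget do, assuming upload.wikimedia.org sent the appropriate Access-Control-Allow-Origin header (which it did not at the time of this thread); the SVG URL is a placeholder.

// Fetch raw SVG source cross-domain, relying on the server sending
// Access-Control-Allow-Origin for the requesting wiki's domain.
function fetchSvgSource( url, callback ) {
    var xhr = new XMLHttpRequest();
    xhr.open( 'GET', url, true );
    xhr.onreadystatechange = function () {
        if ( xhr.readyState === 4 && xhr.status === 200 ) {
            callback( xhr.responseText ); // raw SVG markup for the editor
        }
    };
    xhr.send();
}

fetchSvgSource( 'http://upload.wikimedia.org/wikipedia/commons/x/xx/Example.svg',
    function ( svg ) {
        // hand the markup to svg-edit here
    } );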
Re: [Wikitech-l] JavaScript access to uploaded file contents: SVGEdit gadget needs ApiSVGProxy or CORS
On 01/04/2011 09:57 AM, Roan Kattouw wrote: The separate img_auth.php entry point is needed on wikis where reading is restricted (private wikis), and img_auth.php will check for read permissions before it outputs the file. The difference between the proxy I wrote and img_auth.php is that img_auth.php just streams the file from the filesystem (which, on WMF, will hit NFS every time, which is bad) whereas ApiSVGProxy uses an HTTP request (which will hit the image Squids, which is good). So ... it would be good to think about moving things like img_auth.php and thumb.php over to a general-purpose api media serving module, no? This would help standardise how media serving is extended, reduce extra entry points, and, as you point out above, let us more uniformly proxy our back-end data access over HTTP to hit the squids instead of NFS where possible. And as a shout-out to Trevor's MediaWiki 2.0 vision, it would eventually enable more REST-like interfaces within MediaWiki media handling. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikimedia Storage System ( was JavaScript access to uploaded...)
On 01/04/2011 01:12 PM, Neil Kandalgaonkar wrote: We've narrowed it down to two systems that are being tested right now, MogileFS and OpenStack. OpenStack has more built-in stuff to support authentication. MogileFS is used in many systems that have an authentication layer, but it seems you have to build more of it from scratch. Authentication is really a nice-to-have for Commons or Wikipedia right now. I anticipate it being useful for a handful of cases, which are both more anticipated than actual right now: - images uploaded but not published (a la UploadWizard) - forum avatars (which can be viewed by anyone, but can only be edited by the user they belong to) hmm. I think it would (obviously?) be best to handle media authentication at the MediaWiki level, with just a simple private / public accessible classification for the backend storage system. Things that are private have to go through the MediaWiki api, where you can leverage all the existing extendable credential management. It's also important to keep things simple for 3rd parties that are not using a clustered filesystem stack; it's easier to map a web-accessible dir vs not than to manage authentication within the storage system. Image 'editing' / uploading already includes basic authentication, ie: http://www.mediawiki.org/wiki/Manual:Configuring_file_uploads#Upload_permissions User avatars would be a special case of that. I think thumbnail and transformation servers (they should also do stuff like rotating things on demand) are separate from how we store things, and will just be acting on behalf of the user anyway. So they don't introduce new requirements to image storage. Anybody see anything problematic about that? I think managing storage of procedural derivative assets differently than original files is pretty important -- probably one of the core features of a Wikimedia storage system. Assuming finite storage, it would be nice to specify that we don't care as much if we lose thumbnails vs losing original assets. For example, when doing 3rd-party backups or dumps we don't need all the derivatives to be included. We don't need to keep random-resolution derivatives of old revisions of assets around forever; likewise improvements to SVG rasterization or improvements to transcoding software would mean expiring derivatives. When MediaWiki is dealing with file maintenance it should have to authenticate differently when removing, moving, or overwriting originals vs derivatives, i.e. independent of DB revision numbers or what MediaWiki *thinks* it should be doing. For example, only upload ingestion nodes or modes should have write access to the archive store. Transcoding or thumbnailing or maintenance nodes or modes should only have read-only access to archive originals and write access to derivatives. As for things like SVG translation, I'm going to say that's out of scope and probably impractical. Our experience with the Upload Wizard Licensing Tutorial shows that it's pretty rare to be able to simply plug in new strings into an SVG and have an acceptable translation. It usually needs some layout adjustment, and for RTL languages it needs pretty radical changes. That said, it's an interesting frontier and it would be awesome to have a tool which made it easier to create translated SVGs or indicate that translations were related to each other. One thing at a time though. I don't think it's that impractical ;)
It may not be perfectly beautiful, but certainly everyone translating content should not have to know how to edit SVG files; likewise the software can make it easy for a separate svg layout expert to come in later and improve on the automated derivative. But you're correct, it's not really part of storage considerations -- it is part of thinking about the future of access to media streams via the api. Maybe the base thing for the storage platform to consider in this thread is: access to media streams via the api, or whether it is going to try to manage a separate entry point outside of MediaWiki. I think public assets going over the existing squid - http file server path, and non-public assets going through an api entry point, would make sense. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How would you disrupt Wikipedia?
Looking over the thread, there are lots of good ideas. It's really important to have some plan towards cleaning up abstractions between structured data, procedures in representation, visual representation and tools for participation. But I think it's correct to identify the social aspects of the projects as more critical than purity of abstractions within wikitext. Tools, bots, scripts and clever ui components can abstract away some of the pain of the underlying platform, as long as people are willing to accept a bit of abstraction leakage / lack of coverage in some areas as part of moving to something better. One area that I did not see much mention of in this thread is automated systems for reputation. Reputation systems would be useful both for user interactions and for gauging expertise within particular knowledge domains. Social capital within Wikimedia projects is presently stored in incredibly unstructured ways and has little bearing on user privileges, on how the actions of others are represented to you, or on how your actions are represented to others. It's presently based on the traditional small-scale capacities of individuals to gauge social standing within their social networks and/or to read user pages. We can see automatic reputation systems emerging anytime you want to share anything online, be it making a small loan or trading used DVDs. Sharing information should adopt some similar principles. There has been some good work done in this area with the WikiTrust system (and other user moderation / karma systems). Tying that data into smart interface flows that reward positive social behaviour and productive contributions should make it more fun to participate in the projects and result in more fluid, higher quality information sharing. peace, --michael On 12/29/2010 01:31 AM, Neil Kandalgaonkar wrote: I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction. But I just want to ask a separate, but related question. Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following: 1 - Become a more attractive home to the WP editors. Get them to work on your content. 2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match. 3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community. In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook? And there's a followup question to this -- but you're all smart people and can guess what it is. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] InlineEditor / Sentence-Level Editing: usability review
On 11/29/2010 07:56 AM, Roan Kattouw wrote: 2010/11/29 Jan Paul Posma jp.po...@gmail.com: Full interview videos will be available on Wikimedia Commons somewhere next month. They are in Dutch, though. Michael, can we subtitle those with mwEmbed magic? Roan Kattouw (Catrope) We can. We can even subtitle them with the slick universal subtitles interface ;) http://techblog.wikimedia.org/2010/10/video-labs-universal-subtitles-on-commons/ Hopefully by then I will have time to add a little language selection dialog at start-up; for now you would have to move the English subtitle into the Dutch subtitle name after you complete your transcription. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Resource Loader problem
The code is not spread across many files... it's a single mw.Parser.js file. It's being used in my gadget and Neil's upload wizard. I agree the parser is not the ideal parser: it's not feature complete, it's not very optimised, and it was hacked together quickly. But it passes all the tests and matches the output of php for all the messages across all the languages. I should have time in the next few days to re-merge / clean it up a bit if no one else is doing it. It should be clear who is doing what. The parser as is ... is more of a starting point than a finished project. But it starts by passing all the tests... If that's useful we can plop it in there. An old version of the test file is here (I have a ported / slightly cleaner version in a patch): http://prototype.wikimedia.org/s-9/extensions/JS2Support/tests/testLang.html It also includes a test file that confirms the transforms work across a sample set of messages. It's not clear to me how the current test files / system scales ... Mostly for Krinkle: mediawiki.util.test.js seems to always include itself when in debug mode. And why does mediawiki.util.test.js not define an object named mediawiki.util.test? It instead defines mediawiki.test. Also: if (wgCanonicalSpecialPageName == 'Blankpage' && mw.util.getParamValue('action') === 'mwutiltest') { Seems gadget-like... this logic can be done on the php side, no? Why not deliver specific test payloads for specific test entry points? If you imagine we have dozens of complicated test systems with sub-components, the debug mode will become overloaded with js code that never runs. --michael On 11/10/2010 10:56 AM, Roan Kattouw wrote: 2010/11/10 Dmitriy Sintsov ques...@rambler.ru: * Trevor Parscal tpars...@wikimedia.org [Wed, 10 Nov 2010 00:16:27 -0800]: Well, we basically just need a template parser. Michael has one that seems to be working for him, but it would need to be cleaned up and integrated, as it's currently spread across multiple files and methods. Do you like writing parsers? Maybe my knowledge of MediaWiki is not good enough, but aren't the local messages only provide the basic syntax features like {{PLURAL:||}}, not a full Parser with template calls and substitutions? I never tried to put real template calls into messages. Rewriting the whole Parser in Javascript would be a lot of work. Many people have already failed to make alternative parsers fully compatible. And how would one call the server-side templates, via AJAX calls? That would be inefficient. We're not looking for a full-blown parser, just one that has a few basic features that we care about. The current JS parser only supports expansion of message parameters ($1, $2, ...), and we want {{PLURAL}} support too. AFAIK that's pretty much all we're gonna need. Michael Dale's implementation has $1 expansion and {{PLURAL}}, AFAIK, and maybe a few other features. I am currently trying to improve my Extension:WikiSync, also I have plans to make my another extensions ResourceLoader compatible. I think {{PLURAL}} is an important feature for ResourceLoader, and if no volunteer wants to implement it, I think a staff developer should. However, if the most of work has already been done, I can take a look, but I don't have the links to look at (branches, patches). I just don't know how much time would it take. Sorry. I believe most of the work has already been done, yes, but I've never seen Michael's code and I don't know where it is (maybe someone who does can post a link?).
Roan Kattouw (Catrope) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
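As a rough illustration of the (deliberately small) scope being discussed -- parameter expansion plus two-form {{PLURAL}} -- a toy expander might look like the following; this is not the actual mw.Parser.js code, which is driven by the full language test suite.

// Expand $1..$n parameters, then a two-form {{PLURAL:n|one|other}}.
function expandMessage( msg, params ) {
    // parameter substitution first, so PLURAL sees the real number
    msg = msg.replace( /\$(\d+)/g, function ( match, n ) {
        return params[ n - 1 ];
    } );
    return msg.replace( /\{\{PLURAL:([^|}]+)\|([^|}]*)\|([^|}]*)\}\}/g,
        function ( match, count, one, other ) {
            return parseInt( count, 10 ) === 1 ? one : other;
        } );
}
// expandMessage( 'Deleted {{PLURAL:$1|$1 page|$1 pages}}', [ 3 ] )
//   -> 'Deleted 3 pages'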
Re: [Wikitech-l] Commons ZIP file upload for admins
On 10/25/2010 12:02 PM, Erik Moeller wrote: Hello all, for some types of resources, it's desirable to upload source files (whether it's Blender, COLLADA, Scribus, EDL, or some other format), so that others can more easily remix and process them. Currently, as far as I know, there's no way to upload these resources to Commons. What would be the arguments against allowing administrators to upload arbitrary ZIP files on Wikimedia Commons, allowing the Commons community to develop policy and process around when such archived resources are appropriate? An alternative, of course, would be to whitelist every possible source format for admins, but it seems to me that it would be a good general policy to not enable additional support for formats that aren't officially supported (reduces confusion among users about what's permitted -- there's only one file format they can't use). Thoughts? Thanks, Erik It would be most ideal if we actually supported these formats, so we can do things like thumbnails, basic metadata etc. Failing that, it's better to support a given file extension than it is to support zip files. This way, if in 'the future' we add support for format X, then we have X-format files stored consistently, so we can support representation of that file format. If we add blanket support for 'throw whatever you want into a zip file', it will be difficult to give a quality representation of that asset in the future (other than as a zip file with multiple sub-assets). If, for example, someone writes a diff engine for representing 3d model transformations, we won't as easily be able to plug in that tool if we don't have a consistent storage model for that file format. That being said, there may be some composite asset sets that lack container systems, in which case it would not be bad to support some open container format. The number of formats or multimedia asset compositing systems that are not web-representable with JavaScript engines or natively supported in the browser should be on a dramatic decline in the next decade, so it's best to just focus on support for such formats. For example we prefer svg uploads to a zip file with Illustrator assets, because svg is representable in the browser, there are javascript-based engines for editing svg [http://svg-edit.googlecode.com/svn/branches/2.4/editor/svg-editor.html] etc. Likewise for 3d model representation with the COLLADA format (although that is much more in its infancy at this point in time). --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] ResourceLoader Debug Mode
This is getting a little out of hand? People are going to spend more time talking about the potential for minification errors or sub optimisation cost criteria then we will ever actually be running into real minification errors or any real readability issue. Reasonable efforts are being made to make the non-minified versions easily accessible. We can add additional comment to every outputted file that says request me with ?debug=true to get the source code version, add a user preference, add a cookie, and a view non-minfied source link to the bottom every page... But try to keep in mind the whole point of minification is identical program execution while minimising the file size! If we are to optimise with random adjustments for readability we are optimising in two opposite directions, and every enhacment in the direction of optimized pacakge code delivery could potentially go against the 'readability' optimisation. We are already commited to supporting two modes one that optimized readability with raw source code delivery and one that is optimised for small packaged delivery. No sense in setting readability as criteria for the packaged mode since the 'readability' mode will always do 'readability' better. --michael On 10/01/2010 12:38 AM, Trevor Parscal wrote: I was hardly making a case for how amazingly expensive it was. I was running some basic calculations that seemed to support your concept of fairly cheap, but chose to mention that it's still not free. - Trevor On 9/30/10 9:30 PM, Tim Starling wrote: On 01/10/10 04:35, Trevor Parscal wrote: OK, now I've calculated it... On a normal page view with the Vector skin and the Vector extension turned on there's a 2KB difference. On an edit page with the Vector skin and Vector and WikiEditor extensions there's a 4KB difference. While adding 2KB to a request for a person in a remote corner of the world on a 56k modem will only add about 0.3 seconds to the download, sending 2,048 extra bytes to 350 million people each month increases our bandwidth by about 668 gigabytes a month. We don't pay by volume (GB per month), we pay by bandwidth (megabits per second at the 95th percentile). They should be roughly proportional to each other, but to calculate a cost we have to convert that 668GB figure to a percentage of total volume. I took this graph: http://www.nedworks.org/~mark/reqstats/trafficstats-monthly.png And I used the GIMP histogram tool to integrate the outgoing part for 30 days between week 34 and week 37. The result was 31,824 pixels of blue and 20,301 pixels of green, which I figure is about 2113 TB/month. So on your figure, the cost of adding line breaks would be about 0.03% of whatever the bandwidth bill for that month is. I don't have that number to hand, but I suspect 0.03% of it is not going to be very much. For 2009-2010 there was a budget of about $1M for internet hosting, of which bandwidth is a part, and 0.03% of that entire budget category is only $25 per month. I think your 668GB figure is too low, because current uniques is more like 390M per month, and because some unique visitors will request the JS more than once. You can double it if you think it would help you make your case. I don't know what that kind of bandwidth costs the foundation, but it's not free. Developer time is not free either. 
-- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
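For what it's worth, the figures quoted above check out; a quick back-of-envelope version, using the 2 KB and 350 million numbers from the thread:

var extraBytes = 2048 * 350e6;                  // 2 KB to 350M visitors
var extraGiB = extraBytes / Math.pow( 2, 30 );  // ~667.6, the "668 gigabytes"
var share = extraBytes / ( 2113 * Math.pow( 2, 40 ) );
// share is ~0.0003, i.e. roughly 0.03% of the month's outgoing volume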
Re: [Wikitech-l] ResourceLoader, now in trunk!
On 09/09/2010 10:56 AM, Trevor Parscal wrote: We would need to vary on that cookie, but yes, this seems like a cool idea. - Trevor Previously when we had this conversation, I liked the idea of setting a user preference. http://lists.wikimedia.org/pipermail/wikitech-l/2010-May/047800.html This is easiest to set up with the existing setup, since we already have cache-destroying things in the preferences anyway. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] ResourceLoader, now in trunk!
On 09/07/2010 01:39 AM, Tim Starling wrote: I think it's ironic that this style arises in JavaScript, given that it's a high-level language and relatively easy to understand, and that you could make a technical case in favour of terseness. C has an equally effective minification technique known as compilation, and its practitioners tend to be terse to a fault. For instance, many Linux kernel modules have no comments at all, except for the license header. I would quickly add that it would be best to promote JSDoc-style comments (as I have slowly been doing for mwEmbed). This way we could eventually have the Google Closure Compiler read these comments to better optimise JavaScript compilation: http://code.google.com/closure/compiler/docs/js-for-compiler.html Also we can create pretty JSDoc-style documentation for people who are into that sort of thing. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
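An example of the kind of JSDoc annotation being suggested; the function itself is hypothetical, only the comment style matters here.

/**
 * Fetch the raw wikitext of a page (hypothetical example function).
 *
 * @param {string} title Title of the target page
 * @param {function(string)} callback Called with the page text
 */
function getPageText( title, callback ) {
    // ... api request elided ...
}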
Re: [Wikitech-l] ResourceLoader, now in trunk!
On 09/08/2010 06:28 AM, Roan Kattouw wrote: I don't believe we should necessarily support retrieval of arbitrary wiki pages this way, but that's not needed for Gadgets: there's a gadgets definition list listing all available gadgets, so Gadgets can simply register each gadget as a module (presumably named something like gadget-gadgetname). This is of course already supported, just not in 'grouped requests'. Open up your scripts tab on a fresh load of http://commons.wikimedia.org/wiki/Main_Page Like 24 or so of the 36 script requests on Commons are 'arbitrary wiki pages' requested as javascript: http://commons.wikimedia.org/w/index.php?title=MediaWiki:AjaxQuickDelete.js&action=raw&ctype=text/javascript These are not gadgets in the php-extension sense; ie MediaWiki:Common.js does a lot of loading, and the gadget set is not defined in php on the server. The resource loader should minimally let you group MediaWiki-namespace javascript and css. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
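The two styles of request being contrasted, side by side; importScript() is what MediaWiki:Common.js effectively uses today, while the grouped module name in the second call is purely hypothetical.

// One request per wiki page, served via action=raw:
importScript( 'MediaWiki:AjaxQuickDelete.js' );

// versus one grouped ResourceLoader request covering several wiki-page
// scripts (module name is hypothetical):
mw.loader.load( 'site.commons-tools' );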
Re: [Wikitech-l] ResourceLoader, now in trunk!
On 09/08/2010 11:25 AM, Roan Kattouw wrote: It's defined on a MediaWiki: page, which is accessed by the server to generate the Gadgets tab in Special:Preferences. There is sufficient server-side knowledge about gadgets to implement them as modules, although I guess we might as well save ourselves the trouble and load them as wiki pages, We should have an admin-controlled, globally enabled gadgets system (with support for turning a gadget on per user group, ie all users, admin users, etc.). Each gadget should define something like MediaWiki:Gadget-loader-ImageAnnotator.js, holding the small bit that is presently stored as free text in MediaWiki:Common.js, ie: /** ImageAnnotator * Globally enabled per * http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=26818359#New_interface_feature * Maintainer: [[User:Lupo]] */ if (wgNamespaceNumber != -1 && wgAction && (wgAction == 'view' || wgAction == 'purge')) { // Not on Special pages, and only if viewing the page if (typeof (ImageAnnotator_disable) == 'undefined' || !ImageAnnotator_disable) { // Don't even import it if it's disabled. importScript ('MediaWiki:Gadget-ImageAnnotator.js'); } } That should go into the gadget loader file and, of course, instead of importScript, some loader call that aggregates all the loader load calls for a given page-ready time. It should ideally also support some sort of grouping strategy parameter. We should say something like: packages that are larger than 30k or used on a single page should not be grouped, so as to avoid mangled cache effects escalating into problematic scenarios. As we briefly discussed, I agree with Trevor that if the script is small and more or less widely used it's fine to retransmit the same package in different contexts to avoid extra requests on first visit. But it should be noted that separating requests can result in ~fewer~ requests. ie imagine grouping vs separate requests, where page 1 uses resource set A, B and page 2 uses resource set A, C, then page 3 uses A, B, C: you still end up doing 3 requests across the 3 page views, except with the 'one request' strategy you resend A. For a fourth page that just uses B, C you can pull those from cache and do zero requests, or resend B, C if you always go with a 'single request'. Of course as you add more permutations, like page 5 that uses just A, just B or just C, it can get ugly. Which is why we need to ~strongly recommend~ the less-than-30K or rarely-used javascript grouping rules somehow. The old resource loader had the concept of 'buckets'; I believe the present resource loader just has an option to 'not-group', which is fine since 'buckets' could be conceptualized as 'meta module sets' that are 'not-grouped'. Not sure what's conceptually more clear. IMHO buckets are a bit more friendly to modular extension and gadget development, since any module can say it's part of a given group without modifying a master or core manifest. At any rate, we should make sure to promote either the buckets or 'meta module' option, or it could result in painful excessive retransmission of grouped javascript code. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Storing data across requests
On 07/29/2010 10:15 AM, Bryan Tong Minh wrote: Hi, I have been working on getting asynchronous upload from url to work properly[1]. A problem that I encountered was that I need to store data across requests. Normally I would use $_SESSION, but this data should also be available to job runners, and $_SESSION isn't. Could the job not include the session_id and upload_session_key, and then in your job handling code you just connect into that session via session_id( $session_id ); session_start(); to update the values? That seems like it would be more lightweight than DB status updates... I see Platonides suggested this as well. (That is how it was originally done, but with a background php process rather than the jobs table.) See http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/HttpFunctions.php?view=markup&pathrev=53825 line 145 ( doSessionIdDownload ) --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Upload file size limit
On 07/20/2010 10:24 PM, Tim Starling wrote: The problem is just that increasing the limits in our main Squid and Apache pool would create DoS vulnerabilities, including the prospect of accidental DoS. We could offer this service via another domain name, with a specially-configured webserver, and a higher level of access control compared to ordinary upload to avoid DoS, but there is no support for that in MediaWiki. We could theoretically allow uploads of several gigabytes this way, which is about as large as we want files to be anyway. People with flaky internet connections would hit the problem of the lack of resuming, but it would work for some. Yes, in theory we could do that ... or we could support some simple chunked uploading protocol, for which there is *already* basic support written, and which will be supported in native js over time. The firefogg protocol is almost identical to the plupload protocol. The main difference is that firefogg requests a unique upload parameter / url back from the server, so that if you uploaded identically named files they would not mangle the chunking. From a quick look at plupload's upload.php, it appears plupload relies on the filename and an extra chunk != 0 url request parameter. The other difference is that firefogg has an explicit done = 1 request parameter to signify the end of the chunks. We requested feedback about adding a chunk id to the firefogg chunk protocol with each posted chunk, to guard against cases where the outer caches report an error but the backend got the file anyway. This way the backend can check the chunk index and not append the same chunk twice, even if there are errors at other levels of the server response that cause the client to resend the same chunk. Either way, if Tim says the plupload chunk protocol is superior, then why discuss it? We can easily shift the chunks api to that and *move forward* with supporting larger file uploads. Is that at all agreeable? peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
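A sketch of the chunk-index idea using the modern File API; parameter names and the endpoint are illustrative and deliberately do not match either the firefogg or the plupload protocol exactly.

// Upload a file in fixed-size chunks, carrying an explicit chunk index so
// the server can ignore a chunk it has already appended, even when a cache
// layer reported an error back to the client.
function uploadChunks( file, chunkSize, uploadUrl, done ) {
    var index = 0;
    function sendNext() {
        var start = index * chunkSize;
        if ( start >= file.size ) {
            return done();
        }
        var form = new FormData();
        form.append( 'chunkIndex', index ); // lets the server drop repeats
        form.append( 'done', start + chunkSize >= file.size ? 1 : 0 );
        form.append( 'chunk', file.slice( start, start + chunkSize ) );

        var xhr = new XMLHttpRequest();
        xhr.open( 'POST', uploadUrl, true );
        xhr.onload = function () {
            index++; // resending the same index is safe: the server ignores it
            sendNext();
        };
        xhr.send( form );
    }
    sendNext();
}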
Re: [Wikitech-l] [gsoc] splitting the img_metadata field into a new table
More important than file_metadata and page asset metadata working with the same db table backed, its important that you can query export all the properties in the same way. Within SMW you already have some special properties like pagelinks, langlinks, category properties etc, that are not stored the same as the other SMW page properties ... The SMW system should name-space all these file_metadata properties along with all the other structured data available and enable universal querying / RDF exporting all the structured wiki data. This way file_metadata would just be one more special data type with its own independent tables. ... SMW should abstract the data store so it works with the existing structured tables. I know this was already done for categories correct? Was enabling this for all the other links and usage tables explored? This also make sense from an architecture perspective, where file_metadata is tied to the file asset and SMW properties are tied to the asset wiki description page. This way you know you don't have to think about that subset of metadata properties on page updates since they are tied to the file asset not the wiki page propriety driven from structured user input. Likewise uploading a new version of the file would not touch the page data tables. --michael Markus Krötzsch wrote: Hi Bawolff, interesting project! I am currently preparing a light version of SMW that does something very similar, but using wiki-defined properties for adding metadata to normal pages (in essence, SMW is an extension to store and retrieve page metadata for properties defined in the wiki -- like XMP for MW pages; though our data model is not quite as sophisticated ;-). The use cases for this light version are just what you describe: simple retrieval (select) and basic inverse searches. The idea is to thus have a solid foundation for editing and viewing data, so that more complex functions like category intersections or arbitrary metadata conjunctive queries would be done on external servers based on some data dump. It would be great if the table you design could be used for such metadata as well. As you say, XMP already requires extensibility by design, so it might not be too much work to achieve this. SMW properties are usually identified by pages in the wiki (like categories), so page titles can be used to refer to them. This just requires that the meta_name field is long enough to hold MW page title names. Your meta_schema could be used to separate wiki properties from other XMP properties. SMW Light does not require nested structures, but they could be interesting for possible extensions (the full SMW does support one-level of nesting for making compound values). Two things about your design I did not completely understand (maybe just because I don't know much about XMP): (1) You use mediumblob for values. This excludes range searches for numerical image properties (Show all images of height 1000px or more) which do not seem to be overly costly if a suitable schema were used. If XMP has a typing scheme for property values anyway, then I guess one could find the numbers and simply put them in a table where the value field is a number. Is this use case out of scope for you, or do you think the cost of reading from two tables too high? One could also have an optional helper field meta_numvalue used for sorting/range-SELECT when it is known from the input that the values that are searched for are numbers. 
(2) Each row in your table specifies property (name and schema), type, and the additional meta_qualifies. Does this mean that one XMP property can have values of many different types and with different flags for meta_qualifies? Otherwise it seems like a lot of redundant data. Also, one could put stuff like type and qualifies into the mediumblob value field if they are closely tied together (I guess, when searching for some value, you implicitly specify what type the data you search for has, so it is not problematic to search for the value + type data at once). Maybe such considerations could simplify the table layout, and also make it less specific to XMP. But overall, I am quite excited to see this project progressing. Maybe we could have some more alignment between the projects later on (How about combining image metadata and custom wiki metadata about image pages in queries? :-) but for GSoC you should definitely focus on your core goals and solve this task as good as possible. Best regards, Markus On Freitag, 28. Mai 2010, bawolff wrote: Hi all, For those who don't know me, I'm one of the GSOC students this year. My mentor is ^demon, and my project is to enhance support for metadata in uploaded files. Similar to the recent thread on interwiki transclusions, I'd thought I'd ask for comments about what I propose to do. Currently metadata is stored in
Re: [Wikitech-l] Revisiting becoming an OpenID Provider
Robb Shecter wrote: Consider this true scenario: I want to write a MediaWiki API client for editors; something like the Wordpress Dashboard. Really give editors a modern web experience. I'd want to do this as a Rails app: I could build it quickly and find lots of collaborators via GitHub. Not to derail the OpenID idea: I think we should support OAuth 100%, and it certainly would help with persistent applications and scalability... But ... for the most part you can build these types of applications in pure javascript. Anytime you need to run an api action that requires you to be on the target domain, you call a bit of code to iframe-proxy that action on the target domain and communicate its results to the client domain with another iframe back to the client. mwEmbed provides an iFrame proxy as part of a uniform api request system via the mw.getJSON() function. This lets you just call that function, and mwEmbed works out whether it needs to spawn a proxy or can make the request directly. Presently I hard-code the approved domains, but it would not be difficult to add a process where users could approve domains / applications. We could even do explicit approval for the set of allowable api actions being requested. ( ie edit pages OK, upload NO ) This has been in use for a while, and it's how uploading to commons from an English encyclopedia page works with the add-media-wizard gadget. http://bit.ly/9P144i You can test it simply by enabling that gadget, then while editing click insert image, then the upload button, then upload to commons. ~Right now~ it's a pure javascript gadget that is enabled on (en.wikipedia) which calls another gadget on ( commons.wikimedia ), and they set up two-way communication that way. To make things more complicated, all the javascript and html proxy pages are hosted on a 3rd domain ( prototype.wikimedia.org ), and it's not just simple api calls; rather it's a full file-uploading proxy with progress indicators and two-way error interactions. In the context of the mwEmbed gadget this is more complicated than it needs to be. I should package an apiProxy extension that could simplify things, like having an actual proxy entry point that does not load the entire set of mediaWiki view page assets on every proxy interaction. Also it could use some HTML5-type enhancements around cross-domain communication so the application could send and receive the messages directly where the domain is approved and the browser supports it. Furthermore some versions of IE have to request user approval for the iFrame to carry user credentials, but this can be avoided with a p3p policy added to the response header. http://bit.ly/13kpV That being said it has worked okay for what I needed it for, and I think it could be used for prototyping the editors portal as you have described it. peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
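For readers unfamiliar with the pattern, here is a sketch of what the calling convention looks like from a gadget's point of view. The exact mw.getJSON() signature in mwEmbed may differ; the endpoint and query parameters below are just an illustrative example, not taken from the add-media-wizard code.

// Sketch of the uniform request pattern described above. The caller does not
// care whether the request goes direct (same domain) or through a hidden
// iframe proxy hosted on the approved target domain.
mw.getJSON(
	'http://commons.wikimedia.org/w/api.php',
	{ action: 'query', prop: 'imageinfo', titles: 'File:Example.ogg', format: 'json' },
	function ( data ) {
		// same callback either way; mwEmbed handled the cross-domain part
		console.log( data );
	}
);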
Re: [Wikitech-l] js2 extensions / Update ( add-media-wizard, uploadWizard, timed media player )
Aryeh Gregor wrote: On Thu, May 20, 2010 at 3:20 PM, Michael Dale md...@wikimedia.org wrote: I like the idea of a user preference. This way you don't constantly have to add debug to the url, it's easy to test across pages, and it is more difficult for many people to accidentally invoke it. It also means that JavaScript developers will be consistently getting served different code from normal users, and therefore will see different sets of bugs. I'm unenthusiastic about that prospect. hmm ... if the minification resulted in different bugs than the raw code, that would be a bug with the minification process, and you would want to fix that minification bug. You will want to know where the error occurred in the minified code. Preserving newlines is a marginal error-accessibility gain when you're grouping many scripts, replacing all the comments with newlines, stripping debug lines, and potentially shortening local-scope variable names. Once you are going to fix an issue you will be fixing it in the actual code, not the minified output, so you will need to recreate the bug with the non-minified output. In terms of all-things-being-equal compression-wise, using \n instead of semicolons, consider: a = b + c; (d + e).print(); With \n instead of ; it will be evaluated as: a = b + c(d + e).print(); We make wide use of parenthetical modularization, ie all the jquery plugins do something like: (function($){ /* plugin code in local function scope using $ for jQuery */ })(jQuery); Initialization code on the line above such a block, with /\n/ substituted for ';', will result in errors. The use of a script-debug preference is for user-script development that is hosted live and developed on the server wiki pages. Development of core and extension javascript components should be tested locally in both minified and non-minified modes. peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
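To spell out the hazard with the parenthetical-modularization pattern mentioned above, here is a small self-contained illustration (not from the original thread):

// Safe only because of the explicit semicolon after the first statement:
var a = b + c;
(function ( $ ) {
	// plugin code in local function scope using $ for jQuery
}( jQuery ));

// If a minifier replaced that ';' with a newline, the parser would read it as
//   var a = b + c(function ( $ ) { ... }( jQuery ));
// ie it would try to call c() -- a runtime error, not equivalent code.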
Re: [Wikitech-l] js2 extensions / Update ( add-media-wizard, uploadWizard, timed media player )
Helder Geovane wrote: I would support a url flag to avoid minification and/or avoid script-grouping, as suggested by Michael Dale, or even to have a user preference to enable/disable minification in a more permanent way (so we don't need to change the url on each test: we just disable minification, debug the code and then enable it again). I like the idea of a user preference. This way you don't constantly have to add debug to the url, it's easy to test across pages, and it is more difficult for many people to accidentally invoke it. Committed support for the preference in r66703 --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] js2 extensions / Update ( add-media-wizard, uploadWizard, timed media player )
The script-loader has a few modes of operation. You can run it in raw file mode ( ie $wgEnableScriptLoader = false ). This will load all your javascript files directly, only doing php requests to get the messages. In this mode php does not touch any js or css file you're developing. Once ready for production you can enable the script-loader: it groups, localizes, removes debug statements, transforms css url paths, and minifies the set of javascript / css. It includes experimental support for the google closure compiler, which does much more aggressive transformations. I think you're misunderstanding the point of the script-loader. Existing extensions used on wikipedia already package and minify javascript code statically. If you want to have remote users communicate javascript debugging info, we could add a url flag to avoid minification and/or script-grouping. That may be useful in the case of user-scripts / gadgets. But in general it's probably better / easier for end users to just identify their platform and what's not working, since it's all code to them anyway. If they are a developer or are going to do something productive with what they are seeing, they likely have the code checked out locally and use the debug mode. --michael Aryeh Gregor wrote: On Mon, May 17, 2010 at 6:43 PM, Maciej Jaros e...@wp.pl wrote: So does this extension encrypt JS files into being non-debugable? I could understand that on sites like Facebook but on an open or even Open site like Wikipedia/Mediawiki? This just seems to be wrong. Simple concatenation of files would serve the same purpose in terms of requests to the server. At the very least, newlines should be preserved, so you can get a line number when an error occurs. Stripping other whitespace and comments is probably actually worth the performance gain, from what I've heard, annoying though it may occasionally be. Stripping newlines is surely not worth the added debugging pain, on the other hand. (Couldn't you even make up for it by stripping semicolons?) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] js2 extensions / Update ( add-media-wizard, uploadWizard, timed media player )
If you have been following the svn commits you may have noticed a bit of activity on the js2 front. I wanted to send a quick heads up that describes what is going on, invite people to try things out, and ask for feedback. == Demos == The js2 extension and associated extensions are running on sandbox-9. If you view the source of a main page you can see all the scripts and css grouped into associated buckets: http://prototype.wikimedia.org/sandbox.9/Main_Page I did a (quick) port of usabilityInitiative to use the script-loader as well. Notice if you click edit on a section you get all the css and javascript, localized in your language and delivered in a single request. ( I don't include the save / publish button since it was just a quick port ) Part of the js2 work included a wiki-text parser for javascript client-side message transformation: http://prototype.wikimedia.org/s-9/extensions/JS2Support/tests/testLang.html There are a few cases out of the 356 tests where I think character encoding is not letting identical messages pass the test, and a few transformations that don't match up. I will take a look at those edge cases soon. The Multimedia initiative ( Neil and Guillaume's ) UploadWizard is a js2 / mwEmbed based extension and is also enabled on that wiki: http://prototype.wikimedia.org/sandbox.9/Special:UploadWizard The js2 branch of the OggHandler includes transcode support ( so we embed web-resolution oggs when embedding at web resolution in pages ). This avoids 720P ogg videos being displayed at 240 pixels wide inline ;) http://prototype.wikimedia.org/sandbox.9/Transcode_Test The TimedMediaHandler of course includes timed text display support, which has been seen on commons for a while http://bit.ly/aLo1pZ ... Subtitles get looked up from commons when the repo is shared: http://prototype.wikimedia.org/sandbox.9/File:Welcome_to_globallives_2.0.ogv I have been working with the miro universal subtitles effort, so we should have an easy interface for people to contribute subtitles soon. Edit pages of course include the add-media-wizard, which has been available remotely http://bit.ly/9P144i for some time and now also works as an extension. == Documentation == Some initial JS2 extension documentation is in extensions/JS2Support/README. Feedback on that documentation would also be helpful. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] JS2 code live? (was: Uploads on small wikis)
withJS is a special parameter in MediaWiki:Common.js that lets people preview mediaWiki-namespace user scripts. It's been on commons for ages and on en.wikipedia for a few weeks. peace, --michael Chad wrote: On Fri, Mar 12, 2010 at 1:12 PM, Michael Dale md...@wikimedia.org wrote: Guillaume Paumier wrote: Just FYI, we're working on both (crosswiki-upload and 1-click crosswiki file move), but we're not quite there yet. As mentioned on the commons list, a cross-site upload tool is in early / alpha / experimental testing: http://lists.wikimedia.org/pipermail/commons-l/2010-March/005335.html To summarize from that post you can visit: http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit&withJS=MediaWiki:MwEmbed.js I haven't seen this new withJS parameter anywhere in trunk or wmf-deployment, only in js2-work, unless I'm being really dense today. When did this go live? -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] New committers
Should we define some sort of convention for extensions to develop selenium tests? ie perhaps some of the tests should live in a testing folder of each extension, or in /trunk/testing/extensionName, so that it's easier to modularize the testing suite? --michael Roan Kattouw wrote: bhagya - QA engineer for Calcey Technologies, will be committing Selenium tests to /trunk/testing/selenium janesh - same Roan Kattouw (Catrope) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] New Subversion committers
Also just added Michael Shynar ( shmichael ) from Kaltura who is doing some add-media-wizard work. --michael Tim Starling wrote: Bawolff: various Wikinews-related extensions Jonathan Williford: extensions developed for http://neurov.is/on Ning Hu: Semantic NotifyMe Rob Lanphier and Conrad Irwin have been added to the core committer group. -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] mwEmbed gadget updates and improved subtitle support
Wanted to do a quick mention of the updated mwEmbed gadget and subtitle support on the email list(s) in case people have missed its mention on the village pump [1] or other venues. * On commons the Commons:Timed_Text and Template:Closed_cap have been started. * oggHandler has been patched to support itext [2], [3] output so that it does not need to hit the api to get the list of available subtitles when we are embedding locally. Remote embedding outside of the wikimedia domain grabs the up-to-date list of available tracks via an api call. * A basic "Add timed text" interface is accessible from the cc button, letting you upload an srt file. * If you enable the mwEmbed gadget on English wikipedia ( or other wikipedias ) it will display commons subtitles in the respective wgUserLanguage. ( Interface translations are temporarily disabled per bug 21947, which should be resolvable as soon as the release is branched. ) * Even with extremely limited exposure the number of subtitle files is starting to grow: [4] * The timedText display supports inline wiki-text, so people's names, subjects and places of interest can be linked to their respective wikipedia articles in their respective languages from the srt text. [5] * Gadget feedback has been very good. Big thanks to all that have helped test, and especially User:84user with his very detailed reports ;) And finally .. I imagine it ~may~ take some time before the mwEmbed stuff makes its way through code review and on-by-default deployment, because of release branching and because mwEmbed includes quite a few other components ... But similar to the usability beta and other gadgets, we could let people do a much easier opt-in, which I will continue to look into in the meantime. peace, michael [1] http://commons.wikimedia.org/wiki/Commons:Village_pump#Improved_Close_Captions_Support [2] http://www.annodex.net/~silvia/itext/ [3] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/60458 [4] http://commons.wikimedia.org/w/index.php?title=Special%3AAllPages&from=&to=&namespace=102 [5] http://commons.wikimedia.org/wiki/File:Yochai_Benkler_-_On_Autonomy,_Control_and_Cultureal_Experience.ogg ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
... that makes sense .. ( on the side I was looking into a fall-back ogg video serving solution that would hit the disk issue ) .. but in this context you're right .. it's about saturating the network port. Since network ports are generally pretty fast, a test on my laptop may be helpful: (running PHP 5.2.6-3ubuntu4.2, Apache/2.2.11, Intel Centrino 2Ghz ) Let's take a big script-loader request running from memory, say the firefogg advanced encoder javascript set (from the trunk... I made the small modifications Tim suggested, ie don't parse the javascript file to get the class list):
#ab -n 1000 -c 100 'http://localhost/wiki_trunk/js2/mwEmbed/jsScriptLoader.php?urid=18&class=mv_embed,window.jQuery,mvBaseUploadInterface,mvFirefogg,mvAdvFirefogg,$j.ui,$j.ui.progressbar,$j.ui.dialog,$j.cookie,$j.ui.accordion,$j.ui.slider,$j.ui.datepicker'
The result is:
Concurrency Level: 100
Time taken for tests: 1.134 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 64019000 bytes
HTML transferred: 63787000 bytes
Requests per second: 881.54 [#/sec] (mean)
Time per request: 113.437 [ms] (mean)
Time per request: 1.134 [ms] (mean, across all concurrent requests)
Transfer rate: 55112.78 [Kbytes/sec] received
So we are hitting near 900 requests per second on my 2-year-old laptop. Now if we take the static minified combined file, which is 239906 bytes instead of 64019 bytes, we should of course get a much higher RPS going direct to apache:
#ab -n 1000 -c 100 http://localhost/static_combined.js
Concurrency Level: 100
Time taken for tests: 0.604 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 240385812 bytes
HTML transferred: 240073188 bytes
Requests per second: 1655.18 [#/sec] (mean)
Time per request: 60.416 [ms] (mean)
Time per request: 0.604 [ms] (mean, across all concurrent requests)
Transfer rate: 388556.37 [Kbytes/sec] received
Here we get near 400MB/s and around 2x the requests per second... At a cost of roughly half the requests per second, you can send clients content that is about 3 times smaller (ie faster). Of course none of this applies to the wikimedia setup, where these would all be squid proxy hits. I hope this shows that we don't necessarily have to point clients to static files, and that php pre-processing of the cache is not quite as costly as Tim outlined (if we set up an entry point that first checks the disk cache before loading in all of the mediaWiki php ). Additionally, most mediaWiki installs out there are probably not serving up thousands of requests per second (and those that are probably have proxies set up).. So the gzipping php proxy of js requests is worthwhile. --michael Aryeh Gregor wrote: On Wed, Sep 30, 2009 at 3:32 PM, Michael Dale md...@wikimedia.org wrote: Has anyone done any scalability studies into a minimal php @readfile script vs apache serving the file? Obviously apache will serve the file a lot faster, but a question I have is at what file size does it saturate disk reads as opposed to saturating CPU? It will never be disk-bound unless the site is tiny and/or has too little RAM. The files can be expected to remain in the page cache perpetually as long as there's a constant stream of requests coming in. If the site is tiny, performance isn't a big issue (at least not for the site operators). If the server has so little free RAM that a file that's being read every few minutes and is under a megabyte in size is consistently evicted from the cache, then you have bigger problems to worry about.
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
Aryeh Gregor wrote: Also remember the possibility that sysops will want to include these scripts (conditionally or unconditionally) from MediaWiki:Common.js or such. Look at the top of http://en.wikipedia.org/wiki/MediaWiki:Common.js, which imports specific scripts only on edit/preview/upload; only on watchlist view; only for sysops; only for IE6; and possibly others. It also imports Wikiminiatlas unconditionally, it seems. I don't see offhand how sysop-created server-side conditional includes could be handled, but it's worth considering at least unconditional includes, since sysops might want to split code across multiple pages for ease of editing. This highlights the complexity of managing all javascript dependencies on the server side... If possible the script-loader should dynamically handle these requests. For wikimedia it's behind a squid proxy so it should not be too bad. For small wikis we could set up a dedicated entry point that could first check the file cache key before loading all of webstart.php, parsing javascript classes, and all the other costly mediaWiki web engine stuff. Has anyone done any scalability studies into a minimal php @readfile script vs apache serving the file? Obviously apache will serve the file a lot faster, but a question I have is at what file size does it saturate disk reads as opposed to saturating CPU? --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] js2 coding style for html output
My attachment did not make it into the JS2 design thread... and that thread is in summary mode, so here is a new post around the html output question. Which of the following constructions is easier to read and understand? Is there some tab delimitation format we should use to make the jquery builder format easier? Are performance considerations relevant? (email is probably a bad context for comparison since tabs will get messy and there is no syntax highlighting) Tim suggested that in a security review context dojBuild-type html output is more straightforward to review. I think both are useful, and I like jquery-style building of html since it gives you direct syntax errors rather than html parse errors, which are not as predictable across browsers. But sometimes, performance-wise or from a quick get-it-working perspective, it's easier to write out an html string. Also I think tabbed html is a bit easier on the eyes for someone that has dealt a lot with html. Something that's not fun about the jquery style is that there are many ways to build that same html string using .wrap or any of the other dozen jquery html manipulation functions ... so the same html could be structured very differently in the code. Furthermore a jquery chain can get pretty long or be made up of lots of other vars, potentially making it tricky to rearrange things or identify what html is coming from where. But perhaps that could be addressed by having jquery html construction conventions (or a wrapper that mirrored our php-side html construction conventions? ) In general I have used the html output style but did not really think about it a priori, and I am open to transitioning to more jquery-style output. Here is the html; you can copy and paste this in... On my system (Firefox nightly) the str builder hovers around 20ms while the jquery builder hovers around 150ms (hard to say what would be a good target number of dom actions or what is a fair test...). jQuery could for example output to a variable instead of direct-to-dom output, shaving 10ms or so, and many other tweaks are possible.

<html>
<head>
<title>Jquery vs str builder</title>
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
<script type="text/javascript">
var repetCount = 200;
function runTest( mode ){
	$('#cat').html('');
	var t0 = new Date().getTime();
	if( mode == 'str' ){
		doStrBuild();
	} else {
		dojBuild();
	}
	$('#rtime').html( (new Date().getTime() - t0) + 'ms' );
}
function doStrBuild(){
	var o = '';
	for( var i = 0; i < repetCount; i++ ){
		o += '<span id="' + escape(i) + '" class="fish">' +
			'<p class="dog" rel="foo">' + escape(i) + '</p>' +
			'</span>';
	}
	$('#cat').append( o );
}
function dojBuild(){
	for( var i = 0; i < repetCount; i++ ){
		$('<span/>')
			.attr({ 'id': i, 'class': 'fish' })
			.append(
				$('<p/>')
					.attr({ 'class': 'dog', 'rel': 'foo' })
					.text( i )
			).appendTo('#cat');
	}
}
</script>
</head>
<body>
<h3>Jquery vs dom insert</h3>
Run Time: <span id="rtime"></span><br>
<a onClick="runTest('str');" href="#">Run Str</a><br>
<a onClick="runTest('dom');" href="#">Run Jquery</a><br>
<br>
<div id="cat"></div>
</body>
</html>

--michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] js2 coding style for html output
[snip] What I think we have here is that $('#cat') is expensive and is run inside a loop in dojBuild. You can build once and append once in the jquery version and it only shaves 10ms, ie the following still incurs the jquery html-building function call costs:

function dojBuild(){
	var o = '';
	for( var i = 0; i < repetCount; i++ ){
		// wrap in a detached div so .html() captures the span's own markup
		o += $('<div/>').append(
			$('<span/>')
				.attr({ 'id': i, 'class': 'fish' })
				.append(
					$('<p/>')
						.attr({ 'class': 'dog', 'rel': 'foo' })
						.text( i )
				)
		).html();
	}
	$('#cat').append( o );
}

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
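A third variant (not from the original thread, just a sketch of the same point): keep the jQuery node building but avoid both the repeated $('#cat') lookup and the string round-trip, by appending into a detached container and inserting it once.

// Sketch: build into a detached container, touch the live DOM only once.
function dojBuildDetached(){
	var $out = $('<div/>');   // detached; no live DOM work inside the loop
	for( var i = 0; i < repetCount; i++ ){
		$('<span/>')
			.attr({ 'id': i, 'class': 'fish' })
			.append(
				$('<p/>').attr({ 'class': 'dog', 'rel': 'foo' }).text( i )
			)
			.appendTo( $out );
	}
	$('#cat').append( $out.children() );
}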
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
Tim Starling wrote: Michael Dale wrote: That is part of the idea of centrally hosting reusable client-side components so we control the jquery version and plugin set. So a new version won't come along until its been tested and integrated. You can't host every client-side component in the world in a subdirectory of the MediaWiki core. Not everyone has commit access to it. Nobody can hope to properly test every MediaWiki extension. Most extension developers write an extension for a particular site, and distribute their code as-is for the benefit of other users. They have no interest in integration with the core. If they find some jQuery plugin on the web that defines an interface that conflicts with MediaWiki, say jQuery.load() but with different parameters, they're not going to be impressed when you tell them that to make it work with MediaWiki, they need to rewrite the plugin and get it tested and integrated. Different modules should have separate namespaces. This is a key property of large, maintainable systems of code. Right.. I agree the client side code needs more deployable modularly. If designing a given component as a jquery plug-in, then I think it makes sense to put it in the jQuery namespace ... otherwise you won't be able to reference jquery things in a predictable way. Alternativly you I agree that the present system of parsing top of the javascipt file on every script-loader generation request is un-optimized. (the idea is those script-loader generations calls happen rarely but even still it should be cached at any number of levels. (ie checking the filemodifcation timestamp, witting out a php or serialized file .. or storing it in any of the other cache levels we have available, memcahce, database, etc ) Actually it parses the whole of the JavaScript file, not the top, and it does it on every request that invokes WebStart.php, not just on mwScriptLoader.php requests. I'm talking about jsAutoloadLocalClasses.php if that's not clear. Ah right... previously I had it in php. I wanted to avoid listing it twice but obviously thats a pretty costly way to do that. This will make more sense to put in php if we start splitting up components into the extension folders and generate the path list dynamically for a given feature set. Have you looked at the profiling? On the Wikimedia app servers, even the simplest MW request takes 23ms, and gen=js takes 46ms. A static file like wikibits.js takes around 0.5ms. And that's with APC. You say MW on small sites is OK, I think it's slow and resource-intensive. That's not to say I'm sold on the idea of a static file cache, it brings its own problems, which I listed. yea... but almost all script-loader request will be cached. it does not need to check the DB or anything its just a key-file lookup (since script-loader request pass a request key either its there in cache or its not ...it should be on par with the simplest MW request. Which is substantially shorter then around trip time for getting each script individually, not to mention gziping which can't otherwise be easily enabled for 3rd party installations. I don't think that that comparison can be made so lightly. For the server operator, CPU time is much more expensive than time spent waiting for the network. And I'm not proposing that the client fetches each script individually, I'm proposing that scripts be concatentated and stored in a cache file which is then referenced directly in the HTML. I understand. 
We could even check gzipping support at page output time and point to the gzipped cached versions (analogous to making direct links to the /script-cache folder of the present script-loader setup). My main question is: how will this work for dynamic groups of scripts, set post page load, that are dictated by user interaction or client state? It's not as easy to set up static combined output files to point to when you don't know what set of scripts you will be requesting ahead of time. $wgSquidMaxage is set to 31 days (2678400 seconds) for all wikis except wikimediafoundation.org. It's necessary to have a very long expiry time in order to fill the caches and achieve a high hit rate, because Wikimedia's access pattern is very broad, with the long tail dominating the request rate. okay... so to preserve a high cache hit level you could then have a single static file that lists the versions of the js with a low expiry, and the rest with a high expiry? Or maybe it's so cheap to serve static files that it does not matter, and we just leave everything with a low expiry? --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
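For readers following the "dynamic groups" question above, here is roughly what such a post-page-load request looks like from the client side. The entry point name and class parameter mirror the mwScriptLoader style discussed in this thread, but the exact names and the onload handling are simplified assumptions, not a spec.

// Sketch: request a grouped script set only when the user actually needs it.
function loadClassSet( classList, callback ) {
	var s = document.createElement( 'script' );
	s.src = wgScriptPath + '/mwScriptLoader.php?class=' + classList.join( ',' );
	s.onload = callback;   // simplified; older IE would need onreadystatechange
	document.getElementsByTagName( 'head' )[0].appendChild( s );
}

// Only fetched once the user opens the upload dialog:
loadClassSet( [ '$j.ui', '$j.ui.dialog', 'mvBaseUploadInterface' ], function () {
	// dependencies are now present; show the dialog
} );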
Re: [Wikitech-l] JS2 design (Read this Not Previous)
~ d'oh ~ Disregard previous; a bad keystroke sent it rather than saving to draft. Tim Starling wrote: Michael Dale wrote: That is part of the idea of centrally hosting reusable client-side components so we control the jquery version and plugin set. So a new version won't come along until it's been tested and integrated. You can't host every client-side component in the world in a subdirectory of the MediaWiki core. Not everyone has commit access to it. Nobody can hope to properly test every MediaWiki extension. Most extension developers write an extension for a particular site, and distribute their code as-is for the benefit of other users. They have no interest in integration with the core. If they find some jQuery plugin on the web that defines an interface that conflicts with MediaWiki, say jQuery.load() but with different parameters, they're not going to be impressed when you tell them that to make it work with MediaWiki, they need to rewrite the plugin and get it tested and integrated. Different modules should have separate namespaces. This is a key property of large, maintainable systems of code. Right.. I agree the client-side code needs to be more modularly deployable. It's just tricky to manage all those relationships in php, but it appears it will be necessary to do so... If designing a given component as a jQuery plug-in, then I think it makes sense to put it in the jQuery namespace ... otherwise you won't be able to reference jQuery things locally in a no-conflict compatible way. Unless we create a mw wrapper of sorts, but I don't know how necessary that is atm... I guess it would be slightly cleaner. I agree that the present system of parsing the top of the javascript file on every script-loader generation request is un-optimized. (The idea is that those script-loader generation calls happen rarely, but even still it should be cached at any number of levels, ie checking the file modification timestamp, writing out a php or serialized file .. or storing it in any of the other cache levels we have available: memcache, database, etc.) Actually it parses the whole of the JavaScript file, not the top, and it does it on every request that invokes WebStart.php, not just on mwScriptLoader.php requests. I'm talking about jsAutoloadLocalClasses.php if that's not clear. Ah right... previously I had it in php. I wanted to avoid listing it twice, but obviously that's a pretty costly way to do that. This will make more sense to put in php if we start splitting up components into the extension folders and generate the path list dynamically for a given feature set. Have you looked at the profiling? On the Wikimedia app servers, even the simplest MW request takes 23ms, and gen=js takes 46ms. A static file like wikibits.js takes around 0.5ms. And that's with APC. You say MW on small sites is OK, I think it's slow and resource-intensive. That's not to say I'm sold on the idea of a static file cache, it brings its own problems, which I listed. yea... but almost all script-loader requests will be cached. It does not need to check the DB or anything; it's just a key-file lookup (since script-loader requests pass a request key, either it's there in cache or it's not)... it should be on par with the simplest MW request. Which is substantially shorter than the round trip time for getting each script individually, not to mention gzipping, which can't otherwise be easily enabled for 3rd party installations. I don't think that that comparison can be made so lightly.
For the server operator, CPU time is much more expensive than time spent waiting for the network. And I'm not proposing that the client fetches each script individually, I'm proposing that scripts be concatenated and stored in a cache file which is then referenced directly in the HTML. I understand. (It's analogous to making direct links to the /script-cache folder instead of requesting the files through the script-loader entry point.) My main question is: how will this work for dynamic groups of scripts, set post page load, that are dictated by user interaction or client state? Do we just ignore this possibility and grab any necessary module components based on pre-defined module sets in php that get passed down to javascript? It's not as easy to set up static combined output files to point to when you don't know what set of scripts you will be requesting... hmm... if we had a predictable key format we could do a request for the static file; if we get a 404 then we do a dynamic request to generate the static file?.. Subsequent interactions would hit that static file? That seems ugly though. $wgSquidMaxage is set to 31 days (2678400 seconds) for all wikis except wikimediafoundation.org. It's necessary to have a very long expiry time in order to fill the caches and achieve a high hit rate, because Wikimedia's access pattern is very broad, with the long tail
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
We have js2AddOnloadHook, which gives you jquery in no-conflict mode as the $j variable. The idea behind using a different name is to separate jquery-based code from the older non-jquery-based code... but if taking a more iterative approach we could replace the addOnloadHook function. --michael Daniel Friesen wrote: I got another, not from the thread of course. I'd like addOnloadHook to be replaced by jQuery's ready which does a much better job of handling load events. ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name] Tim Starling wrote: Here's what I'm taking out of this thread: * Platonides mentions the case of power-users with tens of scripts loaded via gadgets or user JS with importScript(). * Tisza asks that core onload hooks and other functions be overridable by user JS. * Trevor and Michael both mention i18n as an important consideration which I have not discussed. * Michael wants certain components in the js2 directory to be usable as standalone client-side libraries, which operate without MediaWiki or any other server-side application. -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
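A minimal sketch of the "more iterative approach" mentioned above (not committed code): keep the existing global name, but back it with jQuery's ready queue so current addOnloadHook( fn ) callers get the better load-event handling without changes.

// Sketch only: delegate the legacy hook to jQuery's ready handling.
window.addOnloadHook = function ( fn ) {
	$j( document ).ready( fn );   // $j = jQuery.noConflict()
};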
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
thanks for the constructive response :) ... comments inline Tim Starling wrote: I agree we should move things into a global object ie: $j and all our components / features should extend that object (like jquery plugins). That is the direction we are already going. I think it would be better if jQuery was called window.jQuery and MediaWiki was called window.mw. Then we could share the jQuery instance with JS code that's not aware of MediaWiki, and we wouldn't need to worry about namespace conflicts between third-party jQuery plugins and MediaWiki. Right, but there are benefits to connecting into the jQuery plugin system that would not be as clean to wrap into our window.mw object. For example $('#textbox').wikiEditor() is using jQuery selectors for the target, and maybe other jQuery plugin conventions like the jquery class alias inside (function($){ ... })(jQuery); Although if not designing your tool as a jQuery plugin then yeah ;) ... but I think most of the tools should be designed as jQuery plug-ins. Dependency loading is not really beyond the scope... we are already supporting that. If you check out the mv_jqueryBindings function in mv_embed.js ... here we have loader calls integrated into the jquery binding. This integrates loading the high-level application interfaces into their interface call. Your so-called dependency functions (e.g. doLoadDepMode) just seemed to be a batch load feature, there was no actual dependency handling. Every caller was required to list the dependencies for the classes it was loading. I was referring to defining the dependencies in the module call ... ie $j('target').addMediaWiz( config ) and having the addMediaWiz module map out the dependencies in the javascript. doLoadDepMode just lets you get around an IE bug: when inserting scripts via the dom you have no guarantee one script will execute in the order inserted. If you're concatenating your scripts doLoadDepMode would not be needed, as order will be preserved in the concatenated file. I like mapping out the dependencies in javascript at that module level since it makes it easier to do custom things, like read the passed-in configuration and decide which dependencies you need to fulfill. If not, you have to define many dependency sets in php or have a much more detailed model of your javascript inside php. But I do understand that it will eventually result in lots of extra javascript module definitions that the given installation may not want. So perhaps we generate that module definition via php configuration ... or we define the set of javascript files to include that define the various module loaders we want with a given configuration. This is sort of the approach taken with the wikiEditor, which has a few thin javascript files that make calls to add modules (like add-sidebar) to a core component (wikiEditor). That way the feature set can be controlled by the php configuration while retaining runtime flexibility for dependency mapping. The idea is to move more and more of the structure of the application into that system. So right now mwLoad is a global function, but it should be re-factored into the jquery space and be called via $j.load(); That would work well until jQuery introduced its own script-loader plugin with the same name and some extension needed to use it. That is part of the idea of centrally hosting reusable client-side components so we control the jquery version and plugin set. So a new version won't come along until it's been tested and integrated.
If the function does mediawiki-specific scriptloader load stuff then yeah, it should be called mwLoad or whatnot. If some other plugin or native jquery piece comes along we can just have our plugin override it and/or store the native as a parent (if it's of use) ... if that ever happens... We could add that convention directly into the script-loader function if desired, so that on a per-class level we include dependencies. Like mwLoad('ui.dialog') would know to load ui.core etc. Yes, that is what real dependency handling would do. Thinking about this more ... I think it's a bad idea to exclusively put the dependency mapping in php. It will be difficult to avoid re-including the same things in client-side loading chains. Say you have your suggest search system: once the user starts typing we load jquery.suggest; it knows that it needs jquery ui via dependency mapping stored in php. It sends both ui and suggest to the client. Now the user in the same page instance decides instead to edit a section. The editTool script-loader gets called; its dependencies also include jquery.ui. How will the dependency-loader script-server know that the client already has the jquery.ui dependency from the suggest tool? In the end you need these dependencies mapped out in the JS so that the client can intelligibly request the script set it needs. In that same example if the
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
~some comments inline~ Tim Starling wrote: [snip] I started off working on fixing the coding style and the most glaring errors from the JS2 branch, but I soon decided that I shouldn't be putting so much effort into that when a lot of the code would have to be deleted or rewritten from scratch. I agree there are some core components that should be separated out and re-factored. And some core pieces that you're probably focused on do need to be removed or rewritten, as they have aged quite a bit. (parts of mv_embed.js were created in SOC 06) ... I did not focus on the ~best~ core loader that could have been created; I have just built on what I already had available, which has worked reasonably well for the application set that I was targeting. It's been an iterative process which I feel is moving in the right direction, as I will outline below. Obviously more input is helpful, and I am open to implementing most of the changes you describe as they make sense. But exclusion and dismissal may be less helpful... unless that is your targeted end, in which case just say so ;) It's normal for a 3rd party observer to say the whole system should be scrapped and rewritten. Of course when starting from scratch it is much easier to design an ideal system and what it should/could be. I did a survey of script loaders in other applications, to get an idea of what features would be desirable. My observations came down to the following: * The namespacing in Google's jsapi is very nice, with everything being a member of a global google object. We would do well to emulate it, but migrating all JS to such a scheme is beyond the scope of the current project. You somewhat contradict this approach by recommending against class abstraction below... ie how will you cleanly load components and dependencies if not by a given name? I agree we should move things into a global object ie: $j and all our components / features should extend that object (like jquery plugins). That is the direction we are already going. Dependency loading is not really beyond the scope... we are already supporting that. If you check out the mv_jqueryBindings function in mv_embed.js ... here we have loader calls integrated into the jquery binding. This integrates loading the high-level application interfaces into their interface call. The idea is to move more and more of the structure of the application into that system. So right now mwLoad is a global function, but it should be re-factored into the jquery space and be called via $j.load(); * You need to deal with CSS as well as JS. All the script loaders I looked at did that, except ours. We have a lot of CSS objects that need concatenation, and possibly minification. Brion did not set that as high priority when I inquired about it, but of course we should add in style grouping as well. It's not like I said we should exclude that from our script-loader; it's just a matter of setting priority, which I agree is high. * JS loading can be deferred until near the </body> or until the DOMContentLoaded event. This means that empty-cache requests will render faster. Wordpress places emphasis on this. True. I agree that we should put the script includes at the bottom. Also, all non-core js2 scripts are already loaded via the DOMContentLoaded ready event. Ideally we should only provide loaders and maybe some small bit of configuration for the client-side applications they provide. As briefly described here: http://www.mediawiki.org/wiki/JS2_Overview#How_to_structure_your_JavaScript_application * Dependency tracking is useful.
The idea is to request a given module, and all dependencies of that module, such as other scripts, will automatically be loaded first. As mentioned above we do some dependency tracking via binding jquery helpers that do that setup internally on a per application interface level. We could add that convention directly into the script-loader function if desired so that on a per class level we include dependencies. Like mwLoad('ui.dialog') would know to load ui.core etc. I then looked more closely at the current state of script loading in MediaWiki. I made the following observations: * Most linked objects (styles and scripts) on a typical page view come from the Skin. If the goal is performance enhancement, then working on the skins and OutputPage has to be a priority. agreed. The script-loading was more urgent for my application task set. But for the common case of per page view performance css grouping has bigger wins. * The class abstraction as implemented in JS2 has very little value to PHP callers. It's just as easy to use filenames. The idea with class abstraction is that you don't know what script set you have available at any given time. Maybe one script included ui.resizable and ui.move and now your script depends on ui.resizable and ui.move and ui.drag... your loader call will only include ui.drag (since the
Re: [Wikitech-l] Working towards branching MediaWiki 1.16
I would add that I am of course open to reorganization and would happily discuss why any given decision was made ... be it trade offs with other ways of doing things or lack of time to do it differently / better. I also add that not all the legacy support and metavid based code has been factored out. (for example for a while I supported the form based upload but now that the upload api is in place I should remove that old code) Other things like timed text support are barely supported because of lack of time. But I would want to keep the skeleton of timed text in there so once we do get around to adding timed text for video we have a basis to move forward from. I suggest for a timely release that you strip the js2 folder and make a note that the configuration variable can not be turned on in this release. And help me identify any issues that need to be addressed for inclusion the next release? And finally, the basic direction and feature set was proposed on this list quite some time ago and ~some~ feedback was given at the time. I would also would echo Trevor's call for more discussion with affected parties if your proposing significant changes. peace, --michael Trevor Parscal wrote: On 9/22/09 6:19 PM, Tim Starling wrote: Siebrand Mazeland wrote: Hola, I just created https://bugzilla.wikimedia.org/show_bug.cgi?id=20768 (Branch 1.16) and Brion was quick to respond that some issues with js2 and the new-upload stuff need to be ironed out; valid concerns, of course. I proposed to make bug 20768 a tracking bug, so that it can be made visible what issues are to/could be considered blocking something we can make a 1.16 out of. Let the dependency tagging begin. Users of MediaWiki trunk are encouraged to report each and every issue, so that what is known can also be resolved (eventually). I'm calling on all volunteer coders to keep an eye on this issue and to help out fixing issues that are mentioned here. I've been working on a rewrite of the script loader and a reorganisation of the JS2 stuff. I'd like to delay 1.16 until that's in and tested. Brion has said that he doesn't want Michael Dale's branch merge reverted, so as far as I can see, a schedule delay is the only other way to maintain an appropriate quality. -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l If you are really doing a JS2 rewrite/reorganization, would it be possible for some of us (especially those of us who deal almost exclusively with JavaScript these days) to get a chance to ask questions/give feedback/help in general? While I think a rewrite/reorganization could be really awesome if done right, and also that getting it right will be easier if we can get some interested parties informed/consulted. I know that Michael Dale's work was more or less done outside of the general MediaWiki branch for the majority of it's development, and afaik it has been a work in progress for some time, so I feel that such a golden opportunity has never really come up before. Aside from my own desire to be involved at some level, it seems fitting to have some sort of discussion at times like these so we can make sure we are making the best decisions about software before it's deployed - as making changes to deployed software is seems to often be much more difficult. Perhaps there's a MediaWiki page, or a time on IRC, or even just continuing on this list...? My first question is: What are you changing and how, and what are you moving and where? 
- Trevor ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Video transcoding settings Was: [54611] trunk/extensions/WikiAtHome/WikiAtHome.php
yea was using the wrong version of ffmpeg2theora locally ;)... Thanks for the reminder, updated our ffmpeg2theora encode command in r55042 ... an update to firefogg should support the --buf-delay argument shortly as well. --michael Gregory Maxwell wrote: On Fri, Aug 7, 2009 at 5:29 PM, d...@svn.wikimedia.org wrote: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/54611 Revision: 54611 Author: dale Date: 2009-08-07 21:29:26 + (Fri, 07 Aug 2009) Log Message: --- added a explicit keyframeInterval per gmaxwell's mention on wikitech-l. (I get ffmpeg2theora: unrecognized option `--buf-delay for adding in buf-delay) I thought firefogg was tracking j^'s nightly? If the encoder has two-pass it has --buf-delay. Does firefog perhaps need to be changed to expose it? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Video Quality for Derivatives (was Re:w...@home Extension)
So I committed ~basic~ derivative code support for oggHandler in r54550 (more solid support on the way). Based on input from the w...@home thread, here are updated target qualities expressed via the firefogg api to ffmpeg2theora. Also j^ was kind enough to run these settings on some sample input files: http://firefogg.org/j/encoding_samples/ so you can check them out there. We want to target 400 wide for the web stream to be consistent with archive.org, which encodes mostly to 400x300 (although their 16:9 stuff can be up to 530 wide) ... Updated the mediaWiki firefogg integration and the stand-alone encoder app with these default transcode settings in r54552 / r54554 (should be pushed out to http://firefogg.org/make shortly ... or can be run @home with a trunk check out at: /js2/mwEmbed/example_usage/Firefogg_Make_Advanced.html). Anyway, on to the settings:
$wgDerivativeSettings[ WikiAtHome::ENC_SAVE_BANDWITH ] = array(
	'maxSize'      => '200',
	'videoBitrate' => '164',
	'audioBitrate' => '32',
	'samplerate'   => '22050',
	'framerate'    => '15',
	'channels'     => '1',
	'noUpscaling'  => 'true'
);
$wgDerivativeSettings[ WikiAtHome::ENC_WEB_STREAM ] = array(
	'maxSize'      => '400',
	'videoBitrate' => '544',
	'audioBitrate' => '96',
	'noUpscaling'  => 'true'
);
$wgDerivativeSettings[ WikiAtHome::ENC_HQ_STREAM ] = array(
	'maxSize'      => '1080',
	'videoQuality' => 6,
	'audioQuality' => 3,
	'noUpscaling'  => 'true'
);
--michael Brion Vibber wrote: On 8/3/09 9:56 PM, Gregory Maxwell wrote: [snip] Based on 'what other people do' I'd say the low should be in the 200kbit-300kbit/sec range. Perhaps taking the high up to a megabit? There are also a lot of very short videos on Wikipedia where the whole thing could reasonably be buffered prior to playback. Something I don't have an answer for is what resolutions to use. The low should fit on mobile device screens. At the moment the defaults we're using for Firefogg uploads are 400px width (eg, 400x300 or 400x225 for the most common aspect ratios) targeting a 400kbps bitrate. IMO at 400kbps at this size things don't look particularly good; I'd prefer a smaller size/bitrate for 'low' and higher size/bitrate for medium qual. From sources I'm googling up, looks like YouTube is using 320x240 for low-res, 480x360 h.264 @ 512kbps+128kbps audio for higher-qual, with 720p h.264 @ 1024Kbps+232kbps audio available for some HD videos. http://www.squidoo.com/youtubehd These seem like pretty reasonable numbers to target; offhand I'm not sure the bitrates used for the low-res version but I think that's with older Flash codecs anyway so not as directly comparable. Also, might we want different standard sizes for 4:3 vs 16:9 material? Perhaps we should wrangle up some source material and run some test compressions to get a better idea what this'll look like in practice... Normally I'd suggest setting the size based on the content: low-motion detail-oriented video should get higher resolutions than high-motion scenes without important details. Doubling the number of derivatives in order to have a large and small setting on a per-article basis is probably not acceptable. :( Yeah, that's way tougher to deal with... Potentially we could allow some per-file tweaks of bitrates or something, but that might be a world of pain. :) As an aside -- downsampled video needs some makeup sharpening like downsampled stills will. I'll work on getting something in ffmpeg2theora to do this. Woohoo! There is also the option of decimating the frame-rate.
Going from 30fps to 15fps can make a decent improvement for bitrate vs visual quality but it can make some kinds of video look jerky. (Dropping the frame rate would also be helpful for any CPU starved devices) 15fps looks like crap IMO, but yeah for low-bitrate it can help a lot. We may wish to consider that source material may have varying frame rates, most likely to be: 15fps - crappy low-res stuff found on internet :) 24fps / 23.98 fps - film-sourced 25fps - PAL non-interlaced 30fps / 29.97 fps - NTSC non-interlaced or many computer-generated vids 50fps - PAL interlaced or PAL-compat HD native 60fps / 59.93fps - NTSC interlaced or HD native And of course those 50 and 60fps items might be encoded with or without interlacing. :) Do we want to normalize everything to a standard rate, or maybe just cut 50/60 to 25/30? (This also loses motion data, but not as badly as decimation to 15fps!) This brings me to an interesting point about instant gratification: Ogg was intended from day one to be a streaming format. This has pluses and minuses, but one thing we should take
Re: [Wikitech-l] Wiki at Home Extension
Google's cost is probably more on the distribution side of things ... but I only found a top-level number, not a breakdown of component costs. At any rate the point is to start exploring distributing the costs associated with large-scale video collaboration. In that way I target developing a framework where individual pieces can be done on the server or on clients, depending on what is optimal. It's not that much extra effort to design things this way. Look back 2 years and you can see the xiph community's blog posts and conversations with Mozilla. It was not a given that Firefox would ship with ogg theora baseline video support (they took some convincing and had to do some thinking about it; a big site like wikipedia exclusively using the free-formats technology probably helped their decision). Originally the xiph/annodex community built the liboggplay library as an extension. This later became the basis for the library that powers firefox ogg theora video today. Likewise we are putting features into firefogg that we eventually hope will be supported by browsers natively. Also in theory we could put a thin bittorrent client into the java Cortado player to support IE users as well. peace, --michael Tisza Gergő wrote: Michael Dale mdale at wikimedia.org writes: * We are not Google. Google lost what like ~470 million~ last year on youtube ...(and that's with $240 million in advertising) so total cost of $711 million [1] How much of that is related to transcoding, and how much to delivery? You seem to be conflating the two issues. We cannot do much to cut delivery costs, save for serving fewer movies to readers - distributed transcoding would actually raise them. (Peer-to-peer video distribution sounds like a cool feature, but it needs to be implemented by browser vendors, not Wikimedia.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wiki at Home Extension
yea would have to be opt in. Would have to have controls over how-much bandwidth sent out... We could encourage people to enable it by sending out a the higher bit-rate / quality version ~by default~ for those that opt-in. --michael Ryan Lane wrote: On Mon, Aug 3, 2009 at 1:57 PM, Michael Dalemd...@wikimedia.org wrote: Look back 2 years and you can see the xiph communities blog posts and conversations with Mozilla. It was not a given that Firefox would ship with ogg theora baseline video support (they took some convening and had to do some thinking about it, a big site like wikipedia exclusively using the free formats technology probably helped their decision). Originally the xiph/annodex community built the liboggplay library as an extension. This later became the basis for the library that powers firefox ogg theora video today. Likewise we are putting features into firefogg that we eventually hope will be supported by browsers natively. Also in theory we could put a thin bittorrent client into java Cortado to support IE users as well. If watching video on Wikipedia requires bittorrent, most corporate environments are going to be locked out. If a bittorrent client is loaded by default for the videos, most corporate environments are going to blacklist wikipedia's java apps. I'm not saying p2p distributed video is a bad idea, and the Wikimedia foundation may not care about how corporate environments react; however, I think it is a bad idea to either force users to use a p2p client, or make them opt-out. Ignoring corporate clients... firing up a p2p client on end-user's systems could cause serious issues for some. What if I'm browsing on a 3g network, or a satellite connection where my bandwidth is metered? Maybe this is something that could be delivered via a gadget and enabled in user preferences? V/r, Ryan Lane ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] w...@home Extension
perhaps if people create a lot of voice-overs and ~Ken Burns~ effects on commons images, with the occasional inter-spliced video clip and lots of back-and-forth editing... and we are constantly creating timely derivatives of these flattened sequences, that ~may~ necessitate such a system.. because things will be updating all the time ... ... but anyway... yeah, for now I will focus on flattening sequences... Did a basic internal encoder, committed in r54340... Could add some enhancements, but let's spec out what we want ;) Still need to clean up the File:myFile.mp4 situation. Probably store in a temp location, write out a File:myFile.ogg placeholder, then once transcoded swap it in? Also will hack in adding derivatives to the job queue where oggHandler is embedded in a wiki article at a substantially lower resolution than the source version. Will have it send the high-res version until the derivative is created, then purge the pages to point to the new location. Will try and have the download link still point to the high-res version. (We will only create one or two derivatives... also we should decide if we want an ultra-low-bitrate (200kbps or so) version for people accessing Wikimedia on slow / developing-country connections.) peace, michael

Brion Vibber wrote: On 7/31/09 6:51 PM, Michael Dale wrote: Want to point out the working prototype of the w...@home extension. Presently it focuses on a system for transcoding uploaded media to free formats, but will also be used for flattening sequences and maybe other things in the future ;) Client-side rendering does make sense to me when integrated into the upload and sequencer processes; you've got all the source data you need and local CPU time to kill while you're shuffling the bits around on the wire. But I haven't yet seen any evidence that a distributed rendering network will ever be required for us, or that it would be worth the hassle of developing and maintaining it. We're not YouTube, and don't intend to be; we don't accept everybody's random vacation videos, funny cat tricks, or rips from Cartoon Network... Between our licensing requirements and our limited scope -- educational and reference materials -- I think we can reasonably expect that our volume of video will always be *extremely* small compared to general video-sharing sites. We don't actually *want* everyone's blurry cell-phone vacation videos of famous buildings (though we might want blurry cell-phone videos of *historical events*, as with the occasional bit of interesting news footage). Shooting professional-quality video suitable for Wikimedia use is probably two orders of magnitude harder than shooting attractive, useful still photos. Even if we make major pushes on the video front, I don't think we'll ever have the kind of mass volume that would require a distributed encoding network. -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wiki at Home Extension
two quick points. 1) you don't have to re-upload the whole video, just the sha1 or some sort of hash of the assigned chunk. 2) it should be relatively straightforward to catch abuse via the user IDs assigned to each chunk uploaded. And checking the sha1 a few times from other random clients that are encoding other pieces would make abuse very difficult... at the cost of a few small http requests after the encode is done, and at the cost of slightly more CPU cycles from the computing pool. But as this thread has pointed out, CPU cycles are much cheaper than bandwidth bits or human time patrolling derivatives. We have the advantage with a system like Firefogg that we control the version of the encoder pushed out to clients via auto-update, and can check that before accepting their participation (so sha1s should match if the client is not doing anything fishy). But these are version 2 type features, conditioned on 1) bandwidth being cheap and internal computer system maintenance and acquisition being slightly more costly, and/or 2) us integrating a thin bittorrent client into firefogg so we hit the "send out the source footage only once" upstream cost ratio. We need to start exploring the bittorrent integration anyway to distribute the bandwidth cost on the distribution side. So this work would lead us in a good direction as well. peace, --michael

Tisza Gergő wrote: Steve Bennett stevagewp at gmail.com writes: Why are we suddenly concerned about someone sneaking obscenity onto a wiki? As if no one has ever snuck a rude picture onto a main page... There is a slight difference between vandalism that shows up in recent changes and one that leaves no trail at all except maybe in log files only accessible to sysadmins. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
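A minimal sketch of that sha1 spot-check idea (function and parameter names are hypothetical, not from the extension): re-assign an already-encoded chunk to a couple of other random clients and only accept it once enough independent hash reports agree.

// Sketch of verifying a transcoded chunk by comparing sha1 reports from
// independent clients. $reports maps user ID => sha1 reported for the chunk.
// Names here are hypothetical, not from the WikiAtHome extension.
function wahChunkVerified( array $reports, $minAgreement = 2 ) {
	$counts = array();
	foreach ( $reports as $userId => $sha1 ) {
		$counts[$sha1] = isset( $counts[$sha1] ) ? $counts[$sha1] + 1 : 1;
	}
	arsort( $counts );
	$topSha1 = key( $counts );
	// Accept the chunk only once $minAgreement independent clients agree;
	// clients whose hash disagrees with the winner can be flagged for review.
	return ( $counts[$topSha1] >= $minAgreement ) ? $topSha1 : false;
}

// Example: two encoders agree, a third (possibly fishy) client disagrees.
$winner = wahChunkVerified( array( 101 => 'aaa111', 202 => 'aaa111', 303 => 'bbb222' ) );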
Re: [Wikitech-l] Wiki at Home Extension
Let's see...
* All these tools will be needed for flattening sequences anyway. In that case CPU costs are really, really high (like 1/5 of real-time or lower) and the number of computations needed explodes much faster, since every stable edit necessitates a new flattening of some portion of the sequence.
* I don't think it's possible to scale the foundation's current donation model to traditional free net video distribution.
* We are not Google. Google lost what, like ~470 million~ last year on youtube... (and that's with $240 million in advertising) so a total cost of $711 million [1]. Say we manage to do 1/100th of youtube (not unreasonable considering we are a top 4 site. Just imagine a world where you watch one wikipedia video for every 100 you watch on youtube)... then we would be at something like 7x the total budget? (and they are not supporting video editing with flattening of sequences) ... The pirate bay, on the other hand, operates at a technology cost comparable to wikimedia (~$3K~ a month in bandwidth) and is distributing like 1/2 of the net's torrents? [2]. (Obviously these numbers are a bit of tea-leaf reading, but give or take an order of magnitude it should still be clear which model we should be moving towards.) ... I think it's good to start thinking about p2p distribution and computation ... even if we are not using it today ...
* I must say I don't quite agree with your proposed tactic of retaining neutral networks by avoiding bandwidth distribution via peer 2 peer technology. I am aware the net is not built for p2p, nor is it very efficient vs CDNs ... but the whole micropayment system never panned out ... Perhaps you're right that p2p will just give companies an excuse to restructure the net in a non-network-neutral way... but I think they already have plenty of excuse with the existing popular bittorrent systems, and I don't see any other way for not-for-profit net communities to distribute massive amounts of video to each other.
* I think you may be blowing this ~a bit~ out of proportion, calling foundation priorities into question in the context of this hack. If this were a big initiative over the course of a year, or an initiative taking more than part-time effort over a week, then it would make more sense to worry about this. But in its present state it's just a quick hack and the starting point of a conversation, not foundation policy or initiative.

peace, michael
[1] http://www.ibtimes.com/articles/20090413/alleged-470-million-youtube-loss-will-be-cleared-week.htm
[2] http://newteevee.com/2009/07/19/the-pirate-bay-distributing-the-worlds-entertainment-for-3000-a-month/

Gregory Maxwell wrote: On Sun, Aug 2, 2009 at 6:29 PM, Michael Dalemd...@wikimedia.org wrote: [snip] two quick points. 1) you don't have to re-upload the whole video just the sha1 or some sort of hash of the assigned chunk. But each re-encoder must download the source material. I agree that uploads aren't much of an issue. [snip] other random clients that are encoding other pieces would make abuse very difficult... at the cost of a few small http requests after the encode is done, and at a cost of slightly more CPU cycles of the computing pool. Is 2x slightly? (Greater because some clients will abort/fail.) Even that leaves open the risk that a single troublemaker will register a few accounts and confirm their own blocks. You can fight that too— but it's an arms race with no end. I have no doubt that the problem can be made tolerably rare— but at what cost?
I don't think it's all that acceptable to significantly increase the resources used for the operation of the site just for the sake of pushing the capital and energy costs onto third parties, especially when it appears that the cost to Wikimedia will not decrease (but instead be shifted from equipment cost to bandwidth and developer time). [snip] We need to start exploring the bittorrent integration anyway to distribute the bandwidth cost on the distribution side. So this work would lead us in a good direction as well. http://lists.wikimedia.org/pipermail/wikitech-l/2009-April/042656.html I'm troubled that Wikimedia is suddenly so interested in all these cost externalizations which will dramatically increase the total cost but push those costs off onto (sometimes unwilling) third parties. Tech spending by the Wikimedia Foundation is a fairly small portion of the budget, enough that it has drawn some criticism. Behaving in the most efficient manner is laudable and the WMF has done excellently on this front in the past. Behaving in an inefficient manner in order to externalize costs is, in my view, deplorable and something which should be avoided. Has some organizational problem arisen within Wikimedia which has made it unreasonably difficult to obtain computing resources, but easy to burn bandwidth and development time? I'm struggling to understand
Re: [Wikitech-l] w...@home Extension
Some notes:
* ~It's mostly an API~. We can run it internally if that is more cost efficient. (Will do a command-line client shortly.) (As mentioned earlier, the present code was hacked together quickly; it's just a prototype. I will generalize things to work better as internal jobs, and I think I will not create File:Myvideo.mp4 wiki pages but rather create a placeholder File:Myvideo.ogg page and only store the derivatives outside of the wiki page node system. I also notice some sync issues with oggCat which are under investigation.)
* Clearly CPUs are cheap, as is power for the computers, human resources for system maintenance, rack space and internal network management, and we of course will want to run the numbers on any solution we go with. I think your source bitrate assumption was a little high; I would think more like 1-2Mbs (with cell-phone cameras targeting low bitrates for transport and desktops re-encoding before upload). But I think this whole conversation is missing the larger issue, which is: if it's cost prohibitive to distribute a few copies for transcoding, how are we going to distribute the derivatives thousands of times for viewing? Perhaps future work in this area should focus more on the distribution bandwidth cost issue.
* Furthermore, I think I might have mis-represented w...@home. I should have more clearly focused on the sequence flattening and only mentioned transcoding as an option. With sequence flattening we have a more standard viewing bitrate of source material and the CPU costs for rendering are much higher. At present there is no fast way to overlay html/svg on video with filters and effects that are presently only predictably defined in javascript. For this reason we use the browser to wysiwyg-render out the content. Eventually we may want to write an optimized stand-alone flattener, but for now the w...@home solution is worlds less costly in terms of developer resources, since we can use the editor to output the flat file.
3) And finally yes ... you can already insert a penis into video uploads today. With something like: oggCat | ffmpeg2theora -i someVideo.ogg -s 0 -e 42.2 myOneFramePenis.ogg ffmpeg2theora -i someVideo.ogg -s 42.2 But yeah, it's one more level to worry about, and if it's cheaper to do it internally (the transcodes, not the penis insertion) we should do it internally. :P (I hope others appreciate the multiple levels of humor here.) peace, michael

Gregory Maxwell wrote: On Sat, Aug 1, 2009 at 2:54 AM, Brianbrian.min...@colorado.edu wrote: On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell gmaxw...@gmail.com wrote: On Sat, Aug 1, 2009 at 12:13 AM, Michael Dalemd...@wikimedia.org wrote: Once you factor in the ratio of video to non-video content for the foreseeable future this comes off looking like a time-wasting boondoggle. I think you vastly underestimate the amount of video that will be uploaded. Michael is right in thinking big and thinking distributed. CPU cycles are not *that* cheap. Really rough back-of-the-napkin numbers: My desktop has a X3360 CPU. You can build systems all day using this processor for $600 (I think I spent $500 on it 6 months ago). There are processors with better price/performance available now, but I can benchmark on this. Commons is getting roughly 172076 uploads per month now across all media types. Scans of single pages, photographs copied from flickr, audio pronunciations, videos, etc. If everyone switched to uploading 15 minute long SD videos instead of other things there would be 154,868,400 seconds of video uploaded to commons per month.
Truly a staggering amount. Assuming a 40-hour work week it would take over 250 people working full time just to *view* all of it. That number is an average rate of 58.9 seconds of video uploaded per second, every second of the month. Using all four cores my desktop video encodes at 16x real-time (for moderate-motion standard-def input using the latest theora 1.1 svn). So you'd need fewer than four of those systems to keep up with the entire commons upload rate switched to 15-minute videos. Okay, it would be slow at peak hours and you might wish to produce a couple of versions at different resolutions, so multiply that by a couple. This is what I meant by processing being cheap. If the uploads were all compressed at a bitrate of 4mbit/sec, and users were kind enough to spread their uploads out through the day, and the distributed system were perfectly efficient (only needing to send one copy of the upload out), and if Wikimedia were only paying $10/mbit/sec/month for transit out of their primary datacenter... we'd find that the bandwidth costs of sending that source material out again would be $2356/month. (58.9 seconds per second * 4mbit/sec * $10/mbit/sec/month) (Since transit billing is on the 95th percentile 5 minute average of the greater of inbound or outbound, uploads are basically free, but
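For anyone who wants to poke at the napkin math, the figures above reduce to a few lines of arithmetic (assuming an average month of 365.25/12 days, which is what makes the rate come out near 58.9 seconds of video per second):

// Back-of-the-napkin numbers from the thread, spelled out.
$uploadsPerMonth  = 172076;            // current Commons uploads/month, all media
$secondsPerUpload = 15 * 60;           // pretend every upload were a 15 min SD video
$secondsUploaded  = $uploadsPerMonth * $secondsPerUpload;   // 154,868,400 s/month
$secondsPerMonth  = 365.25 * 86400 / 12;                    // ~2,629,800 s
$rate             = $secondsUploaded / $secondsPerMonth;    // ~58.9 s of video per second

$sourceBitrate    = 4;                 // Mbit/s assumed for the source material
$transitPrice     = 10;                // $ per Mbit/s per month
$monthlyCost      = $rate * $sourceBitrate * $transitPrice; // ~$2356 / month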
Re: [Wikitech-l] w...@home Extension
I had to program it anyway to support the distributing of the flattening of sequences. Which has been the planed approach for quite some time. I thought of the name and adding one-off support for transocoding recently, and hacked it up over the past few days. This code will eventually support flattening of sequences. But adding code to do transcoding was a low hanging fruit feature and easy first step. We can now consider if its efficient to use the transcoding feature in wikimedia setup or not but I will use the code either way to support sequence flattening (which has to take place in the browser since there is no other easy way to guarantee wysiwyg flat representation of browser edited sequences ) peace, --michael Mike.lifeguard wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 BTW, Who's idea was this extension? I know Michael Dale is writing it, but was this something assigned to him by someone else? Was it discussed beforehand? Or is this just Michael's project through and through? Thanks, - -Mike -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkp0zv4ACgkQst0AR/DaKHtFVACgyH8J835v8xDGMHL78D+pYrB7 NB8AoMZVwO7gzg9+IYIlZh2Zb3zGG07q =tpEc -END PGP SIGNATURE- ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] w...@home Extension
Want to point out the working prototype of the w...@home extension. Presently it focuses on a system for transcoding uploaded media to free formats, but it will also be used for flattening sequences and maybe other things in the future ;) It's still rough around the edges ... it presently features:

* Support for uploading non-free media assets,
* putting those non-free media assets into a jobs table and distributing the transcode job into $wgChunkDuration-length encoding jobs. (Each piece is uploaded then reassembled on the server; that way big transcoding jobs can be distributed to as many clients as are participating. A rough sketch of this chunking follows below.)
* It supports multiple derivatives for different resolutions based on the requested size.
** In the future I will add a hook for oggHandler to use that as well ... since a big usability issue right now is users embedding HD or high-res ogg videos into a small video space in an article ... and naturally it performs slowly.
* It also features a JavaScript interface for clients to query for new jobs, get the job, download the asset, do the transcode, and upload it (all through an api module, so people could build a client as a shell script if they wanted).
** In the future the interface will support preferences, basic statistics and more options, like "turn on w...@home every time I visit wikipedia" or "only get jobs while I am away from my computer".
* I try to handle derivatives consistently with the file/media handling system. So right now your uploaded non-free-format file will be linked to on the file detail page and via the api calls. We should probably limit client exposure to non-free formats. Obviously the files have to be on a public url to be transcoded, but the interfaces for embedding and the stream detail page should link to the free-format version at all times.
* I tie transcoded chunks to user ids; this makes it easier to disable bad participants.
** I need to add an interface to delete derivatives if someone flags them as bad.
* It supports $wgJobTimeOut for re-assigning jobs that don't get done in $wgJobTimeOut time.

This was hacked together over the past few days so it's by no means production ready ... but should get there soon ;) Feedback is welcome. It's in the svn at: /trunk/extensions/WikiAtHome/ peace, michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
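To make the $wgChunkDuration bullet above a bit more concrete, here is a sketch of how an upload could be split into encode jobs; the table and field names are hypothetical, not the actual WikiAtHome schema.

// Sketch: split a source file into $wgChunkDuration-length encode jobs.
// Table and field names are hypothetical; the real extension may differ.
function wahAddTranscodeJobs( $fileKey, $sourceDuration, $derivativeKey ) {
	global $wgChunkDuration;
	$dbw = wfGetDB( DB_MASTER );
	for ( $start = 0; $start < $sourceDuration; $start += $wgChunkDuration ) {
		$dbw->insert( 'wah_jobs', array(
			'job_file_key'   => $fileKey,
			'job_derivative' => $derivativeKey,
			'job_start'      => $start,
			'job_end'        => min( $start + $wgChunkDuration, $sourceDuration ),
			'job_client'     => null,   // assigned when a client asks for work
			'job_done'       => 0,
		), __METHOD__ );
	}
}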
Re: [Wikitech-l] w...@home Extension
Gregory Maxwell wrote: On Fri, Jul 31, 2009 at 9:51 PM, Michael Dalemd...@wikimedia.org wrote: the transcode job into $wgChunkDuration length encoding jobs. ( each pieces is uploaded then reassembled on the server. that way big transcoding jobs can be distributed to as many clients that are participating ) This pretty much breaks the 'instant' gratification you currently get on upload. true... people will never upload to site without instant gratification ( cough youtube cough ) ... At any rate its not replacing the firefogg that has instant gratification at point of upload its ~just another option~... Also I should add that this w...@home system just gives us distributed transcoding as a bonus side effect ... its real purpose will be to distribute the flattening of edited sequences. So that 1) IE users can view them 2) We can use effects that for the time being are too computationally expensive to render out in real-time in javascript 3) you can download and play the sequences with normal video players and 4) we can transclude sequences and use templates with changes propagating to flattened versions rendered on the w...@home distributed computer While presently many machines in the wikimedia internal server cluster grind away at parsing and rendering html from wiki-text the situation is many orders of magnitude more costly with using transclution and temples with video ... so its good to get this type of extension out in the wild and warmed up for the near future ;) The segmenting is going to significant harm compression efficiency for any inter-frame coded output format unless you perform a two pass encode with the first past on the server to do keyframe location detection. Because the stream will restart at cut points. also true. Good thing theora-svn now supports two pass encoding :) ... but an extra key frame every 30 seconds properly wont hurt your compression efficiency too much.. vs the gain of having your hour long interview trans-code a hundred times faster than non-distributed conversion. (almost instant gratification) Once the cost of generating a derivative is on par with the cost of sending out the clip a few times for viewing lots of things become possible. * I tie transcoded chunks to user ids this makes it easier to disable bad participants. Tyler Durden will be sad. But this means that only logged in users will participate, no? true... You also have to log in to upload to commons It will make life easier and make abuse of the system more difficult.. plus it can act as a motivation factor with distribu...@home teams, personal stats and all that jazz. Just as people like to have their name show up on the donate wall when making small financial contributions. peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
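To put a rough number on the extra-keyframe worry: assuming (my assumption, not a measurement) that a forced Theora keyframe costs on the order of 25 KB more than the delta frame it replaces, chunking an hour-long 544 kbit/s web stream into 30-second pieces adds only around a percent of overhead.

// Rough overhead estimate for restarting the stream at chunk boundaries.
// The 25 KB per forced keyframe figure is an illustrative assumption only.
$chunkLen       = 30;                   // seconds per w@home chunk
$clipLen        = 3600;                 // an hour-long interview
$videoBitrate   = 544;                  // kbit/s, the ENC_WEB_STREAM target
$extraKeyframes = $clipLen / $chunkLen;                 // ~120
$extraBytes     = $extraKeyframes * 25 * 1024;          // ~3.1 MB
$totalBytes     = $videoBitrate * 1000 / 8 * $clipLen;  // ~245 MB
$overhead       = $extraBytes / $totalBytes;            // ~1.3%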
[Wikitech-l] Recommending a browser for video (was: Proposal: switch to HTML 5)
This is really a foundation / wikimedia community question. ... I will do a short email to foundation-l summarizing the technical discussion. Not that foundation-l has historically been the best way to build consensus but maybe someone else can summarize that discussion and give us a ball-park of the community voice on the matter allowing the foundation to move forward with something. Meanwhile I will try and make sure the new player is good and ready to be integrated ;) --michael Aryeh Gregor wrote: On Wed, Jul 8, 2009 at 6:12 PM, David Gerarddger...@gmail.com wrote: They are happy to foul up the entire standard. I feel there is little to no benefit to us in trying to imply that the situation is otherwise. First of all, Apple is not fouling up the entire standard. They employ one of its two co-editors, their developers contribute to it very actively, and they ship an implementation that's as advanced as anybody's. This is *one* specific feature that they've said they won't implement at the present time (but they may reconsider at any time). Mozilla has vetoed features as well, as Ian Hickson has pointed out. Mozilla refused to implement SQL, so that was removed from the standard, just as mention of Theora was. Second of all, I don't have a serious problem with Wikimedia only advocating the use of open-source software, say. But if it does, it *must* be phrased in a way that makes it clear that it's an advertisement of a product we want the user to use, not a neutral assessment of what the best technology is for viewing the page. Anything else is deliberately misleading, and that's unacceptable. On Wed, Jul 8, 2009 at 6:21 PM, Gregory Maxwellgmaxw...@gmail.com wrote: Regardless, I think we've finished the technical part of this decision— the details are a matter of organization concern now, not technology. Yep, definitely. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Fwd: [whatwg] Serving up Theora video in the real world
Tell the users to complain to Apple? .. Bring up anti-competitive lawsuits against apple? Buy a Mobil device that is less locked down? There is no easy solution when the platform is a walled garden. There are two paths towards supporting html5 video in mobile platforms. 1) getting things working within the provided web browser platform or 2) running your own browser software as an application (we only should consider a normal phone obviously on a jail-broken device you can do lots of things... but that greatly reduces the possibility of wide deployment) I was looking at this situation for the iPhone and Android based phones. I think android based phones have a better shot at supporting ogg theora html5 video in the near term. In the long term the market will drive the devices to support ogg or not. iPhone 1) The internals of the quicktime/media system for the iPhone are not very exposed nor do they appear to be very extendable. 2) The Apple SDK agreement forbids virtual machines of any kind. This effectively makes competing web browsers illegal. Android / HTC phones: 1) I would hope google/android would ship theora/html5 support since theora will be supported in their desktop webkit based chrome browser. I think it would be relatively easy for a given android based phone distributer to support ogg once webkit on android supports html5 video. 2) Android recently added native code exposure: http://android-developers.blogspot.com/2009/06/introducing-android-15-ndk-release-1.html I wonder if this could be a path for a port of Firefox or a custom version of the open source webkit browser on android? --michael David Gerard wrote: Another answer - it'd be custom app time. So the question is: what do we tell iPhone users? - d. -- Forwarded message -- From: Maciej Stachowiak m...@apple.com Date: 2009/7/10 Subject: Re: [whatwg] Serving up Theora video in the real world To: David Gerard dger...@gmail.com Cc: WHATWG Proposals wha...@lists.whatwg.org On Jul 9, 2009, at 2:59 PM, David Gerard wrote: The question is what to do for platforms such as the iPhone, which doesn't even run Java. Is there any way to install an additional codec in the iPhone browser? Is it (even theoretically) possible to put a free app on the AppStore just to play Ogg Theora video for our users? (There are many AppStore apps that support Ogg Vorbis, don't know if any support Theora - so presumably AppStore stuff doesn't give Apple the feared submarine patent exposure.) Just by way of factual information: There's no Java in the iPhone version of Safari. There are no browser plugins. There is no facility for systemwide codec plugins. There is no way to get an App Store app to launch automatically from Web content. I don't think there is any obstacle to posting an App Store app that does nothing but play videos from WikiPedia, the way the YouTube app plays YouTube videos. But I don't think there is a way to integrate it with browsing. Regards, Maciej ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Proposal: switch to HTML 5
We need to inform people that the quality of experience can be substantially improved if they use a browser that supports free formats. Wikimedia only distributes content in free formats because if you have to pay for a license to view, edit or publish ~free content~ then the content is not really ~free~. We have requested that Apple and IE support free formats, but they have chosen not to. Therefore we are in a position where we have to recommend a browser that does provide a high-quality user experience with these formats. We are still making every effort to display the formats in IE / Safari using java or plugins, but we should inform people that they can have an improved experience, on par with proprietary solutions, if they use a different browser. --michael

Steve Bennett wrote: On Wed, Jul 8, 2009 at 4:43 PM, Marco Schusterma...@harddisk.is-a-geek.org wrote: We should not recommend Chrome - as good as it is, but it has serious privacy problems. Out of curiosity, why do we need to recommend a browser at all, and why do we think anyone will listen to our recommendation? People use the browser they use. If the site they want to go to doesn't work in their browser, they'll either not go there, or possibly try another one. They're certainly not going to change browsers just because the site told them to. Personally, I use Chrome, FF and IE. And the main reason for switching is just to have different sets of cookies. Occasionally a site doesn't like Chrome, so I switch. But it's not like I'm going to take a "your experience would be better in browser" statement seriously. Steve ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Proposal: switch to HTML 5
I think if the playback system is java in ~any browser~ we should ~softly~ inform people to get a browser with native support if they want a high quality video playback experience. The cortado applet is awesome ... but startup time of the java vm is painful compared to other user experiences with video.. not to mention seeking, buffering, and general interface responsiveness in comparison to the native support. --michael Gregory Maxwell wrote: On Tue, Jul 7, 2009 at 4:23 PM, Brion Vibberbr...@wikimedia.org wrote: Unless they don't have Ogg support. :) *cough Safari cough* But if they do, yes; our JS won't bother bringing up the Java applet if it's got native support available. It would be a four or five line patch to make OggHandler nag Safari 3/4 users to install XiphQT and give them the link to a download page. The spot for the nag is already stubbed out in the code. Just say the word. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Proposal: switch to HTML 5
Also should be noted a simple patch for oggHandler to output video and use the mv_embed library is in the works see: https://bugzilla.wikimedia.org/show_bug.cgi?id=18869 you can see it in action a few places like http://metavid.org/wiki/File:FolgersCoffe_512kb.1496.ogv Also note my ~soft~ push for native support if you don't already native support. (per our short discussion earlier in this thread) if you say don't show again it sets a cookie and won't show it again. I would be happy to randomly link to other browsers that support html5 video tag with ogg as they ship with that functionality. I don't really have apple machine handy to test quality of user experience in OSX safari with xiph-qt. But if that is on-par with Firefox native support we should probably link to the component install instructions for safari users. --michael Gregory Maxwell wrote: On Tue, Jul 7, 2009 at 1:54 AM, Aryeh Gregorsimetrical+wikil...@gmail.com wrote: [snip] * We could support video/audio on conformant user agents without the use of JavaScript. There's no reason we should need JS for Firefox 3.5, Chrome 3, etc. Of course, that could be done without switching the rest of the site to HTML5... Although I'm not sure that giving the actual video tags is desirable. It's a tradeoff: Work for those users when JS is enabled and correctly handle saving the full page including the videos vs take more traffic from clients doing range requests to generate the poster image, and potentially traffic from clients which decide to go ahead and fetch the whole video regardless of the user asking for it. There is also still a bug in FF3.5 that where the built-in video controls do not work when JS is fully disabled. (Because the controls are written in JS themselves) (To be clear to other people reading this the mediawiki ogghandler extension already uses HTML5 and works fine with Firefox 3.5, etc. But this only works if you have javascript enabled. The site could instead embed the video elements directly, and only use JS to substitute the video tag for fallbacks when it detects that the video tag can't be used) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Minify
I would quickly add that the script-loader / new-upload branch also supports minify along with associating unique id's grouping gziping. So all your mediaWiki page includes are tied to their version numbers and can be cached forever without 304 requests by the client or _shift_ reload to get new js. Plus it works with all the static file based js includes as well. If a given set of files is constantly requested we can group them to avoid server round trips. And finally it lets us localize msg and package that in the JS (again avoiding separate trips for javascript interface msgs) for more info see the ~slightly outdated~ document: http://www.mediawiki.org/wiki/Extension:ScriptLoader peace, michael Robert Rohde wrote: I'm going to mention this here, because it might be of interest on the Wikimedia cluster (or it might not). Last night I deposited Extension:Minify which is essentially a lightweight wrapper for the YUI CSS compressor and JSMin JavaScript compressor. If installed it automatically captures all content exported through action=raw and precompresses it by removing comments, formatting, and other human readable elements. All of the helpful elements still remain on the Mediawiki: pages, but they just don't get sent to users. Currently each page served to anons references 6 CSS/JS pages dynamically prepared by Mediawiki, of which 4 would be needed in the most common situation of viewing content online (i.e. assuming media=print and media=handheld are not downloaded in the typical case). These 4 pages, Mediawiki:Common.css, Mediawiki:Monobook.css, gen=css, and gen=js comprise about 60 kB on the English Wikipedia. (I'm using enwiki as a benchmark, but Commons and dewiki also have similar numbers to those discussed below.) After gzip compression, which I assume is available on most HTTP transactions these days, they total 17039 bytes. The comparable numbers if Minify is applied are 35 kB raw and 9980 after gzip, for a savings of 7 kB or about 40% of the total file size. Now in practical terms 7 kB could shave ~1.5s off a 36 kbps dialup connection. Or given Erik Zachte's observation that action=raw is called 500 million times per day, and assuming up to 7 kB / 4 savings per call, could shave up to 900 GB off of Wikimedia's daily traffic. (In practice, it would probably be somewhat less. 900 GB seems to be slightly under 2% of Wikimedia's total daily traffic if I am reading the charts correctly.) Anyway, that's the use case (such as it is): slightly faster initial downloads and a small but probably measurable impact on total bandwidth. The trade-off of course being that users receive CSS and JS pages from action=raw that are largely unreadable. The extension exists if Wikimedia is interested, though to be honest I primarily created it for use with my own more tightly bandwidth constrained sites. -Robert Rohde ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
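Robert's bandwidth estimate is easy to sanity-check; spelling the arithmetic out (the divide-by-4 reflects that the ~7 kB saving is spread across the 4 CSS/JS requests a typical page view makes):

// Sanity check of the Extension:Minify numbers quoted above.
$gzippedNow      = 17039;     // bytes: the 4 CSS/JS pages gzipped today
$gzippedMinified = 9980;      // bytes: same pages minified, then gzipped
$savedPerView    = $gzippedNow - $gzippedMinified;       // 7059 bytes, ~41%
$rawCallsPerDay  = 500000000; // Erik Zachte's action=raw count
$savedPerDay     = $rawCallsPerDay * $savedPerView / 4;  // ~882 GB/day
echo round( $savedPerDay / 1e9 ) . " GB/day\n";          // same ballpark as the "up to 900 GB" figure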
Re: [Wikitech-l] Minify
correct me if I am wrong but thats how we presently update js and css.. we have $wgStyleVersion and when that gets updated we send out fresh pages with html pointing to js with $wgStyleVersion append. The difference in the context of the script-loader is we would read the version from the mediaWiki js pages that are being included and the $wgStyleVersion var. (avoiding the need to shift reload) ... in the context of rendering a normal page with dozens of template lookups I don't see this a particularly costly. Its a few extra getLatestRevID title calls. Likewise we should do this for images so we can send the cache forever header (bug 17577) avoiding a bunch of 304 requests. One part I am not completely clear on is how we avoid lots of simultaneous requests to the scriptLoader when it first generates the JavaScript to be cached on the squids, but other stuff must be throttled too no? Like when we update any code, language msgs, or local-settings does that does not result in the immediate purging all of wikipedia. --michael Gregory Maxwell wrote: On Fri, Jun 26, 2009 at 4:33 PM, Michael Dalemd...@wikimedia.org wrote: I would quickly add that the script-loader / new-upload branch also supports minify along with associating unique id's grouping gziping. So all your mediaWiki page includes are tied to their version numbers and can be cached forever without 304 requests by the client or _shift_ reload to get new js. Hm. Unique ids? Does this mean the every page on the site must be purged from the caches to cause all requests to see a new version number? Is there also some pending squid patch to let it jam in a new ID number on the fly for every request? Or have I misunderstood what this does? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
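A minimal sketch of the "read the version from the MediaWiki: js pages" idea: build the grouped script request URL from the latest revision IDs of the on-wiki scripts plus $wgStyleVersion, so the response can carry a cache-forever header and edits still show up without a shift-reload. Title::getLatestRevID() is the existing core method; the entry-point name and URL format here are illustrative only.

// Illustrative only: version a grouped script-loader request by the latest
// revision IDs of the wiki JS pages it includes, plus $wgStyleVersion.
function slGetScriptRequestUrl( array $jsPageTitles ) {
	global $wgScriptPath, $wgStyleVersion;
	$parts = array();
	foreach ( $jsPageTitles as $titleText ) {
		$title = Title::newFromText( $titleText );
		// The latest revision ID changes whenever the on-wiki script is edited,
		// so squid/browser caches can hold the old URL forever.
		$parts[] = $titleText . '|' . ( $title ? $title->getLatestRevID() : 0 );
	}
	return $wgScriptPath . '/mwScriptLoader.php?class=' .
		urlencode( implode( ',', $parts ) ) . '&urid=' . $wgStyleVersion;
}

// e.g. slGetScriptRequestUrl( array( 'MediaWiki:Common.js', 'MediaWiki:Monobook.js' ) );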
Re: [Wikitech-l] Minify
Aryeh Gregor wrote: Any given image is not included on every single page on the wiki. Purging a few thousand pages from Squid on an image reupload (should be rare for such a heavily-used image) is okay. Purging every single page on the wiki is not.

yeah .. we are just talking about adding image.jpg?image_revision_id to all the image srcs at page render time; that should never purge everything on the wiki ;)

No. We don't purge Squid on these events, we just let people see old copies. Of course, this doesn't normally apply to registered users (who usually [always?] get Squid misses), or to pages that aren't cached (edit, history, . . .).

okay, that's basically what I understood. That makes sense.. although it would be nice to think about a job or process that purges pages with outdated language msgs, or pages that are referencing outdated scripts, style-sheets, or image urls. We ~do~ add jobs to purge for template updates. Are other things, like language msg or code updates, candidates for job purge tasks? ... I guess it's not too big a deal to get an old page until someone updates it. --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Foundation-l] Why don't we re-encode proprietary formats as Ogg?
I am definitely not opposed to adding that functionality, as I have mentioned in the past: see thread: http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg00888.html You should take a look at the work Mike Baynton did back in summer of code 07. The issue that we have is both of the bottlenecks you mentioned. Where possible we want to crowd-source the transcoding costs, and we have working firefogg support which we can push out more aggressively once firefox 3.5 lands. Essentially wikimedia commons is not designed to support hosting the raw footage. Other like-minded organizations like archive.org, which have peta-bytes of storage across thousands of storage nodes, are better positioned to act as a host for raw footage and its derivatives. Additionally, commons is a strict archive where files not licensed properly often get removed, whereas archive.org can act as a more permanent storage space while license issues are sorted out. Since wikimedia projects will shortly support inline searching and time-segment grabbing of archive.org material, it's maybe not so critical that we create and host the transcoding infrastructure ourselves. Although, as you mention, it would be nice to support transcoding on wikimedia's servers for the "uploading short clips from a cell phone" type cases. --michael

David Gerard wrote: [cc'd back to wikitech-l] 2009/6/8 Tim Starling tstarl...@wikimedia.org: It's been discussed since OggHandler was invented in 2007, and I've always been in favour of it. But the code hasn't materialised, despite a Google Summer of Code project come and gone that was meant to implement a transcoding queue. The transcoding queue project was meant to allow transformations in quality and size, but it would also allow format changes without much trouble. Ahhh, that's fantastic, so it is just a Simple Matter of Programming :-D (I'm tempted to bodge something together myself, despite my low opinion of my own coding abilities ;-) ) Start simple. Upload your phone and camera video files! We'll transcode them into Theora and store them. Pick suitable (tweakable) defaults. Get it doing that one job. Then we can think about size/quality transformations later. Sound like a vague plan? Bottlenecks: 1. CPU to transcode with. 2. Disk space for queued video. - d. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] firefogg local encode new-upload branch update.
As you may know I have been working on firefogg integration with mediaWiki. As you may also know the mwEmbed library is being designed to support embedding of these interfaces in arbitrary external contexts. I wanted to quickly highlight a useful stand alone usage example of the library: http://www.firefogg.org/make/advanced.html This Make Ogg link will be something you can send to a person so they can encode source footage to a local ogg video file with the latest and greatest ogg encoders (presently the thusnelda theora encoder vorbis audio). Updates to thusnelda and other free codecs will be pushed out via firefogg updates. For commons / wikimedia usage we will directly integrate firefogg (using that same codebase) You can see an example of how that works on the 'new-upload' branch here: http://sandbox.kaltura.com/testwiki/index.php/Special:Upload ... hopefully we will start putting some of this on testing.wikipedia.org ~soonish ?~ The new-upload branch feature set is quite extensive including the script-loader, jquery javascript refactoring, the new upload-api, new mv_embed video player, add media wizard etc. Any feedback and specific bug reports people can do will be super helpful in gearing up for merging this 'new-upload' branch. For an overview see: http://www.mediawiki.org/wiki/Media_Projects_Overview peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] more bugzilla components
I would like to request categorization for the media projects to the bug tracker. To get a brief idea of the components getting packaged into the new-upload branch check out: http://www.mediawiki.org/wiki/Media_Projects_Overview I think the large scope of code and the fact that MwEmbed can be used in self-contained mode warrants a high level Product categorization. Something like MwEmbed : The self contained jQuery based javascript library for embedding mediaWiki interfaces: Then components for the library could (presently) include the following: * Add Media Wizard * Firefogg * Clip Edit * Embed Video * Sequence Editor * Timed Text * example usage * js Script-Loader * Themes and Styles * *I also want to report some strangeness with bugzilla. I sometimes get the below error when trying to log in (without restrict to ip checked ) and I occasionally get time-outs when submitting bugs: Undef to trick_taint at Bugzilla/Util.pm line 67 Bugzilla::Util::trick_taint('undef') called at Bugzilla/Auth/Persist/Cookie.pm line 61 Bugzilla::Auth::Persist::Cookie::persist_login('Bugzilla::Auth::Persist::Cookie=ARRAY(0xXX)', 'Bugzilla::User=HASH(0xXX)') called at Bugzilla/Auth.pm line 147 Bugzilla::Auth::_handle_login_result('Bugzilla::Auth=ARRAY(0xXX)', 'HASH(0xX)', 2) called at Bugzilla/Auth.pm line 92 Bugzilla::Auth::login('Bugzilla::Auth=ARRAY(0xX)', 2) called at Bugzilla.pm line 232 Bugzilla::login('Bugzilla', 0) called at /srv/org/wikimedia/bugzilla/relogin.cgi line 192 peace, michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] request for feedback on new-upload branch
The new-upload branch includes a good set of new features and is available here: http://svn.wikimedia.org/svnroot/mediawiki/branches/new-upload/phase3/

Major Additions:
* action=upload added to the api
* Supports new upload interfaces (dependent on mv_embed / jQuery libs)
** supports upload over http with progress reporting to the user.
** support for chunked upload with progress indicators and client-side transcoding for videos (chunks for other large file types almost there) (dependent on the firefogg extension)
** supports upload-api error msg lookup and reporting back.

To test the new upload /interfaces/ you need the latest copy of mv_embed / add_media_wizard.js. You can get it by checking out: http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/MetavidWiki/skins/ and then adding something like the following to localSettings.php:

$wgExtensionFunctions[] = 'addMediaWizard';
function addMediaWizard() {
	global $wgOut, $wgJsMimeType;
	$utime = time();
	$wgOut->addScript(
		"<script type=\"{$wgJsMimeType}\" src=\"http://localhost/{path_to_skins_dir}/add_media_wizard.js?urid={$utime}\"></script>"
	);
}

Comments or references to new bugs can be reported to bug 18563, which is tracking its inclusion.

== Things already on the TODO list ==
* Deprecate upload.js (destination checks, filename checks etc) in favor of the jQuery / mv_embed style upload interface. (This is dependent on getting the script-loader branch into the trunk, bug 18464, along with the base mv_embed / jQuery libs.) Will have a separate email with a call for feedback on that branch shortly, once I finish up the css style-sheet grouping and mv_embed lib merging.

--mv_embed upload interfaces--
* Support pass-through mode for firefogg (pass-through mode is only in development branches of firefogg ... should be released soon)
* Support remote iframe driver (for uploading to commons with progress indicators while editing an article on another site (like wikipedia)) (will be an upload tab in the add_media_wizard)

peace, michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google Summer of Code: accepted projects
Roan Kattouw wrote: The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.

yeah, again: if we only issue the big resize operation on initial upload, with a memory-friendly in-place library like vips, I think we will be okay. Since the user just waited like 10-15 minutes to upload their huge image, waiting an additional 10-30s at that point for the thumbnail and the instant gratification of seeing your image on the upload page ... is not such a big deal. Then in-page-use derivatives could predictably resize the 1024x768 ~or so~ image in realtime; again, instant gratification on page preview or page save. Operationally this could go out to a thumbnail server or be done on the apaches; if they are small operations it may be easier to keep the existing infrastructure than to intelligently handle the edge cases outlined (many resize requests at once, placeholders, image proxy / daemon setup).

AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting them will have a request last for 30s and time out.

Again, this may be related to the unpredictable memory usage of ImageMagick when resizing large images, rather than using a fast, memory-confined resize engine, no? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
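A sketch of the resize-once-at-upload idea; wfShellExec() is the existing MediaWiki shell wrapper, but the vipsthumbnail command line and the $wgVipsThumbnailCommand setting are assumptions that would need checking against whatever vips tooling actually gets deployed.

// Sketch: produce a ~1024px-wide working copy at upload time with vips,
// and fall back to doing nothing (rather than crashing the apache node)
// if the tool is missing. The vipsthumbnail invocation is illustrative only.
function makeWorkingCopy( $srcPath, $dstPath, $maxWidth = 1024 ) {
	global $wgVipsThumbnailCommand;   // hypothetical setting, e.g. '/usr/bin/vipsthumbnail'
	if ( !$wgVipsThumbnailCommand || !is_executable( $wgVipsThumbnailCommand ) ) {
		return false;   // no in-place resizer available; skip the derivative
	}
	$cmd = wfEscapeShellArg( $wgVipsThumbnailCommand ) .
		' -s ' . intval( $maxWidth ) .
		' ' . wfEscapeShellArg( $srcPath ) .
		' -o ' . wfEscapeShellArg( $dstPath );
	$retval = 0;
	wfShellExec( $cmd, $retval );
	return $retval === 0 && file_exists( $dstPath );
}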
Re: [Wikitech-l] Google Summer of Code: accepted projects
Aryeh Gregor wrote: I'm not clear on why we don't just make the daemon synchronously return a result the way ImageMagick effectively does. Given the level of reuse of thumbnails, it seems unlikely that the latency is a significant concern -- virtually no requests will ever actually wait on it.

(I basically outlined these issues on the soc page but here they are again with a bit more clarity.)

I recommended that the image daemon run semi-synchronously, since the changes needed to maintain multiple states and return non-cached placeholder images, while managing updates and page purges for when the updated images become available within the wikimedia server architecture, probably won't be completed in the summer of code time-line. But if the student is up for it the concept would be useful for other components like video transformation / transcoding, sequence flattening etc. It's just not what I would recommend for the summer of code time-line.

== Per issues outlined in bug 4854 ==
I don't think it's a good idea to invest a lot of energy into a separate python-based image daemon. It won't avoid all the problems listed in bug 4854. Shell-character-exploit issues should be checked against anyway (since not everyone is going to install the daemon). Other people using mediaWiki won't want to add a python- or java-based image resizer and resolve its python or java component library dependencies. It won't be easier to install than imagemagick or php-gd, which are repository-hosted applications and already present in shared hosting environments. Once you start integrating other libs like (java) Batik it becomes difficult to resolve dependencies (java, python etc) and to install: you have to push out a new program that is not integrated into the application repository managers of the various distributions. The potential to isolate CPU and memory usage should be considered in the core mediaWiki image resize support anyway, i.e. we don't want to crash other people's servers who are using mediaWiki by not checking the upper bounds of image transforms. Instead we should make the core image transform smarter: maybe have a configuration var that /attempts/ to bound the upper memory for spawned processes and take that into account before issuing the shell command for a given large image transformation with a given shell application.

== What the image resize efforts should probably focus on ==
(1) making the existing system more robust and (2) better taking advantage of multi-threaded servers.

(1) Right now the system chokes on large images; we should deploy support for an in-place image resize, maybe something like vips (?) (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use). The system should intelligently call vips to transform the image to a reasonable size at time of upload, then use those derivatives for just-in-time thumbs for articles. (If vips is unavailable we don't transform and we don't crash the apache node.)

(2) Maybe spin out the image transform process early on in the parsing of the page, with a placeholder and a callback, so that by the time all the templates and links have been looked up the image is ready for output. (Maybe another function wfShellBackgroundExec($cmd, $callback_function), maybe using pcntl_fork, then a normal wfShellExec, then pcntl_waitpid, then the callback function ... which sets some var in the parent process so that pageOutput knows it's good to go.)

If operationally the daemon should be on a separate server we should still more or less run synchronously ... as mentioned above ... if possible the daemon should be php-based so we don't explode the dependencies for deploying robust image handling with mediaWiki. peace, --michael ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
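For the record, a sketch of what the proposed wfShellBackgroundExec() / callback pair could look like with the pcntl functions mentioned above; this is just the shape of the idea (forking inside a mod_php apache worker has its own problems, which is part of why a separate daemon keeps coming up):

// Sketch of the proposed wfShellBackgroundExec(): fork, run the shell
// command in the child, and let the parent collect the result later and
// fire a callback once the transform is done.
function wfShellBackgroundExec( $cmd ) {
	$pid = pcntl_fork();
	if ( $pid === -1 ) {
		// Fork failed: fall back to the normal blocking behaviour.
		wfShellExec( $cmd );
		return false;
	}
	if ( $pid === 0 ) {
		wfShellExec( $cmd );   // child: do the image transform
		exit( 0 );
	}
	return $pid;               // parent: keep parsing the page
}

function wfShellBackgroundWait( $pid, $callback ) {
	if ( $pid === false ) {
		call_user_func( $callback );   // work was already done synchronously
		return;
	}
	$status = 0;
	pcntl_waitpid( $pid, $status );    // block only now, at output time
	call_user_func( $callback );       // e.g. swap the placeholder for the thumb
}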
Re: [Wikitech-l] Skin JS cleanup and jQuery
hmm right... The idea of the scriptLoader is we get all our #1 included javascript in a single request. So we don't have round trips that would benefit as much from lazy loading so no need to rewrite stuff that is included that way already. I don't think we are proposing convert all scripts to #2 or #3 loading... We already have the importScriptURI function which script use for loading when not using #1. I do suggest we move away from importScriptURI to something like the doLoad function in mv_embed ... that way we can load multiple js files in a single request using the mediaWiki scriptServer (if its enabled). Right now all the importScriptURI stuff works non-blocking and included scripts need to include code to execute anything they want to run. To make things more maintainable and modular we should transition to objects/classes providing methods which can be extended and autoloaded rather than lots of single files doing lots of actions on the page in a less structured fashion. But there is no rush to transition as the scripts are working as is and the new infrastructure will work with the scripts as they are. But the idea of the new infrastructure is to support that functionality in the future... --michael Sergey Chernyshev wrote: No, my link is about 3 ways of loading: 1. Normal script tags (current style) 2. Asynchronous Script Loading (loading scripts without blocking, but without waiting for onload) 3. Lazyloading (loading script onload). Number 2 might be usable as well. In any case changing all MW and Extensions code to work for #2 or #3 might be a hard thing. Thank you, Sergey -- Sergey Chernyshev http://www.sergeychernyshev.com/ On Wed, Apr 22, 2009 at 1:21 PM, Michael Dale md...@wikimedia.org wrote: The mv_embed.js includes a doLoad function that matches the autoLoadJS classes listed in mediaWiki php. So you can dynamically autoload arbitrary sets of classes (js-files in the mediaWiki software) in a single http request and then run something once they are loaded. It can also autoload sets of wiki-titles for user-space scripts again in a single request grouping, localizing, gziping and caching all the requested wiki-title js in a single request. This is nifty cuz say your script has localized msg. You can fill these in in user-space MediaWiki:myMsg then put them in the header of your user-script, then have localized msg in user-space javascript ;) .. When I get a chance I will better document this ;) But its basically outlined here: http://www.mediawiki.org/wiki/Extension:ScriptLoader The link you highlight appears to be about running stuff once the page is ready. jQuery includes a function $(document).ready(function(){ //code to run now that the dom-state is ready }) so your enabled gadget could use that to make sure the dom is ready before executing some functions. (Depending on the type of js functionality your adding it /may/ be better to load on-demand once a new interface component is invoked rather than front load everything. Looking at the add-media-wizard gadget on testing.wikipedia.org for an idea of how this works. peace, --michael Sergey Chernyshev wrote: Yep, with jQuery in the core it's probably best to just bundle it. There is another issue with the code loading and stuff - making JS libraries call a callback function when they load and all the functionality to be there instead of relying on browser to block everything until library is loaded. 
This is quite advance thing considering that all the code will have to be converted to this model, but it will allow for much better performance when implemented. Still it's probably Phase 5 kind of optimization, but it can bring really good results considering JS being the biggest blocker. More on the topic is on Steve Souders' blog: http://www.stevesouders.com/blog/2008/12/27/coupling-async-scripts/ Thank you, Sergey -- Sergey Chernyshev http://www.sergeychernyshev.com/ On Wed, Apr 22, 2009 at 12:42 PM, Brion Vibber br...@wikimedia.org wrote: On 4/22/09 9:33 AM, Sergey Chernyshev wrote: Exactly because this is the kind of requests we're going to get, I think it makes sense not to have any library bundled by default, but have a centralized handling for libraries, e.g. one extension asks for latest jQuery and latest YUI and MW loads them, another extension asks for jQuery only and so on. Considering we want core code to be able to use jQuery, I think the case for bundling it is pretty strong. :) -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l