Re: [Wikitech-l] Non-latin characters broken in donation comments
mizusumashi wrote:
> By the way, I sent some mails to ML wikitech-l. But they are not in the Archive. Why?

Mails don't always show up immediately. Also, the archives are grouped per month, so you may have been trying to find e-mails sent in late November in the December archives.

Roan Kattouw (Catrope)
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
We had a pretty lengthy discussion about this before the summer, and the consensus seemed to be that a fulltext-based approach looked most viable. I actually wrote an extension that does that, and promised to release it soon; that was quite a few months ago, and I never got around to it. I'll release it properly when I have time, which will hopefully be before Christmas :D

The code needs some tweaking and refactoring, though. It's pretty tightly integrated with the article text search (both functions in one form) and has all kinds of weird features, because the guy who paid me to write it wanted them. It also doesn't support three-letter word searching (which core does these days, using a prefix hack), which is pretty bad since categories with short titles (or stopword titles) won't be found either.

Roan Kattouw (Catrope)
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
> We had a pretty lengthy discussion about this before the summer, and the consensus seemed to be that a fulltext-based approach looked most viable.

So how does this take care of deep indexing non-atomic categories? How will this extension be even remotely useful for, let's say, Commons?

This discussion is far from over. The basic problems are _not_ solved.

I'm sure this thread will die out soon. Half of the participants will again be soothed by the promise of some easy solution just barely beyond the horizon, while the half that realizes that said solution _cannot possibly work_ without a radical reform of the category system will again be too annoyed (I'm getting there already) to continue discussing. Déjà vu...

-- [[en:User:Dschwen]] [[de:Benutzer:Dschwen]] [[commons:User:Dschwen]]
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
2008/12/3 Daniel Schwen:
> I'm sure this thread will die out soon. Half of the participants will again be soothed by the promise of some easy solution just barely beyond the horizon, while the half that realizes that said solution _cannot possibly work_ without a radical reform of the category system will again be too annoyed (I'm getting there already) to continue discussing.

If the machinery is in place to replace the present ridiculous sub-sub-sub-categories with something that *does their job just as well*, they'll die in quite reasonable order. If the machinery can't completely replace them without editor pain, it'll fail. If it can, it won't and Commons will be ENORMOUSLY happy 'cos we can then go wild treating cats like tags!

- d.
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
Daniel Schwen wrote:
>> We had a pretty lengthy discussion about this before the summer, and the consensus seemed to be that a fulltext-based approach looked most viable.
>
> So how does this take care of deep indexing non-atomic categories?

Err.. what? Please explain what you mean by that.

> How will this extension be even remotely useful for let's say commons?

Without addressing Commons in particular, having an efficient way to get pages in the intersection of multiple categories would allow wikis to delete a category such as [[Category:Deceased Presidents of the United States]] and replace it with, say, [[Intersection:Deceased Presidents of the United States]], which would list all articles that are in both [[Category:Deceased people]] and [[Category:Presidents of the United States]]. My extension alone doesn't make that possible, but it makes implementing such a feature considerably easier.

> This discussion is far from over. The basic problems are _not_ solved.

Would you care to elaborate on what those unsolved problems are?

> I'm sure this thread will die out soon. Half of the participants will again be soothed by the promise of some easy solution just barely beyond the horizon, while the half that realizes that said solution _cannot possibly work_ without a radical reform of the category system will again be too annoyed (I'm getting there already) to continue discussing.

It would be nice if you didn't judge people as naive right away.

Roan Kattouw (Catrope)
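To make the [[Intersection:...]] idea above concrete, here is a minimal sketch in Python. It is not the extension discussed in this thread: the in-memory `CATEGORY_MEMBERS` table, the page titles in it, and the function name `intersect_categories` are all assumptions made up for illustration, and a real wiki would query its category links or a search index instead.

```python
# Hypothetical category membership, for illustration only. A real wiki
# would look this up in its database or fulltext index.
CATEGORY_MEMBERS = {
    "Deceased people": {
        "George Washington", "Abraham Lincoln", "Albert Einstein",
    },
    "Presidents of the United States": {
        "George Washington", "Abraham Lincoln", "Some Living President",
    },
}


def intersect_categories(names, members=CATEGORY_MEMBERS):
    """Return the sorted titles of pages that are in every named category."""
    sets = [members.get(name, set()) for name in names]
    if not sets:
        return []
    # Start from the smallest set so each intersection step stays cheap.
    sets.sort(key=len)
    result = set(sets[0])
    for other in sets[1:]:
        result &= other
    return sorted(result)


if __name__ == "__main__":
    print(intersect_categories(
        ["Deceased people", "Presidents of the United States"]))
```

With atomic categories like these two, the combined category no longer has to be maintained by hand; it falls out of the intersection.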
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
On Wed, Dec 3, 2008 at 10:59 AM, Daniel Schwen wrote:
> So how does this take care of deep indexing non-atomic categories? How will this extension be even remotely useful for let's say commons?

That's a social problem, and so of secondary importance. Once a technical mechanism exists for solving the problem given a particular type of categories, recategorization will happen, sooner or later. If you think people will flat-out refuse to move to a new, better system, I think you're mistaken: look at the completeness of the move from lists to categories, for instance, when categories were first introduced. (Lists are still used, but in most cases only where they do things that categories currently cannot.) The same goes for all the other useful technical innovations that get introduced. All it would take is running some bots for a while to switch to the better system, not a big cost for a large wiki like Commons with plenty of bot operators.

On a technical level, dealing with non-atomic categories is a much bigger pain than dealing with atomic ones. On a social level, on the other hand, they're equally doable, as dewiki shows. There will be transition costs for wikis that have a large body of non-atomic categories, but those will be one-time only.
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
> the other useful technical innovations that get introduced. All it would take is running some bots for a while to switch to the better system, not a big cost for a large wiki like Commons with plenty of bot operators.

I'd like for you to be right. But switching from the present category system to atomic categories is not as straightforward as having a few bots run over all existing cats. It will require an enormous amount of work. And so far I have not met any willingness to change anything. Greg showed a long time ago that fast category intersection is doable, but the echo has been pretty much zip, nada.

Just note that simply replacing a category with all of its supercategories is a dead end. You wouldn't believe the twists and turns in the category tree. Amusing examples have been posted on this list already.

So, yeah, sorry for my tone. I've pretty much kept my cool for the last N incarnations of this debate, but after repeating all the arguments for atomic cats and intersections and seeing zero improvement I'm getting a little frustrated. Call it empirical evidence rather than assuming people to be naive ;-)

-- [[en:User:Dschwen]] [[de:Benutzer:Dschwen]] [[commons:User:Dschwen]]
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)
Aryeh Gregor writes:
> On Tue, Dec 2, 2008 at 11:01 AM, Daniel Schwen wrote:
>> So we have shown multiple times now that cat intersection is technically feasible. What we need now is massive lobbying for atomic categorisation. THAT is the hurdle right now IMO. Not some SQL queries.
>
> I'd say that what we need is someone to add proper support for this to the core software and get it enabled on Wikimedia sites, actually. A toolserver tool is just not the same as having the feature integrated into the software, in terms of usage levels. It might be that the implementations written so far are not efficient enough for enabling on Wikimedia, but nobody with commit access has even tried.

I'm with you - we've shown feasibility in large datasets with a Lucene-based approach, and I think we need to roll it out and test it with real users on real data. We need a new Lucene index and a user interface (still to be defined) that average users will find useful. I'm thinking of a "browse related categories" type of function.

Best Regards,
Aerik
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)
2008/12/3 Aerik:
> I'm with you - we've shown feasibility in large datasets with a lucene based approach, and I think we need to roll it out and test it with real users on real data. We need a new lucene index and a user interface (needs to be defined) suitable for average users to find useful. I'm thinking of a browse related categories type of function.

Write something the Commons cabal(tm) will love and you'll be most rewarded with joy and happy users and stuff.

- d.
Re: [Wikitech-l] The never-dying topic: category intersection
On Wed, Dec 3, 2008 at 12:37 PM, Aerik Sylvan wrote:
> [snip]
> But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of browse type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories gives us an intuitive way to navigate through category intersections.
>
> The articles in the group of categories are the problem we've already solved (mostly): they are the result from the fulltext or lucene search. The related categories problem is harder, [snip]

So an interface I had that was really pleasing worked like this: I asked the database to find a random subset of the results, which it could do quickly (or I used the whole result set if the initial query already contained it), then I found the set of categories which maximally bisected the result and presented the list with a set of +/- buttons.

I.e. you search for Animal and you'd get:

Mammal[+/-]
Reptile[+/-]
Kittens[+/-]
Taken with Canon Camera[+/-]
Human[+/-]

ranked by how close to 50% of the results have the suggested category. It's not exactly a 'related category', but I thought it was very useful.

I also did a fuzzy text matching search on the category names using a trigram index, so it was always sure to suggest Category:Cats when you searched for Cat, or whatever. (I did this with an ajaxy search-while-you-type; it was handy.)
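The "maximally bisecting" ranking described above can be sketched roughly as follows. This is not the tool's actual code: the function name, the `page_categories` mapping, and the `exclude` parameter are assumptions for illustration, and the random sampling step is left to the caller.

```python
from collections import Counter


def suggest_bisecting_categories(results, page_categories, exclude=(), limit=5):
    """Rank categories by how close they come to splitting `results` in half.

    results: page titles returned by the current query (or a random sample).
    page_categories: hypothetical mapping of page title -> list of categories.
    exclude: categories already part of the query, which should not be offered.
    """
    total = len(results)
    if total == 0:
        return []
    counts = Counter()
    for page in results:
        # set() guards against a page listing the same category twice.
        for category in set(page_categories.get(page, ())):
            if category not in exclude:
                counts[category] += 1

    def distance_from_half(item):
        category, count = item
        # Ties are broken alphabetically so the output is deterministic.
        return (abs(count / total - 0.5), category)

    ranked = sorted(counts.items(), key=distance_from_half)
    return [category for category, _count in ranked[:limit]]
```

A category held by exactly half of the results ranks first, because a + or - click on it would cut the result set down the most in either direction.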
Re: [Wikitech-l] Non-latin characters broken in donation comments
mizusumashi wrote:
> I see that some (maybe all) Japanese names are correctly displayed. I am very glad; thanks for your work.

Yay!

> But I have one small complaint. Surnames are displayed after personal names. As you know, in East Asia we write the surname first and the personal name second.

Hmm... we'll see if we get a display ordering or if we can arrange something else nice...

-- brion
[Wikitech-l] Stanton Foundation $890K Usability Grant
As per Michael's earlier e-mail:
http://wikimediafoundation.org/wiki/Press_releases/Wikipedia_to_become_more_user-friendly_for_new_volunteer_writers

We're very grateful to the Stanton Foundation for this important investment in Wikipedia's user-friendliness. We're aware of the UNICEF research as well and we'll survey the existing improvements as part of this project. A few points beyond the press release:

'''When will this project begin, and when will it finish?'''
The project will begin in January 2009. It will wrap up in April 2010.

'''What is its overall scope?'''
The project scope will include the following:
* user testing designed to identify the most common barriers to entry for first-time writers, and
* a series of improvements to the MediaWiki interface, including improvements to issues identified through user testing and a focus on hiding complex elements of the user interface from people who don't use them. (Specifically, we'll focus on complex syntax like templates, references, tables, etc.)

'''What does the Wikimedia Foundation consider to be wrong with the editing interface right now?'''
When it was first developed, MediaWiki was considered reasonably user-friendly. At that time, software wasn't as flexible and user-focused as it is today. It's logical that by today's standards, MediaWiki may not seem to be as streamlined or user-friendly as other software. We have never systematically examined the editing interface to determine what kinds of challenges new contributors face, but we do know of certain common problems. For example, many people have difficulty creating new articles, uploading images, and editing templates, footnotes, and tables. We hope to make improvements in those areas.

'''Who are the new contributors you are hoping to attract?'''
We are hoping to attract new contributors who are just as smart and knowledgeable as the people who have always written for Wikipedia and its sister projects, but who, to date, have been unable or reluctant to participate because of the barriers posed by the interface. There are countless individuals who read Wikipedia and would be great writers/editors, but are daunted by complex wiki syntax. They may not even realize that they can edit Wikipedia. They are the people we are targeting with this project.

'''What is the nature of the interface improvements that will be made in this project?'''
In phase 1 (until late summer 2009), we will focus on reducing or eliminating common, simple barriers to entry. A possible example would be making the edit button more visible. These will be identified through systematic user testing, but also by surveying existing research. In phase 2 (until early 2010), we will shift our attention to identifying complex pieces of wiki code (the formatting language used to write Wikipedia articles) and making them less visible to first-time contributors and/or helping them achieve the respective functionality (such as adding tables) more easily.

'''When can we expect to see the first changes to the Wikipedia interface?'''
We hope to demonstrate a first series of improvements by mid-2009, with production deployment following shortly thereafter.

'''How can the Wikimedia volunteer community be involved in this project?'''
The project will be open and participatory throughout. Every major report will be publicly shared, and all code will be developed through our existing, public version control system. Volunteer developers and testers will be encouraged to contribute throughout the process.

'''Are the positions created for this project just temporary?'''
We will allocate at least two existing, budgeted developer positions to this project, and additional hires will be employed for the duration of the grant.

'''Why don't these funds count towards your overall fundraising goals?'''
The majority of the funding for this project will go towards costs not included in our 2008-09 budget. While we anticipate that the project will offset some of our operating costs, we also want to retain flexibility to reallocate funding inside the project budget as required.

'''Are you going to localize these changes in all the languages of Wikipedia and the other projects?'''
All code will be ready for internationalization.

'''Are you going to be looking at the entire editing/contribution process or just the software?'''
This project focuses on technical solutions, but the user testing will aim to capture problems experienced throughout the editing process.

-- Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
Re: [Wikitech-l] Non-latin characters broken in donation comments
Brion Vibber wrote:
> mizusumashi wrote:
>> I see that some (maybe all) Japanese names are correctly displayed. I am very glad thanks to your work.
>
> Yay!
>
>> But I have a very few dissatisfaction. Surname are displayed after personal name. As you know, in east Asia we write surname and personal name in this order.
>
> Hmm... we'll see if we get a display ordering or if we can arrange something else nice...

Ok, quick summary:

1) PayPal sends us a payment record with 'first_name' and 'last_name' fields.
2) We insert that record into our CiviCRM database.
3) CiviCRM combines the first name and last name into a display name... per standard Western ordering assumptions.
4) The display name is copied into our public reporting database and shown on the web.

It looks like we can't do much about the name split in 1); that's just what we get out of the payment processor. We may be able to fudge things at step 3) by detecting Han characters and producing a properly-sorted display name, at least for that case.

Of course this will still be wrong for Hungarians, and Romanized Japanese names may often get written either way...

-- brion
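A rough sketch of the "fudge at step 3)" idea, in Python rather than in CiviCRM itself: if either name field contains a Han character, put the surname first. The function name `display_name`, the chosen Unicode ranges (CJK Unified Ideographs plus Extension A), and the space used as separator are assumptions for illustration, not what the donation system actually does.

```python
import re

# Assumption: treating these two Unicode blocks as "Han characters".
# U+3400-U+4DBF is CJK Extension A, U+4E00-U+9FFF is CJK Unified Ideographs.
HAN_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def display_name(first_name, last_name):
    """Combine PayPal-style first_name/last_name fields into a display name.

    Names containing Han characters are shown surname first; everything
    else keeps the Western given-name-first order.
    """
    first = first_name.strip()
    last = last_name.strip()
    if HAN_RE.search(first + last):
        return f"{last} {first}".strip()
    return f"{first} {last}".strip()


if __name__ == "__main__":
    print(display_name("太郎", "山田"))
    print(display_name("John", "Smith"))
```

As noted above, a heuristic like this does nothing for Hungarian names or for romanized Japanese names, which contain no Han characters to detect.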
Re: [Wikitech-l] Non-latin characters broken in donation comments
Bence Damokos wrote:
> Thank you for considering Hungarian. You could detect Hungarians by simply looking for donations in Hungarian Forints (HUF).

Note that not all people who live in Hungary have Hungarian names, and not all Hungarians live in Hungary.

Roan Kattouw (Catrope)
Re: [Wikitech-l] Non-latin characters broken in donation comments
On Wed, Dec 3, 2008 at 10:01 PM, Roan Kattouw wrote:
> Bence Damokos wrote:
>> Thank you for considering Hungarian. You could detect Hungarians by simply looking for donations in Hungarian Forints (HUF).
>
> Note that not all people who live in Hungary have Hungarian names, and not all Hungarians live in Hungary.

As no such data has been released (you can't filter donations by currency, or even better by currency+location), I'm just guessing that those donating in forints are mostly (~100%) Hungarians, while there is no easy way to find the Hungarians among those not donating in forints.

I didn't want to elaborate on this in my previous mail, but as long as the surname - first name order is not considered wrong, strange or out of place in the context of English, and possibly other languages, then using this order would be a win-win (it would still be acceptable on the English/other interfaces, and on the Hungarian interface it would be correct). However, most Hungarians themselves use the Western order to name themselves in English (and I guess in most foreign languages and contexts), so the Western order would be correct on every interface language (except possibly in those countries that use the non-Western order) except Hungarian (but I dare say that people don't/wouldn't mind it, as they understand that the context is mostly English [website of an American foundation, even the currencies look 'foreign']). In conclusion, I would let the Hungarians' names rest for this year :).

> Unfortunately we get the name already divided up from PayPal and are stuck either guessing or making an unattractive 'Surname, Given' display which looks bad for everyone. :(

You have a box for comments, which is independent of PayPal. Maybe a solution would be to have 3 options instead of two at the privacy checkbox: Display my name [default], Anonymous donation, Display a custom name [this could possibly work for donating in someone else's name, if that's not a privacy concern].

-- Bence Damokos (Damokos Bence in Hungary)
Re: [Wikitech-l] Non-latin characters broken in donation comments
> Unfortunately we get the name already divided up from PayPal and are stuck either guessing or making an unattractive 'Surname, Given' display which looks bad for everyone. :(

There is something to be said for annoying everyone equally. Being an international organisation is very important for the foundation; it may well be worth annoying (non-Hungarian) westerners unnecessarily in order to show that we're not favouring any nationalities over others. (This is all assuming people that use the Surname-Given name order will actually care - they may all be so used to having their names mangled that they barely notice anymore. A little market research may be called for.)
Re: [Wikitech-l] Non-latin characters broken in donation comments
> (long, complex solutions to guess the right display)

Why not have a "Show Name, Surname" / "Show Surname, Name" option on the donation display? Easy, consistent, and everybody should be happy with it.
Re: [Wikitech-l] Non-latin characters broken in donation comments
Platonides wrote:
> (long, complex solutions to guess the right display)
>
> Why not have a Show Name, Surname / Show Surname, Name option on the donation display? Easy, consistent, and everybody should be happy with it.

Because it would show everything wrong? :)

-- brion
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
On Wed, Dec 3, 2008 at 11:43 AM, Daniel Schwen wrote:
> I'd like for you to be right. But switching from the present category system to atomic categories is not as straight forward as having a few bots run over all existing cats.

Of course, humans would have to manually specify which new categories each old one corresponds to, but that's a perfectly doable job for a small group of volunteers working over the course of months. The bots would do the much more tedious work of actually replacing them, so each category could take substantially less than a minute of human review. The category intersection feature would then get incrementally more useful as the work progressed.

> It will require an enormous amount of work. And so far I have not met willingness to change anything. Greg has shown a long time ago that fast category intersection is doable, but the echo has been pretty much zip, nada.

There's a world of difference between showing that something is feasible in theory, and making it a core part of the software that's visible on every category page on every Wikimedia wiki without asking for community consensus in advance. As soon as people actually start using the feature, and they will if there's a box on every category page, they'll realize that it would be way more useful if they changed how things are categorized. As long as category intersections remain vaporware, there's no incentive to change. A technical fait accompli will bring about change.

Even if Commons hypothetically didn't go along with the scheme, it would be valuable to have it in the software anyway. Plenty of wikis could still use it, like dewiki. We need an interface and we need a backend and we need someone to hook them together and commit them to Subversion. People have spent too much time inventing and reinventing and re-reinventing new and different but basically interchangeable backends, and too little time on the other parts of the problem.

If the feature were committed to the software with a completely brainless backend unusable on Wikimedia wikis, I predict it would be live on all sites in less than six months.
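The division of labour proposed above (humans write the mapping, bots apply it) could look roughly like this. The function name `recategorize` and the shape of the `mapping` dictionary are assumptions for illustration; no existing bot framework or MediaWiki API is implied.

```python
def recategorize(page_categories, mapping):
    """Replace compound categories on one page with their atomic equivalents.

    page_categories: the page's current category names, in order.
    mapping: human-reviewed table of old category -> list of atomic
             categories (hypothetical; categories not listed are kept).
    """
    new_categories = []
    for category in page_categories:
        for replacement in mapping.get(category, [category]):
            # Skip duplicates, since several old categories may map to
            # the same atomic one.
            if replacement not in new_categories:
                new_categories.append(replacement)
    return new_categories


if __name__ == "__main__":
    mapping = {
        "Deceased Presidents of the United States": [
            "Deceased people",
            "Presidents of the United States",
        ],
    }
    print(recategorize(
        ["Deceased Presidents of the United States", "1732 births"], mapping))
```

The human review cost is then one mapping entry per old category, while the per-page edits are mechanical.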
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
> how things are categorized. As long as category intersections remain vaporware, there's no incentive to change. A technical fait accompli will bring about change.

Uhm, yeah.. except that intersections of atomic categories are not vaporware. We had proofs of concept for that and the interest was marginal.

In any case, if someone really just shoved it into MW core and enabled it on all the WMF sites, I'd be happy. I concur that it would make the job of convincing users of a less retarded categorization scheme a bit easier.

As far as Aerik's soapboxing from a few emails back goes: let's not kid ourselves, tag-based categorization is standard on commercial sites such as stock photography libraries. We are not exactly inventing this...

I'll shut up now, and I really hope that this is the last time we're having this discussion... (but boy, you will get an earful if it isn't ;-) )

-- [[en:User:Dschwen]] [[de:Benutzer:Dschwen]] [[commons:User:Dschwen]]
[Wikitech-l] All wikipedia text less than 500 MB compressed?
From CNET's interview with Brion:
http://news.cnet.com/8301-17939_109-10103177-2.html

> The text alone is less than 500 MB compressed.

That statement struck me, as I wouldn't think that the big wikis could fit in that, much less all wikis. So I went and spent some CPU on calculations.

I first looked at dewiki:

$ 7z e -so dewiki-20081011-pages-meta-history.xml.7z | sed -n 's/\s*<text xml:space="preserve">\([^<]*\)\(<\/text>\)\?/\1/gp' | bzip2 -9 | wc -c

325915907 bytes = 310.8 MB

Not bad for a 5.1 GB 7z file. :)

Then I went to enwiki, beginning with the current versions:

$ bzcat enwiki-20081008-pages-meta-current.xml.bz2 | sed -n 's/\s*<text xml:space="preserve">\([^<]*\)\(<\/text>\)\?/\1/gp' | bzip2 -9 | wc -c
253648578

253648578 bytes = 241.898 MB

Again, a gigantic file (7.8 GB bz2) was reduced to less than 500 MB. Maybe it *can* be done after all. There are many more revisions in the history, but the compression ratio is greater.

So I had to turn to the beast, the enwiki history files. As there hasn't been any successful enwiki history dump in recent months, I used an old dump I had, which is nearly a year old and fills 18 GB.

$ 7z e -so enwiki-20080103-pages-meta-history.xml.7z | sed -n 's/\s*<text xml:space="preserve">\([^<]*\)\(<\/text>\)\?/\1/gp' | bzip2 -9 | wc -c

1092104465 bytes = 1041.5 MB = 1.01 GB

So, where did that 'less than 500 MB' number come from? Also note that I used bzip2 instead of gzip, so external storage will be using much more space (plus indexes, ids...).

Nonetheless, the results are impressive in how much the size of *already compressed files* shrinks just by stripping the metadata. As a comparison, dewiki-20081011-stub-meta-history.xml.gz, containing the remaining metadata, is 1.7 GB. 1.7 GB + 310.8 MB is still much less than the 51.4 GB of dewiki-20081011-pages-meta-history.xml.bz2!

Maybe we should investigate new ways of storing the dumps compressed. Could we achieve similar gains by increasing the bzip2 window size to counteract the noise of revision metadata?

Or perhaps I used a wrong regex, and thus large chunks of data were not taken into account?
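One way to test the "wrong regex" worry is to repeat the measurement with an XML parser instead of sed. sed works line by line, so a pattern anchored on the opening text tag may only capture the first line of each multi-line revision, which would understate the totals. The sketch below is an assumption-laden alternative, not the method used above: the function name `compressed_text_size` is made up, and it feeds every text element through a single bzip2 level 9 stream, which is meant to approximate `bzip2 -9 | wc -c`.

```python
import bz2
import xml.etree.ElementTree as ET


def compressed_text_size(xml_stream):
    """Return the bzip2 -9 size in bytes of all <text> contents in a dump.

    xml_stream: a binary file-like object holding decompressed dump XML,
    for example sys.stdin.buffer at the end of a `7z e -so ...` pipe.
    """
    compressor = bz2.BZ2Compressor(9)
    size = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Dump elements carry an XML namespace, so compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            size += len(compressor.compress(elem.text.encode("utf-8")))
        # Drop the element's content once seen to keep memory use down.
        elem.clear()
    size += len(compressor.flush())
    return size
```

Unlike the sed pipeline, this concatenates revision texts with no newline between them, so the two numbers will not match byte for byte even where both capture the same text.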
Re: [Wikitech-l] Non-latin characters broken in donation comments
Brion Vibber wrote:
> Platonides wrote:
>> (long, complex solutions to guess the right display)
>>
>> Why not have a Show Name, Surname / Show Surname, Name option on the donation display? Easy, consistent, and everybody should be happy with it.
>
> Because it would show everything wrong? :)
>
> -- brion

Why? Western names would be shown in the 'wrong' order when viewed with the Eastern setting, and vice versa. But it'd be a client setting, so anyone can view the list in the order that suits them best.
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
On Wed, Dec 3, 2008 at 8:12 PM, David Gerard wrote:
> The last time will be when there's a feature end-users can use without going off to the toolserver.

With a JS hack I had my tool integrated into the site. The AJAX calls went to the toolserver, but as far as the users could see it was running on the site. No one cared: it didn't produce useful results because of how categories are used, and when I suggested changing that, people just waved their arms at me: "just make it walk the tree".
Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)
Gregory Maxwell wrote:
> With a JS hack I had my tool integrated to the site. The AJAX calls went to the toolserver, but as far as the users could see it was running on the site. No one cared: It didn't produce useful results because of how categories are used, and when I suggested changing people just waved their arms at me just make it walk the tree.

That _is_ curious. When did this happen? It seems I also blinked and missed it.

-- Ilmari Karonen
Re: [Wikitech-l] The never-dying topic: category intersection
Gregory Maxwell wrote:
> So an interface I had that was really pleasing was that I asked the database to find a random subset of the results, which it could do quickly, (or I used the whole results if the initial query contained them) and I found the set of categories which maximally bisected the result and presented the list with a set of +/- buttons.
>
> I.e. you search for Animal and you'd get: Mammal[+/-] Reptile[+/-] Kittens[+/-] Taken with Canon Camera[+/-] Human[+/-] based on the how close to 50% of the results have the suggested category. It's not exactly a 'related category', but I thought it was very useful.

Wow! And this was at some point live, directly on the Commons category pages?! Has the whole thing been scrapped since, or is there some way to still try it out, e.g. by installing some custom JavaScript?

-- Ilmari Karonen
Re: [Wikitech-l] The never-dying topic: category intersection
Aerik Sylvan wrote:
> But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of browse type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories gives us an intuitive way to navigate through category intersections.

Another useful feature, which would probably make the system much more likely to be adopted in practice, would be an easy interface to get from articles (or images, etc.) to various relevant intersections.

For example, if I'm looking at an image which is in the categories Maple, Leaves and Green, I should be able to easily get to pages where I can browse other pictures of either maple leaves or green leaves, not to mention other pictures of green maple leaves.

A _minimal_ solution would be simply to present a link to the intersection of _all_ the categories (which might well have only one page on it) and let the user broaden the intersection from there. Even better if this can be done in an AJAXish way directly on the image page itself, though obviously some fallback interface would still be needed for users without JavaScript.

-- Ilmari Karonen
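The Maple / Leaves / Green example above can be sketched as a list of candidate intersections, narrowest first, that a page could link to. The function name `intersection_links` and the `min_size` parameter are assumptions for illustration; the number of combinations grows exponentially with the number of categories, so a real interface would presumably cap it.

```python
from itertools import combinations


def intersection_links(categories, min_size=2):
    """List category combinations for a page, from narrowest to broadest.

    The first entry is the intersection of all the page's categories (the
    "minimal solution" above); later entries drop categories one at a time.
    """
    cats = sorted(set(categories))
    links = []
    for size in range(len(cats), min_size - 1, -1):
        links.extend(combinations(cats, size))
    return links


if __name__ == "__main__":
    for combo in intersection_links(["Maple", "Leaves", "Green"]):
        print(" + ".join(combo))
```

Each tuple would correspond to one link on the image page, for instance "Green + Leaves" leading to other pictures of green leaves.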