Re: [Wikitech-l] File licensing information support

2011-01-30 Thread Bryan Tong Minh
Hi,


There have been a lot of mails since I last had the time to reply,
so I'll reply to some points in a single mail.

On Sat, Jan 22, 2011 at 9:04 PM, Platonides platoni...@gmail.com wrote:
 An internally handled parser function doesn't conflict with showing it
 as a textbox.

 We could for instance store it as a hidden page prefix.

No. I strongly feel that using the wikitext to store hidden metadata
is a bad idea. See HM's reply later in the thread.
 Eeeww

 What's any different between this and a {{#author: }} parser function apart
 from the inability to access it from the wikitext?  As noted, it's perfectly
 possible for the data to be in a separate field on the upload form, either
 by default or by per-wiki hackery.  This is likely to result in as many "why
 can't I edit the bits of wikitext which diff, history, transclusion (let's
 not forget the enormous can of worms mucking around with the wikitext will
 open up there), etc assure me is there??" questions as it solves "what does
 this brace structure do?" ones.

 --HM


 PS: The field author would be just a pointer to the author page, so you
 wouldn't need to edit everything in any case.

A good point, {{#fileauthor:}} could indeed just point to a page
in the Author: namespace.

Now that I think of it, if we go this way, there is no reason to
restrict this licensing information to Files.

On Sun, Jan 23, 2011 at 1:38 AM, Magnus Manske
magnusman...@googlemail.com wrote:

 Things like {{#author:4}} seem to be a nice hack to Get Things Done
 (TM). As was mentioned before, the temptation is great to expand it
 into a generic triplet storage a la Semantic MediaWiki, but that would
 probably complicate things to an extent where nothing gets done,
 again.

SMW may perhaps be the ultimate solution, but I do not believe that
activation of SMW is going to happen in the near or mid-term future,
and indeed waiting for SMW will probably mean that nothing is going to
happen.


I think the consensus is that we want to store the copyright metadata
in the wikitext and not separately.

The biggest problem is how to define second-level properties. For
example, a file has a license, say GFDL-1.2 and the license in turn
has a legal URL such as http://fsf.org/gfdl-1.2 or something. This
could be solved by {{#filelicense:GFDL-1.2}} pointing to a license
defined in Special:LicenseManager, with all its properties there.
Another solution would be to define a new namespace such as License:
and have the properties defined in there somehow.
The same problem applies to authors as well of course.
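
A minimal sketch of the first option, assuming a registry keyed by the short license name. The License class, the LICENSES table and resolve_license are hypothetical names used only for illustration (not existing MediaWiki code), and the URL is the placeholder from the paragraph above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class License:
    """Second-level properties that hang off a license key (hypothetical)."""
    name: str
    legal_url: str


# Hypothetical registry: what a Special:LicenseManager page might maintain.
LICENSES = {
    "GFDL-1.2": License("GFDL-1.2", "http://fsf.org/gfdl-1.2"),
}


def resolve_license(key: str) -> Optional[License]:
    """Look up the properties behind {{#filelicense:KEY}}; None if unknown."""
    return LICENSES.get(key)
```

The wikitext would only carry the key; the legal URL and any other property live in one place and can change without touching the pages that use the license.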


Regards,
Bryan

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] File licensing information support

2011-01-25 Thread Dmitriy Sintsov
* Michael Dale md...@wikimedia.org [Mon, 24 Jan 2011 13:18:00 -0600]:
 We should focus on apis for template editing,
 Extension:Page_Object_Model seemed like a step in the right direction
 but not  Something that let you edit structured data across nested
 template objects and we could stack validation on top of that would let
 us leverage everything that has been done and keep things wide open for
 what's done in the future.

 Most importantly we need clean high level apis that we can build GUIs
 on, so that the flexibility of the system does not hurt usability and
 functionality.

Michael is correct - an API module to extract data from already existing
nested templates and to replace the data (when needed) is probably the
only thing required to make Wikipedia more structured and
semantic. Then the whole collecting and analyzing of triples can be
off-loaded to external bots and tools. Great idea, imho.
Dmitriy



Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Platonides
Happy-melon wrote:
 Eeeww
 
 What's any different between this and a {{#author: }} parser function apart 
 from the inability to access it from the wikitext?  As noted, it's perfectly 
 possible for the data to be in a separate field on the upload form, either 
 by default or by per-wiki hackery.  This is likely to result in as many "why
 can't I edit the bits of wikitext which diff, history, transclusion (let's
 not forget the enormous can of worms mucking around with the wikitext will
 open up there), etc assure me is there??" questions as it solves "what does
 this brace structure do?" ones.
 
 --HM

Good point about transclusion.
That question wouldn't be asked since they would be editable above, just
in a different input box than the main content.




Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Michael Dale
On 01/22/2011 01:15 PM, Bryan Tong Minh wrote:
 Handling metadata separately from wikitext provides two main
 advantages: it is much more user friendly, and it allows us to
 properly validate and parse data.

This assumes wikitext is simply a formatting language; really it's a data
storage, structure and presentation language. You can already see this
in place by the evolution of templates as both data and presentation
containers. It seems like a bad idea to move away from leveraging
flexible data properties used in presentation.

In Commons, for example, we have Template:Information, which links out into
numerous data triples for asset presentation (i.e. Template:Artwork,
Template:Creator, Template:Book, with sub-data relationships like
Artwork.Location referencing the Institution template). If tied to an SMW
backend you could say "give me artwork in room Pavillon de Beauvais at
the Louvre that is missing a created-on date".

We should focus on apis for template editing,
Extension:Page_Object_Model seemed like a step in the right direction
but not  Something that let you edit structured data across nested
template objects and we could stack validation on top of that would let
us leverage everything that has been done and keep things wide open for
what's done in the future.

Most importantly we need clean high level apis that we can build GUIs
on, so that the flexibility of the system does not hurt usability and
functionality.
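
To make the idea of reading structured data out of nested templates concrete, here is a rough sketch. It is not the Page_Object_Model API; find_templates, split_top_level, parse_template and parse_value are hypothetical helpers, and the parser only handles plain {{Name|key=value}} syntax with balanced braces (no triple braces, comments or nowiki):

```python
def find_templates(text):
    """Return the inner text of every top-level {{...}} in text."""
    found, depth, start, i = [], 0, 0, 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif two == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(text[start:i])
            i += 2
        else:
            i += 1
    return found


def split_top_level(inner):
    """Split on '|' while ignoring pipes inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        two = inner[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            current.append(two)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_value(value):
    """Recurse when the whole value is exactly one nested template."""
    nested = find_templates(value)
    if len(nested) == 1 and value == "{{" + nested[0] + "}}":
        return parse_template(nested[0])
    return value


def parse_template(inner):
    """Turn 'Name|a=1|b' into {'name': ..., 'params': {...}}."""
    parts = split_top_level(inner)
    params, positional = {}, 0
    for part in parts[1:]:
        key, eq, value = part.partition("=")
        if eq and "{{" not in key and "[[" not in key:
            params[key.strip()] = parse_value(value.strip())
        else:
            positional += 1
            params[str(positional)] = parse_value(part.strip())
    return {"name": parts[0].strip(), "params": params}
```

A GUI or validation layer could then work on the resulting tree, for instance parse_template(find_templates(page_text)[0]), instead of on raw wikitext.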

 Having a clear separate input text field Author:  is much more
 user friendly than {{#fileauthor:}}, which is so to say, a type of obscure
 MediaWiki jargon. I know that we could probably hide it behind a
 template, but that is still not as friendly as a separate field. I
 keep on hearing that especially for newbies, a big blob of wikitext is
 plain scary. We regulars may be able to quickly parse the structure in
  {{Information}}, but for newbies this is certainly not so clear.
 We actually see that from the community there is a demand for
 separating the meta data from the wikitext -- this is after all why
 they implemented the uselang= hacked upload form with a separate text
 box for every meta field.

I don't know... see all the templates mentioned above... To be sure, I
think we need better interfaces for interacting with templates.

 Also, a separate field allows MediaWiki to understand what a certain
 input really means. {{#fileauthor:[[User:Bryan]]}} means nothing to
 MediaWiki or re-users, but "Author: Bryan___ [checkbox] This is a
 Commons username" can be parsed by MediaWiki to mean something. It
 also allows us to mass change for example the author. If I want to
 change my attribution from "Bryan" to "Bryan Tong Minh", I would need
 to edit the wikitext of every single upload, whereas in the new system
 I go to Special:AuthorManager and change the attribution.

A Semantic MediaWiki-like system retains this meaning for MediaWiki to
interact with at any stage of data [re]presentation, and of course
supports flexible meaning types.

 Similar to categories, and all other user edited metadata.
 Categories is a good example of why metadata does not belong in the
 wikitext. If you have ever tried renaming a category... you need to
 edit every page in the category and rename it in the wikitext. Commons
 is running multiple bots to handle category rename requests.

 All these advantages outweigh the pain of migration (which could
 presumably be handled by bots) in my opinion.

Unless your category was template driven, in which case you just update
the template ;) If your category was instead magically associated with
the page outside of template built wiki page text, how do you
procedurally build data associations?


--michael



Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Krinkle
Before I respond to the recent new ideas, concepts and suggestions, I'd like
to explain a few things about the backend (at least the way it's currently
planned to be).

The mw_authors table contains unique authors by either a name or a userid.
Optionally a custom attribution can be given (fallback to authorname, user
real_name or user_name). Also optionally a url can be given (fallback to
nothing or userpage).

The mw_license table contains the different licenses a wiki allows to be
used: their canonical name (e.g. GFDL, CC-BY-SA-3.0 etc.), url to legal code
and usage count[1].

mw_file_props is a table that keeps previous versions of file_props as well,
and is linked to mw_revision by fp_id in rev_fileprops_id (like mw_text is
linked in rev_text_id).

Both authors and licenses are uniquely identified by their id. This makes it
easy to change stuff later on in an AuthorManager (e.g. different url,
username change etc.). The texts and complete titles of the licenses are
stored in interface messages (for internationalization).
MediaWiki:License-uniq-text could for example contain
{{Cc-by-sa-3.0|attribution=$2}} on Wikimedia Commons.
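
As a rough illustration of the tables and the attribution fallback described above, here is a sketch using SQLite from Python. Only the table names and mw_license.lic_count come from this mail; every other column name, the mw_user table and the attribution() helper are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the wiki's user table (hypothetical name and columns).
CREATE TABLE mw_user (
    user_id        INTEGER PRIMARY KEY,
    user_name      TEXT NOT NULL,
    user_real_name TEXT
);
-- An author is either a free-form name or a link to a user.
CREATE TABLE mw_authors (
    author_id          INTEGER PRIMARY KEY,
    author_name        TEXT,
    author_user        INTEGER REFERENCES mw_user(user_id),
    author_attribution TEXT,   -- optional custom attribution
    author_url         TEXT    -- optional url
);
CREATE TABLE mw_license (
    lic_id    INTEGER PRIMARY KEY,
    lic_name  TEXT UNIQUE NOT NULL,       -- canonical name, e.g. GFDL
    lic_url   TEXT,                       -- url to legal code
    lic_count INTEGER NOT NULL DEFAULT 0  -- usage count
);
INSERT INTO mw_user VALUES (1, 'Bryan', 'Bryan Tong Minh');
INSERT INTO mw_authors VALUES (1, NULL, 1, NULL, NULL);
INSERT INTO mw_authors VALUES (2, 'Jane Doe', NULL, NULL, NULL);
""")


def attribution(author_id):
    """Custom attribution, else author name, else real name, else user name."""
    row = conn.execute(
        """
        SELECT COALESCE(a.author_attribution, a.author_name,
                        u.user_real_name, u.user_name)
        FROM mw_authors a
        LEFT JOIN mw_user u ON u.user_id = a.author_user
        WHERE a.author_id = ?
        """,
        (author_id,),
    ).fetchone()
    return row[0] if row else None
```

Because files would reference the author by id, changing the attribution is a single UPDATE on mw_authors rather than an edit to every file page.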

-

If we store the links in the wikitext (like {{#fileauthor:}} and
{{#filelicense:}}), the advantages are basically two things:
1) It has all features of editing and revisioning (better history, edit
conflict, diff view, etc.)
2) No need for a revisionized mw_file_props, we can store the current values
in mw_page_props

A possible downside is that a diff like
- {{#fileauthor:2}} {{filelicense:12}}
+ {{#fileauthor:10}} {{#fileauthor:12}} {{#filelicense:
doesn't mean very much. I.m.h.o. the solution is not to store the actual
names in wikitext so that the diffs are better, but to either not store it in
wikitext at all, or customize the behaviour everywhere:
* edit form: extract parser function calls from wikitext before anything
else, and put them in separate form elements
* diff view: get the names of those authors and licenses and somehow include
them in the diff view. This could be done a bit like AbuseFilter's diff
between filter versions (i.e. before Line 1 would be Author and License)
* saving form: convert back to {{#parserfunction:}} calls and prepend them
to the wikitext
* action=raw: ?
* action=render: ?
* api-parse: ?
Right now I think storing it in wikitext and customizing it everywhere like
shown above is not worth the trouble and would likely bring its own troubles.
Keeping it separate from wikitext is more work once but I think it pays off.
But again, nothing is final yet. Everything is possible.

--
Krinkle


[1]: The usage count (mw_license.lic_count) is a bit like edit count
(increased/decreased when saving files)



Re: [Wikitech-l] File licensing information support

2011-01-24 Thread Platonides
Krinkle wrote:
 Before I respond to the recent new ideas, concepts and suggestions, I'd like
 to explain a few things about the backend (at least the way it's currently
 planned to be).

 The mw_authors table contains unique authors by either a name or a userid.
 Optionally a custom attribution can be given (fallback to authorname, user
 real_name or user_name). Also optionally a url can be given (fallback to
 nothing or userpage).

 The mw_license table contains the different licenses a wiki allows to be
 used: their canonical name (e.g. GFDL, CC-BY-SA-3.0 etc.), url to legal code
 and usage count[1].

 mw_file_props is a table that keeps previous versions of file_props as well,
 and is linked to mw_revision by fp_id in rev_fileprops_id (like mw_text is
 linked in rev_text_id).

 Both authors and licenses are uniquely identified by their id. This makes it
 easy to change stuff later on in an AuthorManager (e.g. different url,
 username change etc.). The texts and complete titles of the licenses are
 stored in interface messages (for internationalization).
 MediaWiki:License-uniq-text could for example contain
 {{Cc-by-sa-3.0|attribution=$2}} on Wikimedia Commons.

 -

 If we store the links in the wikitext (like {{#fileauthor:}} and
 {{#filelicense:}}), the advantages are basically two things:
 1) It has all features of editing and revisioning (better history, edit
 conflict, diff view, etc.)
 2) No need for a revisionized mw_file_props, we can store the current values
 in mw_page_props

 A possible downside is that a diff like
 - {{#fileauthor:2}} {{filelicense:12}}
 + {{#fileauthor:10}} {{#fileauthor:12}} {{#filelicense:
 doesn't mean very much. I.m.h.o. the solution is not to store the actual
 names in wikitext so that the diffs are better, but to either not store it in
 wikitext at all, or customize the behaviour everywhere:

Why? Storing the property filelicense: GPL directly in wikitext is not
bad. It's also a relief when we want to delete licenses later.
Same with Author. Take that as a key into a NS_AUTHOR namespace.
Going to Special:LicenseManager/5 in order to change GPL license data is
just added complexity over using the short name GPL.





Re: [Wikitech-l] File licensing information support

2011-01-23 Thread Dmitriy Sintsov
* Magnus Manske magnusman...@googlemail.com [Sun, 23 Jan 2011 00:38:53 
+]:
 On Sat, Jan 22, 2011 at 10:09 PM, Platonides platoni...@gmail.com
 wrote:
  Krinkle wrote:
  So PHP would extract {{#author:4}} and {{#license:12}} from the
  textblob when showing the editpage.
  And show the remaining wikitext in the textarea and the author/
  license as separate form elements.
  And upon saving, generate {{#author:4}} {{#license:12}}\n again and
  prepend to the textblob.

  Double instances of these would be ignored (i.e. stripped automatically
  since they're not re-inserted to the textblob upon saving).
  One small downside would be that if someone would edit the textarea
  manually to do stuff with author and license, the next edit would
  re-arrange them since they're extracted and re-inserted, thus showing
  messy diffs (not a major point as long as it's done independently of
  JavaScript, which it can be if done from core / PHP).

  If that's what you meant, I think it is an interesting concept that
  should not be ignored, however personally I am not yet convinced this is
  the way to go. But when looking at the complete picture of up/down sides,
  this could be something to consider.

  --
  Krinkle

  That's an alternative approach. I was thinking of accepting them only at
  the beginning of the page, but extracting from everywhere is also an
  alternative.


 OK, my 2 cents:

 I would be in favour of extracting data from the {{Information}}
 template via the parser, but we talked about this over a year ago at
 the Paris meeting, and it was deemed too complicated (black caching
 magick etc.), and no one has stepped forward to do anything along those
 lines, so I guess it's dead and buried.

 Things like {{#author:4}} seem to be a nice hack to Get Things Done
 (TM). As was mentioned before, the temptation is great to expand it
 into a generic triplet storage a la Semantic MediaWiki, but that would
 probably complicate things to an extent where nothing gets done,
 again.

 But one thing comes to mind: If someone implements an abstraction
 layer (4 to a specific author) anyway, it should be dead simple to
 use it for tags as well. Just allow multiple {{#tag}}s per page (as
 opposed to {{#author}}), done. The same code that will allow for
 editing author and license information centrally should make it
 possible to edit tag information, i18n for example, so the tag display
 could be in the current user language (with en fallback). Search for
 tags i18n-style could be possible as well, if the translation
 information is encoded machine-readable as well, e.g. as language
 links ([[de:Pferd]] on the [[Tag:Horse]] page).
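
The tag idea quoted above (display in the user's language with an English fallback, plus search by translated name) could be sketched roughly as follows. TAG_LABELS, tag_label and find_tags are hypothetical names, and the label table stands in for whatever language links on a Tag: page would provide:

```python
# Hypothetical label table; in the proposal this would come from language
# links such as [[de:Pferd]] on the [[Tag:Horse]] page.
TAG_LABELS = {
    "Horse": {"en": "Horse", "de": "Pferd"},
}


def tag_label(tag, lang):
    """Label in the user's language, falling back to English, then the key."""
    labels = TAG_LABELS.get(tag, {})
    return labels.get(lang) or labels.get("en") or tag


def find_tags(term, lang):
    """Tags whose label in the given language matches term (case-insensitive)."""
    wanted = term.casefold()
    return sorted(
        tag for tag, labels in TAG_LABELS.items()
        if labels.get(lang, "").casefold() == wanted
    )
```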

You are correct - triplet definitions are always meant to be as
generic as possible, something like categorizing or tagging. It is
better to define them separately from templates and to include
references to their values in the template. That way it would not
complicate the parsing too much. Although one might want to have fancy
visual forms to edit these, which probably brings caching issues?

 It might be too much to try to activate all of that in the first
 round, but IMHO the code should keep the use as tags in mind; it would
 be dreadful to waste such an opportunity.

Dmitriy



Re: [Wikitech-l] File licensing information support

2011-01-23 Thread Lars Aronsson
On 01/22/2011 08:15 PM, Bryan Tong Minh wrote:
 Having a clear separate input text field Author:  is much more
 user friendly than {{#fileauthor:}}, which is so to say, a type of obscure
 MediaWiki jargon.

I disagree. In real life, there are always more complicated
cases, where an author is not an author, but two authors
or a sculptor, or one painter and one photographer. These
things never fit in a single author field, and the same goes
for any other separated fields. But the free-form Wikipedia
can handle all real-world cases in plain human language.

Various expert systems based on artificial intelligence
existed since the 1980s, but none of them produced a
universal encyclopedia. Only the text-based Wikipedia did.
After this humiliating fact, the same AI people (now dressed
as semantic web scholars) come and claim that they too
could have built Wikipedia, if it only were more structured.
They are wrong, of course. Lack of structure is precisely
what built Wikipedia.


-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se





Re: [Wikitech-l] File licensing information support

2011-01-23 Thread Dmitriy Sintsov
* Lars Aronsson l...@aronsson.se [Mon, 24 Jan 2011 07:06:02 +0100]:
 On 01/22/2011 08:15 PM, Bryan Tong Minh wrote:
  Having a clear separate input text field Author:  is much more
  user friendly than {{#fileauthor:}}, which is so to say, a type of obscure
  MediaWiki jargon.

 I disagree. In real life, there are always more complicated
 cases, where an author is not an author, but two authors
 or a sculptor, or one painter and one photographer. These
 things never fit in a single author field, and the same goes
 for any other separated fields. But the free-form Wikipedia
 can handle all real-world cases in plain human language.

 Various expert systems based on artificial intelligence
 existed since the 1980s, but none of them produced a
 universal encyclopedia. Only the text-based Wikipedia did.
 After this humiliating fact, the same AI people (now dressed
 as semantic web scholars) come and claim that they too
 could have built Wikipedia, if it only were more structured.
 They are wrong, of course. Lack of structure is precisely
 what built Wikipedia.


One may have not just a single triple for that, but a list / set of
triples for the same person in different roles (different kinds of
author). There are two extremes - not to have any structure or to be
overly structured. If there are a few extra fields for an image
description, why not generalize it to all kinds of measured data -
geographical, historical, population statistics, financial and
economic data and so on? Why are only images allowed to have
structured and measurable data? However, I don't think that Wikipedia
should have AI, because it requires huge computing power, and the
problem is that AI algorithms are not efficient enough. To have the data
structured is not a bad thing. It probably should not even try to do
SPARQL, but offer these things to external sites. Don't make complex
queries, leave that to offline tools / bots or the toolserver. Semantic bots
are a good idea - they might mine the data, finding the cross-sets. It
should be even lighter than SMW. However, I might be wrong.
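
A small sketch of what "a set of triples for the same file in different roles" might look like, and how an external bot could query it without SPARQL. The file names, author keys, TRIPLES list and both helper functions are made up for illustration:

```python
from collections import defaultdict

# (subject, role, object) triples; all values are invented examples.
TRIPLES = [
    ("File:Statue.jpg", "sculptor", "Author:Alice"),
    ("File:Statue.jpg", "photographer", "Author:Bob"),
    ("File:Portrait.jpg", "painter", "Author:Carol"),
    ("File:Portrait.jpg", "photographer", "Author:Bob"),
]


def roles_for(subject):
    """Group the authors of one file by the role they played."""
    roles = defaultdict(list)
    for s, role, author in TRIPLES:
        if s == subject:
            roles[role].append(author)
    return dict(roles)


def files_by(author, role=None):
    """Files credited to an author, optionally restricted to one role."""
    return sorted({
        s for s, r, a in TRIPLES
        if a == author and (role is None or r == role)
    })
```

This covers the sculptor-plus-photographer case without a single fixed author field, while staying far simpler than a general query engine.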
Dmitriy



Re: [Wikitech-l] File licensing information support

2011-01-22 Thread Bryan Tong Minh
On Fri, Jan 21, 2011 at 3:36 AM, Michael Dale md...@wikimedia.org wrote:
 On 01/20/2011 05:00 PM, Platonides wrote:
 I would have probably gone by the page_props route, passing the metadata
 from the wikitext to the tables via a parser function.

 I would also say it's probably best to pass metadata from the wikitext to
 the tables via a parser function.  Similar to categories, and all other
 user edited metadata. This has the disadvantage that it's not 'as
 easy' to edit via a structured api entry point, but has the advantage of
 working well with all the existing tools, templates and versioning.

This is actually the biggest decision that has been made, the rest is
mostly implementation details. (Please note that I'm not presenting
you with a fait accompli, it is of course still possible to change
this)

Handling metadata separately from wikitext provides two main
advantages: it is much more user friendly, and it allows us to
properly validate and parse data.

Having a clear separate input text field Author:  is much more
user friendly than {{#fileauthor:}}, which is so to say, a type of obscure
MediaWiki jargon. I know that we could probably hide it behind a
template, but that is still not as friendly as a separate field. I
keep on hearing that especially for newbies, a big blob of wikitext is
plain scary. We regulars may be able to quickly parse the structure in
 {{Information}}, but for newbies this is certainly not so clear.
We actually see that from the community there is a demand for
separating the meta data from the wikitext -- this is after all why
they implemented the uselang= hacked upload form with a separate text
box for every meta field.

Also, a separate field allows MediaWiki to understand what a certain
input really means. {{#fileauthor:[[User:Bryan]]}} means nothing to
MediaWiki or re-users, but "Author: Bryan___ [checkbox] This is a
Commons username" can be parsed by MediaWiki to mean something. It
also allows us to mass change for example the author. If I want to
change my attribution from "Bryan" to "Bryan Tong Minh", I would need
to edit the wikitext of every single upload, whereas in the new system
I go to Special:AuthorManager and change the attribution.

 Similar to categories, and all other user edited metadata.
Categories is a good example of why metadata does not belong in the
wikitext. If you have ever tried renaming a category... you need to
edit every page in the category and rename it in the wikitext. Commons
is running multiple bots to handle category rename requests.

All these advantages outweigh the pain of migration (which could
presumably be handled by bots) in my opinion.


Best regards,
Bryan



Re: [Wikitech-l] File licensing information support

2011-01-22 Thread Platonides
An internally handled parser function doesn't conflict with showing it
as a textbox.

We could for instance store it as a hidden page prefix.

Data stored in the text blob:
Author: [[Author:Bryan]]
License: GPL
---
{{Information| This is a nice picture I took }}
{{Deletion request|Copyvio from http://www.example.org}}


Data shown when clicking edit:

Author: <input type="text" value="Bryan" />
License: <select>GPL</select>

<textarea name="textbox1">
{{Information| This is a nice picture I took }}
{{Deletion request|Copyvio from http://www.example.org}}
</textarea>
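
A rough sketch of how such a hidden prefix could be split off for the edit form and re-attached on save. The "---" separator and the "Key: value" lines follow the example above; split_prefix and join_prefix are hypothetical helpers, and the detection is deliberately naive (a real implementation would need a stricter marker, as the drawbacks below suggest):

```python
SEPARATOR = "---"


def split_prefix(blob):
    """Split a stored text blob into (metadata fields, visible wikitext)."""
    head, sep, body = blob.partition("\n" + SEPARATOR + "\n")
    if not sep:
        return {}, blob  # no prefix stored
    fields = {}
    for line in head.splitlines():
        key, colon, value = line.partition(":")
        if not colon:
            return {}, blob  # head is not a metadata prefix
        fields[key.strip()] = value.strip()
    return fields, body


def join_prefix(fields, body):
    """Rebuild the stored blob from the form fields and the textarea."""
    if not fields:
        return body
    head = "\n".join("%s: %s" % (key, value) for key, value in fields.items())
    return head + "\n" + SEPARATOR + "\n" + body
```

The edit form would show the fields as separate inputs and put only the returned body in the textarea; history, diffs and export keep working on the joined blob.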

Why do I like such an approach?
* You don't need to create a new way for storing the history of such
metadata.
* Old versions are equally viewable.
* Things like edit conflicts are already handled.
* Diffing could be done directly with the blobs.
* Import/export automatically works.
* Extendable for more metadata.
* Readable for tools/wikis unaware of the new format.

On the other hand:
* It breaks the concept of "everything is in the source".
* Parsing is different based on the namespace. A naive parsing as
License: GPL instead of showing an image and a GPL excerpt, would be
acceptable, but if incomplete markup is stored there, the renderings
would be completely different. Could be skipped if placing the metadata
inside a tag. But what happens if the tag is inserted elsewhere in the
page? MediaWiki doesn't have run-once tags.


PS: The field author would be just a pointer to the author page, so you
wouldn't need to edit everything in any case.




Re: [Wikitech-l] File licensing information support

2011-01-22 Thread Krinkle
On Jan 22, 2011 at 21:04 Platonides wrote:
 An internally handled parser function doesn't conflict with showing it
 as a textbox.

 We could for instance store it as a hidden page prefix.

 Data stored in the text blob:
 Author: [[Author:Bryan]]
 License: GPL
 ---
 {{Information| This is a nice picture I took }}
 {{Deletion request|Copyvio from http://www.example.org}}
 

 Data shown when clicking edit:

 Author: <input type="text" value="Bryan" />
 License: <select>GPL</select>

 <textarea name="textbox1">
 {{Information| This is a nice picture I took }}
 {{Deletion request|Copyvio from http://www.example.org}}
 </textarea>

So PHP would extract {{#author:4}} and {{#license:12}} from the textblob
when showing the editpage.
And show the remaining wikitext in the textarea and the author/license as
separate form elements.
And upon saving, generate {{#author:4}} {{#license:12}}\n again and prepend
to the textblob.

Double instances of these would be ignored (i.e. stripped automatically
since they're not re-inserted to the textblob upon saving).
One small downside would be that if someone would edit the textarea manually
to do stuff with author and license, the next edit would re-arrange them
since they're extracted and re-inserted, thus showing messy diffs (not a
major point as long as it's done independently of JavaScript, which it can
be if done from core / PHP).
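
The extract / strip / prepend cycle described above could look roughly like this. It is written in Python rather than PHP purely for illustration; FUNC_RE, extract and rebuild are hypothetical names, and keeping the first occurrence of each function is one possible reading of "double instances would be ignored":

```python
import re

# Matches {{#author:N}} or {{#license:N}} plus any whitespace that follows.
FUNC_RE = re.compile(r"\{\{#(author|license):(\d+)\}\}\s*")


def extract(wikitext):
    """Return (ids found, wikitext with every such call stripped)."""
    found = {}

    def take(match):
        # Keep the first id seen for each function; later ones are dropped.
        found.setdefault(match.group(1), int(match.group(2)))
        return ""

    body = FUNC_RE.sub(take, wikitext)
    return found, body


def rebuild(found, body):
    """Prepend the calls in a fixed order, followed by a newline."""
    calls = " ".join(
        "{{#%s:%d}}" % (name, found[name])
        for name in ("author", "license")
        if name in found
    )
    return calls + "\n" + body if calls else body
```

Running extract() and then rebuild() on a manually rearranged page moves the calls back to the top, which is exactly the source of the messy diffs mentioned above.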

If that's what you meant, I think it is an interesting concept that should
not be ignored, however personally I am not yet convinced this is the way to
go. But when looking at the complete picture of up/down sides, this could be
something to consider.

--
Krinkle



Re: [Wikitech-l] File licensing information support

2011-01-22 Thread Platonides
Krinkle wrote:
 So PHP would extract {{#author:4}} and {{#license:12}} from the textblob
 when showing the editpage.
 And show the remaining wikitext in the textarea and the author/license as
 separate form elements.
 And upon saving, generate {{#author:4}} {{#license:12}}\n again and prepend
 to the textblob.

 Double instances of these would be ignored (i.e. stripped automatically
 since they're not re-inserted to the textblob upon saving).
 One small downside would be that if someone would edit the textarea manually
 to do stuff with author and license, the next edit would re-arrange them
 since they're extracted and re-inserted, thus showing messy diffs (not a
 major point as long as it's done independently of JavaScript, which it can
 be if done from core / PHP).

 If that's what you meant, I think it is an interesting concept that should
 not be ignored, however personally I am not yet convinced this is the way to
 go. But when looking at the complete picture of up/down sides, this could be
 something to consider.
 
 --
 Krinkle

That's an alternative approach. I was thinking of accepting them only at
the beginning of the page, but extracting from everywhere is also an
alternative.




Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Platonides
Roan Kattouw wrote:
 2011/1/21 Platonides platoni...@gmail.com:
 Conceptually, revision table shouldn't link to file_props. file_props
 should be linked with image instead.

 Maybe, but the current image/oldimage schema resembling cur/old is
 horrible. For instance, there is no way to uniquely identify an
 oldimage row.
I agree. It should also be fixed.


 We talked about this for an hour and decided that we
 have some ideas for restructuring that, but that it's a huge operation
 that shouldn't block the license integration project.
 
 Roan Kattouw (Catrope)

If we wanted to map it to a page/revision format, it seems quite
straightforward. I'm missing something, right?




Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Alex Brollo
The interest of the Wikisource project in a formal and standardized set of book
metadata (I presume from Dublin Core) in a database table is obvious.
Some preliminary tests on it.source suggest that templates and the Labeled
Section Transclusion extension could have a role as existing wikitext
containers for semantized variables; the latter is perhaps more interesting
than the former, since its content can be accessed directly from any
page.

I'd like book metadata to be considered from the beginning of this
interesting project.

Alex


Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Michael Dale
On 01/21/2011 02:45 AM, Alex Brollo wrote:
 The interest of the Wikisource project in a formal and standardized set of book
 metadata (I presume from Dublin Core) in a database table is obvious.
 Some preliminary tests on it.source suggest that templates and the Labeled
 Section Transclusion extension could have a role as existing wikitext
 containers for semantized variables; the latter is perhaps more interesting
 than the former, since its content can be accessed directly from any
 page.

 I'd like book metadata to be considered from the beginning of this
 interesting project.

 Alex

This quickly dovetails into the Semantic MediaWiki discussion... which
there are other threads on this list to reference.  There is a wiki data
summit / meeting coming up, where these issues will likely be discussed.
Maybe we could start eliciting requirements and needs of projects like
what you describe for wikisource and others that have been listed
elsewhere on a pre-meeting project page, this way we can be sure to hit
on all these items during the meeting.

--michael



Re: [Wikitech-l] File licensing information support

2011-01-21 Thread bawolff
 Hello,


 As you may have noticed, Roan, Krinkle and I have started to more
 tightly integrate image licensing within MediaWiki. Our aim is to
 create a system where it should be easy to obtain the basic copyright
 information of an image in a machine readable format, as well as
 querying images with a certain copyright state (all images copyrighted
 by User:XY, all images licensed CC-BY-SA, etc)

 At this moment we only intend to store author and license information,
 but nothing stops us from expanding this in the future.

 We have put some information in a not so structured way at mw.org [1].
 There are some issues open on the talk page [2]. Input is of course
 welcome, both here or preferably at the talk page.


 Bryan


 [1]  http://www.mediawiki.org/wiki/Files_and_licenses_concept
 [2]  http://www.mediawiki.org/wiki/Talk:Files_and_licenses_concept



Has there been consideration given to translating author names into
different languages?

Relative to other types of metadata, having the author in different
languages is not as important, since most people just use whatever the
name in the author's native language is (or at least, that is what
experience suggests to me). However, we might want to have different
translations of the author's name in some circumstances:
*If the author is 'Unknown' or 'Anonymous', we'd definitely want to be
able to translate that.
*If the author is a company, government or a group with a proper name,
people tend to translate the name.
*If the author's native language is in a different script than the
current language, then the author's name is usually translated in my
experience. (To the average English viewer, a name in a language
like Arabic or Tamil that doesn't use the Latin alphabet generally
looks like any other name in that language. I imagine people who only
speak Arabic would have trouble differentiating between the written
forms of different English names.)
(Of course, the above is just a guess at when you'd want to translate
author names; I don't know what happens in actual practice.)
So I do think allowing such author properties to have multiple
translations is something to consider.

If there was support for translations of the values of these
properties, then ideally, when querying this information from the API, we'd
want to be able to do things like get the author's name in language X,
falling back to the original language if unavailable, or get the author's
name in all available languages.
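
To make the fallback idea concrete, here is a minimal sketch in Python. It
is only an illustration: the function names, the dictionary layout and the
sample values are assumptions made up for this example, not an existing
MediaWiki API.

```python
from typing import Dict, Optional


def author_name(names: Dict[str, str], original_lang: str,
                lang: Optional[str] = None) -> str:
    """Return the author's name in `lang`, falling back to the original.

    `names` maps a language code to the author's name in that language
    (hypothetical storage layout).
    """
    if lang is not None and lang in names:
        return names[lang]
    # No translation for the requested language: use the original name.
    return names[original_lang]


def all_author_names(names: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of every available translation of the name."""
    return dict(names)


# Hypothetical sample data for an anonymous author.
unknown = {"en": "Unknown", "de": "Unbekannt", "fr": "Inconnu"}

print(author_name(unknown, "en", "de"))  # a translation exists
print(author_name(unknown, "en", "ja"))  # falls back to the original
```

A real API would presumably also need language fallback chains, but the
single-step fallback above is all that the paragraph describes.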


-bawolff



Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Roan Kattouw
2011/1/21 Platonides platoni...@gmail.com:
 If we wanted to map it to a page/revision format, it seems quite
 straightforward. I'm missing something, right?

You're missing that migrating a live site (esp. Commons, with 8
million image rows and ~750k oldimage rows) from the old to the new
schema would be a nightmare, and would probably involve setting stuff
to read-only for a few hours.

Roan Kattouw (Catrope)



Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Brion Vibber
On Fri, Jan 21, 2011 at 10:43 AM, Roan Kattouw roan.katt...@gmail.com wrote:

 2011/1/21 Platonides platoni...@gmail.com:
  If we wanted to map it to a page/revision format, it seems quite
  straightforward. I'm missing something, right?
 
 You're missing that migrating a live site (esp. Commons, with 8
 million image rows and ~750k oldimage rows) from the old to the new
 schema would be a nightmare, and would probably involve setting stuff
 to read-only for a few hours.


If one's clever about it, this could probably actually be done on-the-fly in
a reasonably non-evil fashion.

Image version data isn't used as widely as revisions; e.g. things like
Special:Contributions always needed direct access to old revs looked up by
author, whereas I think image old versions are pretty much only pulled up by
title, via the image record. There are also relatively few revisions per
file -- old images usually only have a few revisions, and cases of thousands
of versions are, I suspect, very rare -- which would make the actual
conversion work relatively lightweight for each file record.

Further optimizing by delaying on-demand migration of a record until write
time could also keep it from being a sudden database I/O sink. If indirect
lookups won't be needed, we can just keep reading the existing
image/oldimage records until they need to be updated on modification (or get
updated by a background task at leisure).
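
As a rough illustration of the migrate-on-write idea, here is a minimal
Python sketch. Everything in it is hypothetical: the class, the in-memory
dicts standing in for the old and new tables, and the field names ("ts",
"uploader") are invented for this example and do not match the real
image/oldimage schema.

```python
class FileStore:
    """Reads serve old-format records as-is; a record is only converted
    the first time it is modified."""

    def __init__(self, legacy_rows):
        # title -> list of old-format version records (stand-in for the
        # existing image/oldimage rows)
        self.legacy = dict(legacy_rows)
        # title -> list of new-format version records
        self.migrated = {}

    @staticmethod
    def _convert(old):
        # Hypothetical mapping from the old layout to the new one.
        return {"timestamp": old["ts"], "user": old["uploader"]}

    def read_versions(self, title):
        """Read without migrating: convert in memory only."""
        if title in self.migrated:
            return self.migrated[title]
        return [self._convert(v) for v in self.legacy.get(title, [])]

    def add_version(self, title, version):
        """Write path: migrate this one file's history, then append."""
        if title not in self.migrated:
            old_versions = self.legacy.pop(title, [])
            self.migrated[title] = [self._convert(v) for v in old_versions]
        self.migrated[title].append(version)


store = FileStore({"Example.jpg": [{"ts": 1, "uploader": "X"}]})
store.read_versions("Example.jpg")   # leaves the legacy record alone
store.add_version("Example.jpg", {"timestamp": 2, "user": "Y"})
```

Since each file has only a handful of versions, the per-write conversion
cost stays small, which is the point made above; a background task could
call the same conversion for files that are never modified.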

-- brion


Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Platonides
Roan Kattouw wrote:
 2011/1/21 Platonides platoni...@gmail.com:
 If we wanted to map it to a page/revision format, it seems quite
 straightforward. I'm missing something, right?

 You're missing that migrating a live site (esp. Commons, with 8
 million image rows and ~750k oldimage rows) from the old to the new
 schema would be a nightmare, and would probably involve setting stuff
 to read-only for a few hours.
 
 Roan Kattouw (Catrope)

Do we agree on the target DB schema?
That's the important point.

Migrating a large site like Commons is 'just' an operations issue.
Making it read-only for a bit wouldn't be a big issue, but we could also, for
instance, move to an intermediate point where uploads are stored in both
formats but reads still use only the old one, while a script moves the
existing records. Finally, flip the switch and drop the old tables.
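
A minimal Python sketch of that intermediate point, just to show the
phases. The class and method names are made up for this illustration, and
plain dicts stand in for the old and new tables.

```python
class DualWriteStore:
    def __init__(self):
        self.old = {}               # stand-in for the old tables
        self.new = {}               # stand-in for the new schema
        self.read_from_new = False  # the "switch"

    def upload(self, title, record):
        """Intermediate phase: every upload is stored in both formats."""
        self.old[title] = record
        self.new[title] = record

    def read(self, title):
        """Reads use only the old tables until the switch is flipped."""
        source = self.new if self.read_from_new else self.old
        return source.get(title)

    def backfill(self, batch_size=100):
        """Migration script: copy one batch of not-yet-moved records.

        Returns the number of records copied, so a caller can loop
        until it returns 0.
        """
        pending = [t for t in self.old if t not in self.new][:batch_size]
        for title in pending:
            self.new[title] = self.old[title]
        return len(pending)

    def flip(self):
        """Switch reads to the new schema and drop the old tables."""
        if any(t not in self.new for t in self.old):
            raise RuntimeError("backfill incomplete")
        self.read_from_new = True
        self.old.clear()
```

A migration run would then look like calling backfill() in a loop until it
returns 0, followed by flip(); the site never has to go read-only.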




Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Roan Kattouw
2011/1/21 Platonides platoni...@gmail.com:
 Do we agree on the target DB schema?
 That's the important point.

We haven't thought about it in detail. But it would be a fairly large
change and require changes throughout the software, as well as
possibly elsewhere in the schema.

 Migrating a large site like Commons is 'just' an operations issue.
 Making it read-only for a bit wouldn't be a big issue, but we could also, for
 instance, move to an intermediate point where uploads are stored in both
 formats but reads still use only the old one, while a script moves the
 existing records. Finally, flip the switch and drop the old tables.

Sure, it can be dealt with. It's just that it'd be an epic upgrade :)

Roan Kattouw (Catrope)



Re: [Wikitech-l] File licensing information support

2011-01-21 Thread Platonides
Roan Kattouw wrote:
 2011/1/21 Platonides platoni...@gmail.com:
 Do we agree on the target DB schema?
 That's the important point.

 We haven't thought about it in detail. But it would be a fairly large
 change and require changes throughout the software, as well as
 possibly elsewhere in the schema.
 
 Migrating a large site like Commons is 'just' an operations issue.
 Making it read-only for a bit wouldn't be a big issue, but we could also, for
 instance, move to an intermediate point where uploads are stored in both
 formats but reads still use only the old one, while a script moves the
 existing records. Finally, flip the switch and drop the old tables.

 Sure, it can be dealt with. It's just that it'd be an epic upgrade :)
 
 Roan Kattouw (Catrope)

We already have 1.17 branched, so... who dares to create a branch and
begin with it? :)





[Wikitech-l] File licensing information support

2011-01-20 Thread Bryan Tong Minh
Hello,


As you may have noticed, Roan, Krinkle and I have started to more
tightly integrate image licensing within MediaWiki. Our aim is to
create a system where it should be easy to obtain the basic copyright
information of an image in a machine-readable format, as well as to
query images with a certain copyright state (all images copyrighted
by User:XY, all images licensed CC-BY-SA, etc.)

At this moment we only intend to store author and license information,
but nothing stops us from expanding this in the future.
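
To give a feel for the kind of query this should enable, here is a small
Python/sqlite3 sketch. It is purely illustrative: the file_props table
name is borrowed from the discussion, but the column names and sample rows
are invented here and say nothing about the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per file with its author and license.
conn.execute(
    "CREATE TABLE file_props (fp_file TEXT, fp_author TEXT, fp_license TEXT)"
)
conn.executemany(
    "INSERT INTO file_props VALUES (?, ?, ?)",
    [
        ("A.jpg", "User:XY", "CC-BY-SA"),
        ("B.jpg", "User:XY", "PD"),
        ("C.png", "User:Other", "CC-BY-SA"),
    ],
)


def files_by_license(conn, license_name):
    """All files under a given license, e.g. 'CC-BY-SA'."""
    rows = conn.execute(
        "SELECT fp_file FROM file_props WHERE fp_license = ? ORDER BY fp_file",
        (license_name,),
    )
    return [row[0] for row in rows]


def files_by_author(conn, author):
    """All files copyrighted by a given author, e.g. 'User:XY'."""
    rows = conn.execute(
        "SELECT fp_file FROM file_props WHERE fp_author = ? ORDER BY fp_file",
        (author,),
    )
    return [row[0] for row in rows]


print(files_by_license(conn, "CC-BY-SA"))
print(files_by_author(conn, "User:XY"))
```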

We have put some information in a not so structured way at mw.org [1].
There are some issues open on the talk page [2]. Input is of course
welcome, either here or, preferably, at the talk page.


Bryan


[1]  http://www.mediawiki.org/wiki/Files_and_licenses_concept
[2]  http://www.mediawiki.org/wiki/Talk:Files_and_licenses_concept



Re: [Wikitech-l] File licensing information support

2011-01-20 Thread Platonides
Bryan Tong Minh wrote:
 Hello,
 
 As you may have noticed, Roan, Krinkle and I have started to more
 tightly integrate image licensing within MediaWiki. Our aim is to
 create a system where it should be easy to obtain the basic copyright
 information of an image in a machine readable format, as well as
 querying images with a certain copyright state (all images copyrighted
 by User:XY, all images licensed CC-BY-SA, etc)
 
 At this moment we only intend to store author and license information,
 but nothing stops us from expanding this in the future.
 
 We have put some information in a not so structured way at mw.org [1].
 There are some issues open on the talk page [2]. Input is of course
 welcome, both here or preferably at the talk page.
 
 
 Bryan
 
 
 [1]  http://www.mediawiki.org/wiki/Files_and_licenses_concept
 [2]  http://www.mediawiki.org/wiki/Talk:Files_and_licenses_concept

I would probably have gone the page_props route, passing the metadata
from the wikitext to the tables via a parser function.

Conceptually, the revision table shouldn't link to file_props; file_props
should be linked with image instead.

I like the idea of an author manager, especially if it's done as a
pseudo-namespace.




Re: [Wikitech-l] File licensing information support

2011-01-20 Thread Michael Dale
On 01/20/2011 05:00 PM, Platonides wrote:
 I would probably have gone the page_props route, passing the metadata
 from the wikitext to the tables via a parser function.

I would also say it's probably best to pass metadata from the wikitext to
the tables via a parser function, similar to categories and all other
user-edited metadata. This has the disadvantage that it's not 'as
easy' to edit via a structured API entry point, but has the advantage of
working well with all the existing tools, templates and versioning.
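
A toy Python sketch of what "wikitext to tables via a parser function"
amounts to. This is not how MediaWiki implements parser functions; the
{{#author:}} and {{#license:}} names and the extract_props helper are
assumptions made for this example.

```python
import re

# Matches hypothetical {{#author: ...}} and {{#license: ...}} calls with a
# single plain-text argument (no nested templates or pipes).
PROP_RE = re.compile(r"\{\{#(author|license):\s*([^{}|]*?)\s*\}\}")


def extract_props(wikitext):
    """Collect property values from wikitext, as would happen on save.

    Returns a dict mapping the property name to the list of values found,
    which a real implementation would write to a properties table.
    """
    props = {}
    for name, value in PROP_RE.findall(wikitext):
        props.setdefault(name, []).append(value)
    return props


page = "A photo.\n{{#author: Alice}}\n{{#license:CC-BY-SA}}\n"
print(extract_props(page))
```

Because the values live in the wikitext, every saved revision carries its
own metadata, and the table rows can be regenerated from any revision the
same way category links are.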

--michael
