Re: [WikimediaMobile] Wikidata descriptions: ruminations

Monte Hurd Sun, 22 Mar 2015 20:37:08 -0700

An explicit "description" tag eliminates the heuristic problem, but I think it 
has other problems.
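
For reference, the heuristic problem is easy to reproduce. Below is a minimal 
sketch, assuming a naive "take the first sentence" rule. The function name and 
the splitting rule are hypothetical, chosen only for illustration; they are not 
what the apps do.

```python
import re


def first_sentence(text: str) -> str:
    """Naive heuristic: cut at the first period that is followed by
    whitespace and a capital letter (an assumption, not a real rule)."""
    match = re.search(r"\.(?=\s+[A-Z])", text)
    return text if match is None else text[: match.end()]


# Opening sentences of the enwiki "Fish" article, as quoted in this thread.
FISH_INTRO = (
    "A fish is any member of a paraphyletic group of organisms that consist of "
    "all gill-bearing aquatic craniate animals that lack limbs with digits. "
    "Included in this definition are the living hagfish, lampreys, and "
    "cartilaginous and bony fish, as well as various extinct related groups."
)

sentence = first_sentence(FISH_INTRO)
print(sentence)
print(len(sentence.split()), "words")
```

The rule finds the sentence boundary well enough, but the result is still a 
full sentence rather than a concise description. Deciding which of those words 
to keep is the part a simple rule cannot do.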


It is markup-reliant, which raises the bar for contributors and complicates any 
description editing UX. 

That is, when a user taps the edit pencil to the right of the description, 
instead of showing just the description in a simple editable text box with a 
small prompt to "Enter a concise description of 'article title'", you'd have to 
show the first section wikitext and explain the description markup. 

It also conflates two concerns: a concise description, and some sub-portion of 
the first section text. I can appreciate the desire to write descriptive 
information only once, but this comes at a cost: any change that improves the 
description would also have to be proofread to ensure it still reads correctly 
in the context of the first section text. 
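
To make that coupling concrete, here is a minimal sketch of how the proposed 
tag might be read, assuming a simple regular expression is enough. The 
<description> tag is the hypothetical markup from Dmitry's message, not 
existing MediaWiki syntax, and both function names are made up for this 
example.

```python
import re
from typing import Optional

# Hypothetical tag from the proposal in this thread; not real MediaWiki markup.
DESCRIPTION_TAG = re.compile(r"<description>(.*?)</description>", re.DOTALL)


def extract_description(first_section: str) -> Optional[str]:
    """Return the text inside the first <description> tag, or None if absent."""
    match = DESCRIPTION_TAG.search(first_section)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces left over from wrapped text.
    return " ".join(match.group(1).split())


def strip_description_tags(first_section: str) -> str:
    """Return the text as a reader of the article sees it, tags removed."""
    return DESCRIPTION_TAG.sub(r"\1", first_section)


# The marked-up "Fish" sentence from Dmitry's message.
FISH = (
    "A fish is any member of a paraphyletic group of organisms that consist of "
    "all <description>gill-bearing aquatic craniate animal</description>s that "
    "lack limbs with digits."
)

print(extract_description(FISH))
print(strip_description_tags(FISH))
```

Both outputs come from the same characters, so an edit inside the tag that 
reads better as a description also changes the article sentence, and the two 
have to be checked together.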

> On Mar 22, 2015, at 7:28 PM, Dmitry Brant <[email protected]> wrote:
> 
> Hi Monte!
> inline:
> 
> > Deeply hard, in fact, because it's complicated not only by language syntax 
> > and grammatical rules, but also by qualitative factors (readability, 
> > meaning, context, relevance etc).
> > This already complicated situation then becomes many orders of magnitude 
> > more difficult because these qualitative factors can differ between 
> > languages.
> 
> Again, I agree that this is not an easy problem. However, in the case of 
> language translations, automated descriptions have the potential of 
> simplifying things tremendously. The algorithm for the grammar and syntax of 
> a certain language needs to be written only once. And once it's written, it 
> can be applied to every Wikidata item, past and future. Sure, there would 
> likely be a different algorithm for each language, and maybe even different 
> algorithms for various taxa of Wikidata items.  But this kind of solution 
> simply feels more scalable, and I'm surprised that researching methods of 
> accomplishing this is of little interest.
> 
> 
> > I predict this won't be any worse than what happened when we enabled 
> > section editing.
> 
> But when we enabled section editing, did we do it with a prominent call to 
> action? I just feel a little hesitation about going full-on with something 
> like this, without having a baseline level of administrative feedback in the 
> apps (e.g. a notification for when a description is reverted, and the reason 
> for it).
> 
> To be clear, of course I'm totally on board for experimenting with allowing 
> users to contribute descriptions. Making bold moves is what makes our team so 
> great. My goal is simply to point out various other solutions that, to me, 
> make slightly more sense (and to welcome feedback on why they don't!).
> 
> 
> > But reducing the first sentence in this way is deceptively complicated to 
> > do programmatically, precisely because of the word "arguably" in the 
> > preceding sentence - it's almost entirely a matter of qualitative 
> > judgement. You have to know what a fish is to know what parts of the first 
> > sentence are most important
> 
> That's almost convincing :) but still... why duplicate content when the 
> essential information is already there?
> Maybe I didn't convey my idea of "markup" for extracting a description 
> properly. For example, the description for the [[Fish]] article can be marked 
> up as follows:
> 
> A fish is any member of a paraphyletic group of organisms that consist of all 
> <description>gill-bearing aquatic craniate animal</description>s that lack 
> limbs with digits.
> 
> The above markup would be done by a human editor, with the knowledge that the 
> text within the <description> tag will end up as the Wikidata description.  I 
> would wager that a similar scheme could be applied to any number of articles. 
> Let's try it for a few random articles:
> 
> [[Poland]]
> Poland (Polish: Polska; pronounced [ˈpɔlska] ( listen)), officially the 
> Republic of Poland (Polish: Rzeczpospolita Polska; pronounced 
> [ʐɛt͡ʂpɔˈspɔʎit̪a ˈpɔlska] ( listen)), is a <description>country in Central 
> Europe</description> bordered by Germany to the west; the Czech Republic and 
> Slovakia to the south...
> 
> [[Schadenfreude]]
> Schadenfreude (/ˈʃɑːdənfrɔɪdə/; German: [ˈʃaːdn̩ˌfʀɔɪ̯də] ( listen)) is 
> <description>pleasure derived from the misfortunes of 
> others</description>.[1] This word is taken from German...
> 
> [[Ming dynasty]]
> The Ming dynasty, also Empire of the Great Ming, was the <description>ruling 
> dynasty of China for 276 years (1368–1644)</description> following the 
> collapse of the Mongol-led Yuan dynasty...
> 
> [[Homomorphism]]
> In abstract algebra, a homomorphism is a <description>structure-preserving 
> map between two algebraic structures</description> (such as groups, rings, or 
> vector spaces)...
> 
> ^^ What would be the downside(s) of doing something like that?
> 
> 
> 
>> On Sun, Mar 22, 2015 at 9:37 PM, Monte Hurd <[email protected]> wrote:
>> My previous reply was partial and accidentally sent - here's my actual reply 
>> :)
>> 
>> 
>> 
>> 
>>> On Sun, Mar 22, 2015 at 1:53 PM, Dmitry Brant <[email protected]> wrote:
>>> Hi Lydia,
>>> 
>>> Indeed, there are many more Wikidata items than Wikipedia articles. 
>>> However, the users of our mobile apps only see Wikipedia articles in our 
>>> search results (at least for now), which means that they will only be able 
>>> to contribute descriptions to Wikidata items for which a Wikipedia article 
>>> exists.
>> 
>> 
>> 
>> They are also used in "Recent" and "Nearby", and Vibha wants them in the 
>> "Saved Pages" list as well. 
>> 
>> 
>> 
>>  
>>> No doubt, the description field is an important component of each Wikidata 
>>> entry.  But, when there is a corresponding Wikipedia article, why not query 
>>> it to provide an automatic description? This could be based on the first 
>>> sentence of the article, or a subset of the first sentence, or some other 
>>> kind of metadata within the article.
>> 
>> 
>> 
>> 
>> Why not query it to provide an automatic description? Because finding the 
>> best subset of the first sentence(s) isn't all there is to it.
>> 
>> For example, take the enwiki "Fish" article.
>> 
>> The first couple sentences are these:
>> 
>> A fish is any member of a paraphyletic group of organisms that consist of 
>> all gill-bearing aquatic craniate animals that lack limbs with digits. 
>> Included in this definition are the living hagfish, lampreys, and 
>> cartilaginous and bony fish, as well as various extinct related groups.
>>  
>> 
>> 
>> So if we reduce the description to its first sentence, we have:
>> 
>> A fish is any member of a paraphyletic group of organisms that consist of 
>> all gill-bearing aquatic craniate animals that lack limbs with digits. 
>> 
>> 
>> 
>> Now, for the sake of argument, let's imagine the bold words below represent 
>> a best case scenario for a relevant subset of the first sentence:
>> 
>> *A fish is* any member of a paraphyletic group of organisms that consist of 
>> all *gill-bearing aquatic* craniate *animal*s that lack limbs with digits. 
>> 
>> 
>> 
>> So, we have "A fish is a gill-bearing aquatic animal", or you could reduce 
>> it further to "a gill-bearing aquatic animal".
>> 
>> 
>> But reducing the first sentence in this way is deceptively complicated to do 
>> programmatically, precisely because of the word "arguably" in the preceding 
>> sentence - it's almost entirely a matter of qualitative judgement. You have 
>> to know what a fish is to know what parts of the first sentence are most 
>> important, and then you have to know how to contextually stitch these words 
>> together according to rules of the language's grammar and syntax so they 
>> "read" nicely (see the word "a" and the "s" on the end of "animals"). 
>> 
>> Basically, great descriptions require a native speaker of the language with 
>> some skill at summarizing. This is such a low bar for humans that almost 
>> anyone could contribute quality descriptions.
>> 
>> 
>> But if descriptions are not human-editable, then we are stuck with the 
>> limitations of whatever heuristics are used to auto-generate the description.
>> 
>>  
>>> The key is that the description would stay with the article, which would 
>>> eliminate the need for duplication and synchronization.
>>> 
>>> So, in a sense, I would look at it the other way: descriptions within 
>>> Wikipedia articles would be useful for Wikidata entries.
>>> 
>>> -Dmitry
>>> 
>>>> On Sun, Mar 22, 2015 at 4:17 PM, Lydia Pintscher 
>>>> <[email protected]> wrote:
>>>> On Sun, Mar 22, 2015 at 9:10 PM, Dmitry Brant <[email protected]> wrote:
>>>> > Hi Jane,
>>>> >
>>>> > Perhaps my comments came off as more pessimistic than I intended. Of
>>>> > course I believe in the power of crowdsourcing, and I would never want
>>>> > to make anyone feel like their contributions are being marginalized.
>>>> >
>>>> > I'll agree for now that the idea of "fully" automated descriptions leans
>>>> > more towards science fiction than reality. :)
>>>> >
>>>> > However, my whole point has more to do with the apparent duplication of
>>>> > content that seems to be happening between the first sentence of
>>>> > Wikipedia articles and the corresponding Wikidata description.  There's
>>>> > something about it that seems unnecessary.  If we can figure out a way
>>>> > to automatically extract the description from the first sentence of the
>>>> > article, it would simplify things in two ways:
>>>> >
>>>> > 1) People wouldn't need to edit Wikidata descriptions, and would instead
>>>> > focus on improving the Wikipedia article.
>>>> > 2) People who monitor changes made to articles would need to monitor only
>>>> > the article, instead of the article plus its corresponding Wikidata
>>>> > description.
>>>> 
>>>> There are a lot more items on Wikidata than articles on Wikipedia. And
>>>> not every language has a Wikipedia article for each item. Don't just
>>>> look at descriptions on Wikidata as something useful for Wikipedia.
>>>> They're much more than that.
>>>> 
>>>> 
>>>> Cheers
>>>> Lydia
>>>> 
>>>> --
>>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>>> Product Manager for Wikidata
>>>> 
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>> 
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>> 
>>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
>>>> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>>>> 
>>>> _______________________________________________
>>>> Mobile-l mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>> 
>>> 
>

