Hi Monte!
inline:

> Deeply hard, in fact, because it's complicated not only by language
> syntax and grammatical rules, but also by qualitative factors (readability,
> meaning, context, relevance etc).
> This already complicated situation then becomes many orders of magnitude
> more difficult because these qualitative factors can differ between
> languages.

Again, I agree that this is not an easy problem. However, in the case of
language translations, automated descriptions have the potential of
simplifying things tremendously. The algorithm for the grammar and syntax
of a certain language needs to be written only once. And once it's written,
it can be applied to every Wikidata item, past and future. Sure, there
would likely be a different algorithm for each language, and maybe even
different algorithms for various taxa of Wikidata items. But this kind of
solution simply feels more scalable, and I'm surprised that researching
methods of accomplishing this is of so little interest.
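
To make the "write it once per language" idea a bit more concrete, here is a
very rough sketch. Everything in it is made up for illustration: the `Item`
fields, the `render_en`/`render_es` functions and the `RENDERERS` table are
hypothetical names, not anything that exists in Wikidata or in the apps, and
real grammar rules would of course be far more involved than a preposition.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Item:
    """Hypothetical, simplified stand-in for an item's structured data."""
    instance_of: Dict[str, str]  # language code -> label of the item's class
    located_in: Dict[str, str]   # language code -> label of the containing region


def render_en(item: Item) -> str:
    # English rule (assumed for this sketch): "<class> in <region>".
    return f"{item.instance_of['en']} in {item.located_in['en']}"


def render_es(item: Item) -> str:
    # Spanish rule (assumed for this sketch): "<class> de <region>".
    return f"{item.instance_of['es']} de {item.located_in['es']}"


# One renderer per language, written once and reused for every item.
RENDERERS: Dict[str, Callable[[Item], str]] = {
    "en": render_en,
    "es": render_es,
}


def describe(item: Item, lang: str) -> str:
    """Generate a description for `item` using the renderer for `lang`."""
    return RENDERERS[lang](item)


poland = Item(
    instance_of={"en": "country", "es": "país"},
    located_in={"en": "Central Europe", "es": "Europa Central"},
)
print(describe(poland, "en"))
print(describe(poland, "es"))
```

The point is only that adding a language means adding one renderer, after
which every item with the same structured data gets a description for free.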


> I predict this won't be any worse than what happened when we enabled
> section editing.

But when we enabled section editing, did we do it with a prominent call to
action? I just feel a little hesitation about going full-on with something
like this, without having a baseline level of administrative feedback in
the apps (e.g. a notification for when a description is reverted, and the
reason for it).

To be clear, of course I'm totally on board for experimenting with allowing
users to contribute descriptions. Making bold moves is what makes our team
so great. My goal is simply to point out various other solutions that, to
me, make slightly more sense (and to welcome feedback on why they don't!).


> But reducing the first sentence in this way is deceptively complicated to
> do programmatically, precisely because of the word "arguably" in the
> preceding sentence - it's almost entirely a matter of qualitative
> judgement. You have to know what a fish is to know what parts of the first
> sentence are most important

That's almost convincing :) but still... why duplicate content when the
essential information is already there?
Maybe I didn't convey my idea of "markup" for extracting a description
properly. For example, the description for the [[Fish]] article can be
marked up as follows:

A fish is any member of a paraphyletic group of organisms that consist of
all <description>gill-bearing aquatic craniate animal</description>s
that lack limbs with digits.

The above markup would be done by a human editor, with the knowledge that
the text within the <description> tag will end up as the Wikidata
description.  I would wager that a similar scheme could be applied to any
number of articles. Let's try it for a few random articles:

[[Poland]]
Poland (Polish: Polska; pronounced [ˈpɔlska]), officially the
Republic of Poland (Polish: Rzeczpospolita Polska; pronounced
[ʐɛt͡ʂpɔˈspɔʎit̪a ˈpɔlska]), is a <description>country in
Central Europe</description> bordered by Germany to the west; the Czech
Republic and Slovakia to the south...

[[Schadenfreude]]
Schadenfreude (/ˈʃɑːdənfrɔɪdə/; German: [ˈʃaːdn̩ˌfʀɔɪ̯də]) is
<description>pleasure derived from the misfortunes of others</description>.[1]
This word is taken from German...

[[Ming dynasty]]
The Ming dynasty, also Empire of the Great Ming, was the <description>ruling
dynasty of China for 276 years (1368–1644)</description> following the
collapse of the Mongol-led Yuan dynasty...

[[Homomorphism]]
In abstract algebra, a homomorphism is a <description>structure-preserving
map between two algebraic structures</description> (such as groups,
rings, or vector spaces)...

^^ What would be the downside(s) of doing something like that?
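
For what it's worth, the extraction side of this seems small. Below is a
minimal sketch, assuming the `<description>` tag from my examples (which is
purely my proposal, not an existing MediaWiki feature) and assuming at most
one such span matters per article. The function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical tag from the examples above; not an existing MediaWiki feature.
DESCRIPTION_RE = re.compile(r"<description>(.*?)</description>", re.DOTALL)


def extract_description(wikitext: str) -> Optional[str]:
    """Return the text of the first <description> span, or None if absent."""
    match = DESCRIPTION_RE.search(wikitext)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces so wrapped text becomes one line.
    return " ".join(match.group(1).split())


def strip_description_tags(wikitext: str) -> str:
    """Return the article text as a reader would see it, with the tags removed."""
    return DESCRIPTION_RE.sub(r"\1", wikitext)


fish = (
    "A fish is any member of a paraphyletic group of organisms that consist of\n"
    "all <description>gill-bearing aquatic craniate animal</description>s\n"
    "that lack limbs with digits."
)
print(extract_description(fish))
print(strip_description_tags(fish))
```

The article text stays the single source of truth: readers see the sentence
with the tags stripped, and the description is whatever the editor chose to
wrap, so there is nothing to keep in sync.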



On Sun, Mar 22, 2015 at 9:37 PM, Monte Hurd <[email protected]> wrote:

> My previous reply was partial and accidentally sent - here's my actual
> reply :)
>
>
>
>
> On Sun, Mar 22, 2015 at 1:53 PM, Dmitry Brant <[email protected]>
> wrote:
>
>> Hi Lydia,
>>
>> Indeed, there are many more Wikidata items than Wikipedia articles.
>> However, the users of our mobile apps only see Wikipedia articles in our
>> search results (at least for now), which means that they will only be able
>> to contribute descriptions to Wikidata items for which a Wikipedia article
>> exists.
>>
>>
>
>
> They are also used in "*Recent*" and "*Nearby*", and Vibha wants them in
> the "*Saved Pages*" list as well.
>
>
>
>
>
>> No doubt, the description field is an important component of each
>> Wikidata entry.  But, when there is a corresponding Wikipedia article, why
>> not query it to provide an automatic description? This could be based on
>> the first sentence of the article, or a subset of the first sentence, or
>> some other kind of metadata within the article.
>>
>
>
>
>
> Why not query it to provide an automatic description? Because finding the
> best subset of the first sentence(s) isn't all there is to it.
>
> For example, take the enwiki "Fish" article.
>
> The first couple sentences are these:
>
> *A fish is any member of a paraphyletic group of organisms that consist of
> all gill-bearing aquatic craniate animals that lack limbs with digits.
> Included in this definition are the living hagfish, lampreys, and
> cartilaginous and bony fish, as well as various extinct related groups.*
>
>
>
> So if we reduce the description to its first sentence we have:
>
> *A fish is any member of a paraphyletic group of organisms that consist of
> all gill-bearing aquatic craniate animals that lack limbs with digits. *
>
>
>
> Now, for the sake of argument, let's imagine the *bold* words below
> represent a best case scenario for a relevant subset of the first sentence:
>
> *A fish is* any member of *a* paraphyletic group of organisms that
> consist of all *gill-bearing aquatic* craniate *animal*s that lack limbs
> with digits.
>
>
>
> So, we have "*A fish is a gill-bearing aquatic animal*", or you could
> reduce it further to "a *gill-bearing aquatic animal*".
>
>
> But reducing the first sentence in this way is deceptively complicated to
> do programmatically, precisely because of the word "arguably" in the
> preceding sentence - it's almost entirely a matter of qualitative
> judgement. You have to know what a fish is to know what parts of the first
> sentence are most important and then you have to know how to contextually
> stitch these words together according to rules of the language's grammar
> and syntax so they "read" nicely (see the word "a" and the "s" on the end
> of "animal*s*").
>
> Basically, great descriptions require a native speaker of the language
> with some skill at summarizing. This is such a low bar for humans that
> almost anyone could contribute quality descriptions.
>
>
> But if descriptions are not human editable, then we are stuck with the
> limitations of whatever heuristics are used to auto-generate the
> description.
>
>> The key is that the description would stay with the article, which would
>> eliminate the need for duplication and synchronization.
>>
>> So, in a sense, I would look at it the other way: descriptions within
>> Wikipedia articles would be useful for Wikidata entries.
>>
>> -Dmitry
>>
>> On Sun, Mar 22, 2015 at 4:17 PM, Lydia Pintscher <
>> [email protected]> wrote:
>>
>>> On Sun, Mar 22, 2015 at 9:10 PM, Dmitry Brant <[email protected]>
>>> wrote:
>>> > Hi Jane,
>>> >
>>> > Perhaps my comments came off as more pessimistic than I intended. Of
>>> course
>>> > I believe in the power of crowdsourcing, and I would never want to make
>>> > anyone feel like their contributions are being marginalized.
>>> >
>>> > I'll agree for now that the idea of "fully" automated descriptions
>>> leans
>>> > more towards science fiction than reality. :)
>>> >
>>> > However, my whole point has more to do with the apparent duplication of
>>> > content that seems to be happening between the first sentence of
>>> Wikipedia
>>> > articles and the corresponding Wikidata description.  There's something
>>> > about it that seems unnecessary.  If we can figure out a way to
>>> > automatically extract the description from the first sentence of the
>>> > article, it would simplify things in two ways:
>>> >
>>> > 1) People wouldn't need to edit Wikidata descriptions, and would
>>> instead
>>> > focus on improving the Wikipedia article.
>>> > 2) People who monitor changes made to articles would need to monitor
>>> only
>>> > the article, instead of the article plus its corresponding Wikidata
>>> > description.
>>>
>>> There are a lot more items on Wikidata than articles on Wikipedia. And
>>> not every language has a Wikipedia article for each item. Don't just
>>> look at descriptions on Wikidata as something useful for Wikipedia.
>>> They're much more than that.
>>>
>>>
>>> Cheers
>>> Lydia
>>>
>>> --
>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>> Product Manager for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
>>> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>>>
>>> _______________________________________________
>>> Mobile-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>>
>>
>>
>>
>>
>
