On Thu, Aug 27, 2009 at 11:35 AM, Gregory Maxwell<gmaxw...@gmail.com> wrote:
> On Wed, Aug 26, 2009 at 9:30 PM, John Vandenberg<jay...@gmail.com> wrote:
>> And yet ... this is what every successful wiki does.  Wikipedia is
>> extremely structured.  The writers are not always expected to know the
>> structure; gnomes do the tidying up.
>
> You must have an enormously different idea of extremely structured
> than I do. I once created software to extract lat/long from Wikitext
> on enwp and gave up when I got to the 100th or so distinct template
> invocation which did almost but not quite exactly the same thing.
>
> Go search the archives for some of my example bat-shit category linkage maps.
>
> It's extremely structured compared to complete anarchy, or perhaps
> "extremely structured" compared to the human body. It's not structured
> compared to normal sources of data. Not at all.

English Wikipedia is not "well" structured for many data mining tasks.
The problem domain is much larger and the content more dynamic, but
there are also too many cooks and partially implemented ideas, and not
enough concern about consistency and re-use.

The Creator and Author namespaces on Commons and Wikisource, respectively,
are better examples of structured information that can be mined.

Wikispecies pages have a limited amount of information on them, and it
is quite sensibly structured.  And I'd bet that the Wikispecies
community is also going to be more accommodating of any proposals to
increase standardisation of the content in order to allow mining.

--
John Vandenberg

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
