On 01.04.2015 at 09:20, Valentine Charles wrote:
> - Cultural Heritage data most of the time have a description property where
> you will find a lot of relevant free-text information. The property is
> structured, but inside you will find mostly free text. I couldn't find a
> similar property in Wikidata, but there is something similar in DBpedia. Is
> it something you are planning to introduce, or have you made the decision to
> exclude any free-text information from Wikidata for now?

Free-form text is not machine-readable. Encoding semi-structured information in free-text fields is very common in archives etc., but it makes the data very hard to export, transform, and query. Free-text fields should be used only for things that are actually text, such as a state motto.

I think the need to encode things in free-form fields arose mostly from overly rigid data schemas: if there's no dedicated field for something, people just stuff the info into the text field. Such fields turn into kitchen sinks that contain a hodgepodge of different kinds of information.

With Wikidata, there should be no need for this, since you can just create and use any properties you might be missing. That does mean, though, that while importing, you have to somehow extract the relevant information from the free text. That effort has to be made at some point if the data is to become machine-readable.

> - While I was looking for paintings in Wikidata, I also noticed the absence
> of information related to the size/dimensions of the artwork. The
> information is most of the time present in Cultural Heritage data. Is it
> something Wikidata is interested in, or has it been omitted intentionally?

We don't support units of measurement yet, and without these, it's not really possible to give the dimensions. We hope to finally change this over the next couple of months.

> - Then the last question is about values in different languages for a given
> property. How do you indicate the language in Wikidata? Are you using an
> xml:lang attribute or something similar?

xml:lang would be used in the XML/RDF export (and lang in the HTML rendering). Internally, the language would be a string associated with the "language" key in a JSON structure. But neither fact is really relevant to the data model on an abstract level.

Most properties (most data types) are language agnostic. Quantities, strings, time values, etc. do not have any notion of language.
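
As a rough illustration of the xml:lang point, the sketch below takes a value held in a JSON-like structure with a "language" key and serializes it with an xml:lang attribute, while a language-agnostic value gets none. The dictionary layout, the element name `value`, the helper `to_xml`, and the sample data are all assumptions made for this example; this is not Wikidata's actual internal format or export schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the reserved "xml" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def to_xml(value):
    """Serialize a value dict to an XML string.

    Hypothetical helper: only values carrying a "language" key get an
    xml:lang attribute; language-agnostic values are emitted without one.
    """
    element = ET.Element("value")
    element.text = str(value["text"])
    if "language" in value:
        # ElementTree writes this namespaced attribute as xml:lang.
        element.set("{%s}lang" % XML_NS, value["language"])
    return ET.tostring(element, encoding="unicode")


# Illustrative data: a language-tagged text and a plain string.
motto = {"language": "en", "text": "Example motto"}
plain = {"text": "ABC-123"}

print(to_xml(motto))
print(to_xml(plain))
```

The language tag lives next to the text it describes, so the same structure can feed an XML export (xml:lang) or an HTML rendering (lang) without touching the abstract data model.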

The only datatype for properties that supports a language code is "monolingual text" (a pair of language code + text). This data type is used sparingly, since usually, the need for internationalized naming and description is covered by the labels, descriptions, and aliases associated with a data item.

Labels, descriptions, and aliases are not "properties" about which (sourced) statements would be made in the context of the data item. Instead, they are editorial attributes. They are fully internationalized, and intended to enable display, disambiguation, and search in as many languages as possible.

For example, Q219831 has labels (and descriptions) in many languages:

* nl: De Nachtwacht (schilderij van Rembrandt van Rijn)
* de: Die Nachtwache (Gemälde von Rembrandt)
* en: The Night Watch (painting by Rembrandt van Rijn)
* ru: Ночной дозор (картина)

So, when the painting is referenced elsewhere, a label (and description) can be shown in the user's language. Internationalized statements/properties are rarely needed.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
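
P.S. A minimal sketch of how a client might pick a label and description for Q219831 in the user's language, using the values from the list above. The function name `display_name` and the fallback to English when a language is missing are assumptions for illustration, not a description of how Wikidata itself resolves languages.

```python
# Labels and descriptions for Q219831, copied from the list above.
LABELS = {
    "nl": "De Nachtwacht",
    "de": "Die Nachtwache",
    "en": "The Night Watch",
    "ru": "Ночной дозор",
}
DESCRIPTIONS = {
    "nl": "schilderij van Rembrandt van Rijn",
    "de": "Gemälde von Rembrandt",
    "en": "painting by Rembrandt van Rijn",
    "ru": "картина",
}


def display_name(user_lang, fallback="en"):
    """Return 'label (description)' in the user's language.

    Assumption: if the requested language has no label, fall back to
    the given fallback language (English by default).
    """
    lang = user_lang if user_lang in LABELS else fallback
    label = LABELS[lang]
    description = DESCRIPTIONS.get(lang)
    return f"{label} ({description})" if description else label


print(display_name("de"))
print(display_name("fr"))  # no French label here, so English is used
```

Because the label and description are attributes of the item rather than statements, the lookup is a plain per-language dictionary access; no internationalized property is involved.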