Stian Soiland-Reyes created JENA-827:
----------------------------------------
Summary: Include all ISO 639-3 languages
Key: JENA-827
URL: https://issues.apache.org/jira/browse/JENA-827
Project: Apache Jena
Issue Type: Improvement
Components: RDF/XML
Affects Versions: Jena 2.12.1
Reporter: Stian Soiland-Reyes
Priority: Minor
{code}
WARN 2014-12-05 14:21:24,085
(com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler:47) -
http://www.w3.org/ns/oa#(line 42 column 36):
{W116}
ISO-639 does not define language: 'vls'.
{code}
http://www.w3.org/ns/oa.rdf says
{code}
<dc:creator xml:lang="vls">Herbert Van de Sompel</dc:creator>
{code}
but ISO 639-3 does define it: http://www-01.sil.org/iso639-3/documentation.asp?id=vls
The complete ISO 639-3 list is not included in
https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/rdfxml/xmlinput/lang/Iso639.java
- only ISO 639-1 and ISO 639-2.
The new lists can be found at http://www-01.sil.org/iso639-3/download.asp,
e.g. http://www-01.sil.org/iso639-3/iso-639-3.tab (the file is UTF-8, although
the browser guesses a different encoding).
I can work on the script to update this. One question is whether Iso639.java
needs a new field for the identifier of all those languages which are not in -1
or -2 (e.g. "vls"). Another is whether we should include the proper UTF-8 names
of the languages to get the accents correct, e.g.
{quote}
bbj I L Ghomálá'
{quote}
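As a starting point for such a script, here is a minimal sketch of reading rows from the table. It assumes the tab-separated column order Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment with one header row; the class name Iso639TabSketch, the Entry type and the hasOnlyPart3() helper are hypothetical and not part of Jena. The bytes are decoded explicitly as UTF-8 so that names such as the one for "bbj" keep their accents.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Iso639TabSketch {

    /** One table row; an empty string means "no code in that part". */
    static final class Entry {
        final String id, part2b, part2t, part1, scope, type, name;

        Entry(String id, String part2b, String part2t, String part1,
              String scope, String type, String name) {
            this.id = id; this.part2b = part2b; this.part2t = part2t;
            this.part1 = part1; this.scope = scope; this.type = type;
            this.name = name;
        }

        /** True for languages such as "vls" that have no -1 or -2 code. */
        boolean hasOnlyPart3() {
            return part1.isEmpty() && part2b.isEmpty() && part2t.isEmpty();
        }
    }

    /** Parses one data row; limit -1 keeps trailing empty columns. */
    static Entry parseLine(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 7) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        return new Entry(f[0], f[1], f[2], f[3], f[4], f[5], f[6]);
    }

    /** Decodes the raw bytes as UTF-8 and skips the header row. */
    static List<Entry> parse(byte[] raw) {
        String content = new String(raw, StandardCharsets.UTF_8);
        String[] lines = content.split("\\r?\\n");
        List<Entry> entries = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (!lines[i].isEmpty()) {
                entries.add(parseLine(lines[i]));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample rows in the assumed layout; a real script would read
        // the downloaded iso-639-3.tab file instead.
        String sample =
            "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment\n"
          + "bbj\t\t\t\tI\tL\tGhom\u00e1l\u00e1'\t\n"
          + "nld\tdut\tnld\tnl\tI\tL\tDutch\t\n"
          + "vls\t\t\t\tI\tL\tVlaams\t\n";
        for (Entry e : parse(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(e.id + " onlyPart3=" + e.hasOnlyPart3());
        }
    }
}
```

Keeping the -1 and -2 columns next to the three-letter identifier would let a generator decide per row whether an entry needs the proposed new field or fits the existing ones.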
I'm not sure whether the terms of use are compatible with the Apache License:
{quote}
ISO 639-3 Code Tables Terms of Use
The ISO 639-3 code set may be downloaded and incorporated into software
products, web-based systems, digital devices, etc., either commercial or
non-commercial, provided that:
* attribution is given www.sil.org/iso639-3/ as the source of the codes;
* the identifiers of the code set are not modified or extended except as may
  be privately agreed using the Private Use Area (range qaa to qtz), and then
  such extensions shall not be distributed publicly;
* the product, system, or device does not provide a means to redistribute the
  code set.
{quote}
The last condition might mean we should not include the *.tab files directly,
but would the listing in Iso639.java constitute a "means to redistribute the
code set"?
Is "the identifiers of the code set are not modified" compatible with the
Apache License, which presumably allows you to modify anything?