Stian Soiland-Reyes created JENA-827:
----------------------------------------

             Summary: Include all ISO 639-3 languages
                 Key: JENA-827
                 URL: https://issues.apache.org/jira/browse/JENA-827
             Project: Apache Jena
          Issue Type: Improvement
          Components: RDF/XML
    Affects Versions: Jena 2.12.1
            Reporter: Stian Soiland-Reyes
            Priority: Minor


{code}
WARN 2014-12-05 14:21:24,085 
(com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler:47) - 
http://www.w3.org/ns/oa#(line 42 column 36):
{W116}

ISO-639 does not define language: 'vls'.
{code}

http://www.w3.org/ns/oa.rdf says 
{code}
  <dc:creator xml:lang="vls">Herbert Van de Sompel</dc:creator>
{code}


but ISO 639-3 does define it: http://www-01.sil.org/iso639-3/documentation.asp?id=vls

The code list in
https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/rdfxml/xmlinput/lang/Iso639.java
only includes ISO 639-1 and ISO 639-2 codes, not the complete ISO 639-3 list.
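To make the failure concrete: the warning presumably comes from looking up the primary subtag of the xml:lang value in a table that only knows ISO 639-1 and ISO 639-2 codes. The Java sketch below illustrates that behaviour under stated assumptions; the class name, the tiny KNOWN_CODES set and the subtag handling are hypothetical and are not Jena's actual implementation.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the check behind warning W116; not Jena's actual Iso639 code.
class LangCheckSketch {
    // Tiny illustrative table: ISO 639-1 "en"/"nl" and ISO 639-2 "eng"/"nld"/"dut".
    // "vls" (West Flemish) has no ISO 639-1 or 639-2 code, only an ISO 639-3 one.
    static final Set<String> KNOWN_CODES = Set.of("en", "nl", "eng", "nld", "dut");

    // Primary language subtag of an xml:lang value, lower-cased: "en-GB" -> "en".
    static String primarySubtag(String xmlLang) {
        int dash = xmlLang.indexOf('-');
        String primary = (dash < 0) ? xmlLang : xmlLang.substring(0, dash);
        return primary.toLowerCase(Locale.ROOT);
    }

    // True if the primary subtag appears in the (ISO 639-1/-2 only) table.
    static boolean isKnown(String xmlLang) {
        return KNOWN_CODES.contains(primarySubtag(xmlLang));
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-GB", "nl", "vls"}) {
            System.out.println(tag + ": " + (isKnown(tag) ? "known" : "not in table (would warn)"));
        }
    }
}
```

With a table limited to parts 1 and 2, "vls" is rejected even though it is a valid ISO 639-3 identifier; adding the part 3 codes to the table is what would make it pass.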

The current lists can be downloaded from http://www-01.sil.org/iso639-3/download.asp,
e.g. http://www-01.sil.org/iso639-3/iso-639-3.tab (the file is UTF-8, although the
browser does not display it as such).
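A possible starting point for the update script is to read iso-639-3.tab explicitly as UTF-8, so that the platform default encoding (or a browser's guess) never gets involved. This is a sketch under an assumed column layout (Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment, with a header row first); the class and record names are hypothetical, and the two sample rows exist only so it runs without the download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of a reader for iso-639-3.tab. ASSUMED column order:
// Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment (tab-separated).
class Iso6393TabSketch {
    // Codes that do not exist in a given part are empty strings.
    record Row(String id, String part2B, String part2T, String part1,
               String scope, String type, String refName) {}

    static List<Row> parse(Reader in) {
        List<Row> rows = new ArrayList<>();
        try {
            BufferedReader r = new BufferedReader(in);
            r.readLine(); // skip the header row
            String line;
            while ((line = r.readLine()) != null) {
                // limit -1 keeps trailing empty fields such as an empty Comment
                String[] f = line.split("\t", -1);
                if (f.length < 7) continue; // blank or malformed line
                rows.add(new Row(f[0], f[1], f[2], f[3], f[4], f[5], f[6]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Decode the downloaded file explicitly as UTF-8 instead of the platform default,
    // so names such as "Ghomálá'" keep their accents.
    static List<Row> parse(Path file) throws IOException {
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return parse(in);
        }
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = (args.length > 0)
            ? parse(Path.of(args[0]))
            : parse(new StringReader(
                "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment\n"
                + "nld\tdut\tnld\tnl\tI\tL\tDutch\t\n"
                + "bbj\t\t\t\tI\tL\tGhom\u00e1l\u00e1'\t\n"));
        for (Row row : rows) {
            String part1 = row.part1().isEmpty() ? "-" : row.part1();
            System.out.println(row.id() + " (639-1: " + part1 + ") " + row.refName());
        }
    }
}
```

Passing the path of a downloaded iso-639-3.tab as the first argument parses the real file; rows such as "bbj" come back with empty Part1/Part2 fields, which is exactly the set of languages the current list is missing.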


I can work on a script to update this. One open question is whether Iso639.java needs a
new field for the identifier of all those languages that are in neither ISO 639-1 nor -2
(e.g. "vls"). Another is whether we should include the proper UTF-8 names of the
languages so that the accents are correct, e.g.

{quote}
bbj       I L Ghomálá'

{quote}
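For the first question, one hypothetical shape for an extended entry adds a part3 field next to the existing codes. For the second, a generator can keep accented names intact by writing every non-ASCII character as a Unicode escape in the generated Java source. Everything here (Entry, javaLiteral, toJavaLine and the output format) is illustrative only and is not the existing Iso639.java layout.

```java
import java.util.Locale;

// Hypothetical sketch for a generator script; Entry and its fields are illustrative,
// not Jena's actual Iso639 API.
class Iso639EntrySketch {
    // Existing ISO 639-1/-2 codes plus a new part3 field, so a language such as "vls"
    // that has no 639-1 or 639-2 code still has an identifier. Absent codes are "".
    record Entry(String part1, String part2B, String part2T, String part3, String name) {}

    // Java string literal with every non-ASCII char written as a Unicode escape,
    // so generated source keeps accents whatever encoding javac assumes.
    static String javaLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                // Unicode-escaped line breaks would end the literal early, so reject them.
                throw new IllegalArgumentException("control character in: " + s);
            } else if (c > 0x7E) {
                sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One initializer line for a generated table, e.g. new Entry("", "", "", "bbj", "..."),
    static String toJavaLine(Entry e) {
        return "new Entry(" + String.join(", ",
            javaLiteral(e.part1()), javaLiteral(e.part2B()), javaLiteral(e.part2T()),
            javaLiteral(e.part3()), javaLiteral(e.name())) + "),";
    }

    public static void main(String[] args) {
        System.out.println(toJavaLine(new Entry("", "", "", "bbj", "Ghom\u00e1l\u00e1'")));
        System.out.println(toJavaLine(new Entry("nl", "dut", "nld", "nld", "Dutch")));
    }
}
```

Escaping keeps the generated file pure ASCII, so it compiles the same way regardless of the -encoding setting given to javac, while the runtime strings still carry the correct accented names.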


I'm not sure whether these terms of use are compatible with the Apache License:

{quote}
ISO 639-3 Code Tables Terms of Use

The ISO 639-3 code set may be downloaded and incorporated into software 
products, web-based systems, digital devices, etc., either commercial or 
non-commercial, provided that:

* attribution is given www.sil.org/iso639-3/ as the source of the codes;
* the identifiers of the code set are not modified or extended except as may be privately agreed using the Private Use Area (range qaa to qtz), and then such extensions shall not be distributed publicly;
* the product, system, or device does not provide a means to redistribute the code set.
{quote}

The last condition might mean we should not include the *.tab files directly, but
would the listing in Iso639.java constitute a "means to redistribute the code
set"?

Is the condition "the identifiers of the code set are not modified" compatible with
the Apache License, which presumably allows you to modify anything?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
