To: Community@Apache.org

On the Jakarta General list, we've been discussing the possibility of 
introducing an "Internationalization" project into incubation.  It seems the 
consensus is that it should be targeted for a top-level 
programming-language-independent and spoken-language-independent Apache 
project, rather than a Jakarta subproject.

(To anyone on the JG list: I used a blind CC so that this is the only message 
on Community@Apache.org which should be CCd to JG.  You can set up message 
filters on "[i18n]" on both lists to follow the discussions in either place....)

A preliminary organization of the project based on the JG discussions is 
included in my message below.

I don't mind "spearheading" the incubation myself.  Is there anyone else 
interested whom we can add to the list of contributors (see A through F below)? 
 Is there anything else we should consider before requesting entry into 
incubation?

TIA.
Robert Simpson

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject sponsor?
Date: Sun, 13 Jul 2003 21:32:36 +0100
From: robert burrell donkin <[EMAIL PROTECTED]>
Reply-To: "Jakarta General List" <general@jakarta.apache.org>
To: "Jakarta General List" <general@jakarta.apache.org>

On Monday, July 7, 2003, at 01:14 PM, Robert Simpson wrote:

<snip>

> I am surprised there isn't more interest in a common internationalization 
> framework within Jakarta.  But then I have been assuming that there are 
> non-English-speaking "members" in Jakarta, not just "committers" and 
> other users of the code.

i think that there are several jakarta members who are not native english 
speakers. as Tetsuya Kitahata pointed out there are far fewer members than 
committers and i'm not sure whether there are any jakarta members who are 
native speakers of non-latin languages. it takes a lot of energy to 
spearhead an incubation and it's a big commitment for a member to make.

but i don't think that the member would have to come from jakarta (even if 
that's where those people involved with the product hope that it will end 
up). i wonder whether you might have more luck finding a sponsor over in 
xml-land. since many of their products are multi-language a common i18n 
framework may be of more pressing importance than here. i also have an 
idea that there are members whose native languages are non-latin.

i like the idea of an apache wide i18n project along the lines suggested 
by Tetsuya Kitahata.

- robert

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject
Date: Sat, 12 Jul 2003 08:55:00 -0400
Reply-To: "Jakarta General List" <general@jakarta.apache.org>,[EMAIL PROTECTED]
To: Jakarta General List <general@jakarta.apache.org>

WRT Santiago's point about keeping the different translations in sync, the 
solution is to have each word/phrase in (1) or each section in (2) identified 
in the XML with a version number.  Then it would be a simple matter to have a 
program compare the two documents, and indicate where the translation needs to 
be updated (the program could even provide an initial translation of the 
section via machine translation, to be refined by the human translator).  The 
XML should also indicate who made each change and whether a change was prompted 
by a need to change the document (additions to content, for example) or as a 
translation of another version.  That way, no particular translation would have 
to be the "primary" document, and any conflicts could be identified and 
handled.  For example, a Spanish-speaking person could add a missing section to 
the Spanish translation of a document, and that section could then be 
translated back into the original and other translations.  This arrangement 
could also handle "proposed" additions (the XML equivalent of "I, a Spanish 
translator, propose to add a new section here"), which could be commented on 
(ex: "that section would be better placed over there") and/or voted on by 
translators of other languages, etc....
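
A minimal sketch of the comparison program, assuming a hypothetical XML format in which each <section> carries an "id" and a "version", and each translated section records the source version it was translated from in a "based-on" attribute.  None of these element or attribute names come from an existing schema; they are only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: every section has an id and a version number.
SOURCE_EN = """
<document lang="en">
  <section id="intro" version="3">Welcome.</section>
  <section id="install" version="2">Install steps.</section>
  <section id="faq" version="1">Questions.</section>
</document>
"""

# Hypothetical translation: "based-on" is the source version that was translated.
TRANSLATION_ES = """
<document lang="es">
  <section id="intro" version="1" based-on="3">Bienvenido.</section>
  <section id="install" version="1" based-on="1">Pasos.</section>
  <section id="extra" version="1">Nueva.</section>
</document>
"""


def section_attr(xml_text, attr):
    """Map each section id to the value of one attribute (None if absent)."""
    root = ET.fromstring(xml_text.strip())
    return {s.get("id"): s.get(attr) for s in root.iter("section")}


def compare(source_xml, translation_xml):
    """Report sections whose translation is missing, outdated, or unmatched."""
    source = section_attr(source_xml, "version")
    translated = section_attr(translation_xml, "based-on")
    report = {}
    for section_id, version in source.items():
        if section_id not in translated:
            report[section_id] = "missing"
        elif int(translated[section_id] or 0) < int(version):
            # A translation with no "based-on" is treated as outdated.
            report[section_id] = "outdated"
    for section_id in translated:
        if section_id not in source:
            # Added by the translator; a candidate for translating back.
            report[section_id] = "only in translation"
    return report


print(compare(SOURCE_EN, TRANSLATION_ES))
```

Because the check is symmetric apart from the "based-on" bookkeeping, the same function could be run with the Spanish document as the source, which is what would let a section added in one translation be flagged for the others.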

Am I getting the feeling right that the Internationalization project would be 
ultimately targeted for a top level, multiple-programming-language Apache 
project?  If so, I think the best approach would be to get the Java support 
done first, to demonstrate its viability and usefulness.  But still, from the 
start, the intent should be to design with language-independence as the 
ultimate goal.

So, in summary, the organization of the project would be:

1. code common to both (2) and (3)
1.1 code
    This would include any code that supports both (2) and (3), such as the 
code to do comparisons between translations
1.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
1.1.2 Java
1.1.2.1 source code
1.1.2.1.1 source code contributors (committers)
1.1.3+ other programming languages, similarly

2. user interface internationalization (words and phrases)
2.1 code
    This would include the code to generate programming-language-specific 
resources, and provide access to those resources
2.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
2.1.2 Java
2.1.2.1 source code
2.1.2.1.1 source code contributors (committers)
2.1.2.2 resources (translations, generated from XML)
2.1.3+ other programming languages, similarly
2.1.3+.1 source code for other programming languages
2.1.3+.2 resources for other programming languages (translations, generated 
from XML)
2.2 language translations (programming-language-neutral)
2.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit 
tests for file verification, etc)
2.2.2 English language translations (initial "source" translations)
2.2.2.1 XML format
2.2.2.1.1 English language translators (committers)
2.2.2.2 English user references
2.2.2.2.1 XML formatted user reference (generated, XSL-FO?)
2.2.2.2.2 HTML formatted user reference (generated, possibly with a doclet)
2.2.2.2.3 PDF formatted user reference (generated, possibly from XML user 
reference using Apache XML-FOP)
2.2.3+ other spoken languages, similarly

3. internationalization of complete documents
3.1 code
    This would include code or tools (possibly making use of other Apache code) 
to generate specific document file formats
3.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
3.1.2 Java
3.1.2.1 source code
3.1.2.1.1 source code contributors (committers)
3.1.3+ other programming languages, similarly
3.1.3+.1 source code for other programming languages
3.2 language translations (programming-language-neutral)
3.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit 
tests for file verification, etc)
3.2.2 English language translations (initial "source" translations)
3.2.2.1 XML format (based on XSL-FO?)
3.2.2.1.1 English language translators (committers)
3.2.2.2 HTML format (generated)
3.2.2.3 PDF format (generated, possibly using Apache XML-FOP)
3.2.2.4+ other document file formats (generated)
3.2.3+ other spoken languages, similarly

The main difference between sections (2) and (3) is that (2) is organized 
primarily by programming language, with the programming-language-specific 
resources as part of the first subsection (2.1), keeping the second subsection 
(2.2) programming-language-neutral, while (3) is organized primarily by spoken 
language, with the programming-language-independent file formats as part of the 
second subsection (3.2), keeping them separate from the 
programming-language-specific stuff in the first subsection (3.1).
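
As a rough illustration of the resource generation step in 2.1, here is a sketch that turns one programming-language-neutral translation file into the text of a Java properties file.  The <translations> and <phrase> element names, the "Messages" base name, and the helper functions are assumptions made for this example; only the base_language.properties naming and the \uXXXX escaping follow the usual Java resource bundle conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical programming-language-neutral translation file for Spanish.
TRANSLATIONS_ES = """
<translations lang="es">
  <phrase id="greeting" version="1">Hola</phrase>
  <phrase id="farewell" version="2">Adiós</phrase>
</translations>
"""


def escape_value(value):
    """Escape non-ASCII characters as \\uXXXX, as Java properties files expect.

    Simplification: characters outside the Basic Multilingual Plane and
    special characters in keys are not handled.
    """
    return "".join(c if ord(c) < 128 else "\\u%04x" % ord(c) for c in value)


def to_properties(xml_text, base_name="Messages"):
    """Return (file name, file content) for one spoken language."""
    root = ET.fromstring(xml_text.strip())
    file_name = "%s_%s.properties" % (base_name, root.get("lang"))
    lines = sorted(
        "%s=%s" % (phrase.get("id"), escape_value(phrase.text or ""))
        for phrase in root.iter("phrase")
    )
    return file_name, "\n".join(lines) + "\n"


name, content = to_properties(TRANSLATIONS_ES)
print(name)
print(content)
```

Generators for other programming languages (2.1.3+) would read the same XML and differ only in the output format, which is what keeps subsection 2.2 programming-language-neutral.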

I'd be willing to work on the common code and user interface code, and it looks 
like there is a good starting list for the language translators.  Is there 
anyone willing to drive the second part, the internationalization of complete 
documents?

I can also update the proposal as indicated above, and then let it be 
reviewed/modified here, or in CVS somewhere.  In your replies to the mailing 
list, please indicate in which of the following ways you might be willing to 
contribute:

A) committer for code for internationalization of user interface and possibly 
common code
B) committer for code for internationalization of complete documents and 
possibly common code
C) language translation (either or both UI or documents)
D) sponsor entry of Java version of Internationalization subproject into Jakarta
E) incorporate internationalization into another Apache/Jakarta sub/project 
(please specify)
F) none of the above

Robert Simpson

Santiago Gala wrote:

> Robert Simpson escribió:
> > Santiago Gala,
> >
> > As far as document and resource translation, I'm not sure if you are
> > referring to machine translation, or human translation.  My focus has
> > been on human translation, mainly because machine translation is
> > still pretty far from perfect.  There could be APIs for interfaces to
> > various machine translation tools, such as Systransoft, but I think
> > that should be a later, secondary priority.  Even if there was
> > support for machine translation, I would prefer that it could be
> > augmented by human proofreading and revision.  So it's probably just
> > as easy to let the language translator use whatever machine
> > translation tool s/he prefers.
> >
>
> David Taylor has already answered WRT code.
>
> I was thinking mostly about having a "pool" of people who can translate
> and are more or less "cross project". For instance, I can translate
> English to Spanish, and I'm a committer in Jetspeed, but I could also
> translate, say, parts of the tomcat documents that I'm reading, or some
> XML stuff I'm interested in. Or even docs for Apache modules.
>
> The good part is that it would help the whole community, both WRT
> translation efforts and WRT crosspollination, as these kinds of people
> will "see" beyond their small project(s). Also, it could bring new kinds
> of developers (Today I heard on the radio, coming home, that 72% of
> people in Spain cannot speak *any* foreign language. We are a bad sample
> but in most of Europe, fewer than 50% of people speak English.)
>
> The problem is that I can't see clearly how to implement such a
> crosscutting service/project, in ways that would not be difficult to
> impossible to manage. Especially since we should keep source control on
> both the original doc and the translations in sync.
>
> Any ideas?
>
> Regards
> --
> Santiago Gala
> High Sierra Technology, S.L. (http://hisitech.com)
> http://memojo.com?page=SantiagoGalaBlog
