To: [EMAIL PROTECTED]

On the Jakarta General list, we've been discussing the possibility of introducing an 
"Internationalization" project into incubation.  It seems the consensus is that it 
should be targeted for a top-level programming-language-independent and 
spoken-language-independent Apache project, rather than a Jakarta subproject.

(To anyone on the JG list: I used a blind CC so that this is the only message on 
[EMAIL PROTECTED] which should be CCd to JG.  You can set up message filters on 
"[i18n]" on both lists to follow the discussions in either place....)

A preliminary organization of the project based on the JG discussions is included in 
my message below.

I don't mind "spearheading" the incubation myself.  Is there anyone else interested 
whom we can add to the list of contributors (see A through F below)?  Is there 
anything else we should consider before requesting entry into incubation?

TIA.
Robert Simpson

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject sponsor?
Date: Sun, 13 Jul 2003 21:32:36 +0100
From: robert burrell donkin <[EMAIL PROTECTED]>
Reply-To: "Jakarta General List" <[EMAIL PROTECTED]>
To: "Jakarta General List" <[EMAIL PROTECTED]>

On Monday, July 7, 2003, at 01:14 PM, Robert Simpson wrote:

<snip>

> I am surprised there isn't more interest in a common internationalization 
> framework within Jakarta.  But then I have been assuming that there are 
> non-English-speaking "members" in Jakarta, not just "committers" and 
> other users of the code.

i think that there are several jakarta members who are not native english 
speakers. as Tetsuya Kitahata pointed out there are far fewer members than 
committers and i'm not sure whether there are any jakarta members who are 
native speakers of non-latin languages. it takes a lot of energy to 
spearhead an incubation and it's a big commitment for a member to make.

but i don't think that the member would have to come from jakarta (even if 
that's where those people involved with the product hope that it will end 
up). i wonder whether you might have more luck finding a sponsor over in 
xml-land. since many of their products are multi-language a common i18n 
framework may be of more pressing importance than here. i also have an 
idea that there are members whose native languages are non-latin.

i like the idea of an apache wide i18n project along the lines suggested 
by Tetsuya Kitahata.

- robert

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject
Date: Sat, 12 Jul 2003 08:55:00 -0400
Reply-To: "Jakarta General List" <[EMAIL PROTECTED]>,[EMAIL PROTECTED]
To: Jakarta General List <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
 <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

WRT Santiago's point about keeping the different translations in sync, the solution is 
to have each word/phrase in the user interface text (section 2 below) or each section 
in complete documents (section 3 below) identified in the XML with a 
version number.  Then it would be a simple matter to have a program compare the two 
documents, and indicate where the translation needs to be updated (the program could 
even provide an initial translation of the section via machine translation, to be 
refined by the human translator).  The XML should also indicate who made each change 
and whether a change was prompted by a need to change the document (additions to 
content, for example) or as a translation of another version.  That way, no particular 
translation would have to be the "primary" document, and any conflicts could be 
identified and handled.  For example, a Spanish-speaking person could add a missing 
section to the Spanish translation of a document, and that section could then be 
translated back into the original and other translations.  This arrangement could also 
handle "proposed" additions (the XML equivalent of "I, a Spanish translator, propose 
to add a new section here"), which could be commented on (ex: "that section would be 
better placed over there") and/or voted on by translators of other languages, etc....
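
The version comparison described above could be sketched roughly as follows. This is 
only an illustration: the XML layout, the "version" and "source-version" attribute 
names, and the compare() function are assumptions made up for this example, not part 
of any existing format or tool. Python is used purely for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: every <section> has an id and a version number.
# A translated section also records which source version it was made from.
SOURCE_XML = """
<document lang="en">
  <section id="intro" version="3">Welcome to the project.</section>
  <section id="install" version="2">Unpack the archive.</section>
  <section id="faq" version="1">Frequently asked questions.</section>
</document>
"""

TRANSLATION_XML = """
<document lang="es">
  <section id="intro" version="1" source-version="3">Bienvenido al proyecto.</section>
  <section id="install" version="1" source-version="1">Descomprima el archivo.</section>
  <section id="extra" version="1">Sección añadida por el traductor.</section>
</document>
"""


def section_versions(xml_text, attr):
    """Map each section id to the value of the given attribute (or None)."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s.get(attr) for s in root.iter("section")}


def compare(source_xml, translation_xml):
    """Report which sections of a translation need attention."""
    source = section_versions(source_xml, "version")
    translated = section_versions(translation_xml, "source-version")
    report = {"outdated": [], "missing": [], "only_in_translation": []}
    for sid, version in source.items():
        if sid not in translated:
            # The source has a section the translation lacks entirely.
            report["missing"].append(sid)
        elif translated[sid] is None or int(translated[sid]) < int(version):
            # The translation was made from an older source version.
            report["outdated"].append(sid)
    # Sections added by the translator, candidates for back-translation.
    report["only_in_translation"] = [s for s in translated if s not in source]
    return report


print(compare(SOURCE_XML, TRANSLATION_XML))
```

Because the comparison is symmetric in principle (either document can play the role 
of "source"), no translation has to be the primary one; sections listed under 
"only_in_translation" are the ones that would be offered for translation back into 
the other languages.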

Am I getting the feeling right that the Internationalization project would be 
ultimately targeted for a top level, multiple-programming-language Apache project?  If 
so, I think the best approach would be to get the Java support done first, to 
demonstrate its viability and usefulness.  But still, from the start, the intent 
should be to design with language-independence as the ultimate goal.

So, in summary, the organization of the project would be:

1. code common to both (2) and (3)
1.1 code
    This would include any code that supports both (2) and (3), such as the code to do 
comparisons between translations
1.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
1.1.2 Java
1.1.2.1 source code
1.1.2.1.1 source code contributors (committers)
1.1.3+ other programming languages, similarly

2. user interface internationalization (words and phrases)
2.1 code
    This would include the code to generate programming-language-specific resources, 
and provide access to those resources
2.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
2.1.2 Java
2.1.2.1 source code
2.1.2.1.1 source code contributors (committers)
2.1.2.2 resources (translations, generated from XML)
2.1.3+ other programming languages, similarly
2.1.3+.1 source code for other programming languages
2.1.3+.2 resources for other programming languages (translations, generated from XML)
2.2 language translations (programming-language-neutral)
2.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit tests 
for file verification, etc)
2.2.2 English language translations (initial "source" translations)
2.2.2.1 XML format
2.2.2.1.1 English language translators (committers)
2.2.2.2 English user references
2.2.2.2.1 XML formatted user reference (generated, XSL-FO?)
2.2.2.2.2 HTML formatted user reference (generated, possibly with a doclet)
2.2.2.2.3 PDF formatted user reference (generated, possibly from XML user reference 
using Apache XML-FOP)
2.2.3+ other spoken languages, similarly

3. internationalization of complete documents
3.1 code
    This would include code or tools (possibly making use of other Apache code) to 
generate specific document file formats
3.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
3.1.2 Java
3.1.2.1 source code
3.1.2.1.1 source code contributors (committers)
3.1.3+ other programming languages, similarly
3.1.3+.1 source code for other programming languages
3.2 language translations (programming-language-neutral)
3.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit tests 
for file verification, etc)
3.2.2 English language translations (initial "source" translations)
3.2.2.1 XML format (based on XSL-FO?)
3.2.2.1.1 English language translators (committers)
3.2.2.2 HTML format (generated)
3.2.2.3 PDF format (generated, possibly using Apache XML-FOP)
3.2.2.4+ other document file formats (generated)
3.2.3+ other spoken languages, similarly

The main difference between sections (2) and (3) is that (2) is organized primarily by 
programming language, with the programming-language-specific resources as part of the 
first subsection (2.1) keeping the second section (2.2) programming-language-neutral, 
while (3) is organized primarily by spoken language, with the 
programming-language-independent file formats as part of the second subsection (3.2), 
keeping them separate from the programming-language-specific stuff in the first 
subsection (3.1).
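
To make the split in (2) a little more concrete, here is a rough sketch of the 
generation step: programming-language-neutral translations kept in XML (2.2) are 
turned into a Java-style key=value resource (2.1.2.2). The XML layout, the element 
and attribute names, and the to_properties() function are all hypothetical, and the 
sketch skips the escaping that real Java .properties files require. Python is used 
only for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical programming-language-neutral translation file (section 2.2).
TRANSLATIONS_XML = """
<translations>
  <phrase key="button.ok">
    <text lang="en">OK</text>
    <text lang="es">Aceptar</text>
  </phrase>
  <phrase key="button.cancel">
    <text lang="en">Cancel</text>
    <text lang="es">Cancelar</text>
  </phrase>
</translations>
"""


def to_properties(xml_text, lang):
    """Build key=value lines for one spoken language (no escaping applied)."""
    root = ET.fromstring(xml_text)
    lines = []
    for phrase in root.iter("phrase"):
        for text in phrase.findall("text"):
            if text.get("lang") == lang:
                lines.append(f"{phrase.get('key')}={text.text}")
    return "\n".join(lines)


print(to_properties(TRANSLATIONS_XML, "es"))
```

A generator for another programming language would read the same XML and emit that 
language's own resource format, which is why the XML stays in the 
programming-language-neutral part of the tree.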

I'd be willing to work on the common code and user interface code, and it looks like 
there is a good starting list for the language translators.  Is there anyone willing 
to drive section (3), the internationalization of complete documents?

I can also update the proposal as indicated above, and then let it be 
reviewed/modified here, or in CVS somewhere.  In your replies to the mailing list, 
please indicate in which of the following ways you might be willing to contribute:

A) committer for code for internationalization of user interface and possibly common 
code
B) committer for code for internationalization of complete documents and possibly 
common code
C) language translation (either or both UI or documents)
D) sponsor entry of Java version of Internationalization subproject into Jakarta
E) incorporate internationalization into another Apache/Jakarta sub/project (please 
specify)
F) none of the above

Robert Simpson

Santiago Gala wrote:

> Robert Simpson escribió:
> > Santiago Gala,
> >
> > As far as document and resource translation goes, I'm not sure if you are
> > referring to machine translation, or human translation.  My focus has
> > been on human translation, mainly because machine translation is
> > still pretty far from perfect.  There could be APIs for interfaces to
> > various machine translation tools, such as Systransoft, but I think
> > that should be a later, secondary priority.  Even if there was
> > support for machine translation, I would prefer that it could be
> > augmented by human proofreading and revision.  So it's probably just
> > as easy to let the language translator use whatever machine
> > translation tool s/he prefers.
> >
>
> David Taylor has already answered WRT code.
>
> I was thinking mostly about having a "pool" of people who can translate
> and are more or less "cross project". For instance, I can translate
> English to Spanish, and I'm a committer in Jetspeed, but I could also
> translate, say, parts of the tomcat documents that I'm reading, or some
> XML stuff I'm interested in. Or even docs for Apache modules.
>
> The good part is that it would help the whole community, both WRT
> translation efforts and WRT crosspollination, as these kinds of people
> will "see" beyond their small project(s). Also, it could bring new kinds
> of developers (Today I heard on the radio, coming home, that 72% of
> people in Spain cannot speak *any* foreign language. We are a bad sample
> but in most of Europe, less than 50% of people speak English.)
>
> The problem is that I can't see clearly how to implement such a
> crosscutting service/project, in ways that would not be difficult to
> impossible to manage. Especially since we should keep source control on
> both the original doc and the translations in sync.
>
> Any ideas?
>
> Regards
> --
> Santiago Gala
> High Sierra Technology, S.L. (http://hisitech.com)
> http://memojo.com?page=SantiagoGalaBlog

