Re: [Dspace-tech] Localization inside config files?
Hi Christian, I've tried to capture all the details/issues you've mentioned over on our wiki. I've added links to these new tools, and also added a mention of the hard-coded text in the Mirage theme. https://wiki.duraspace.org/display/DSPACE/i18n+Improvements+Proposal I've also entered an issue in our Issue Tracker about the hard-coded buttons in the XMLUI (they seem to appear not only in the Mirage theme). https://jira.duraspace.org/browse/DS-1159 Feel free to add more details to that wiki page if you stumble across additional things. You are also welcome to add new tickets to our Issue Tracker as you find more issues. - Tim On 4/21/2012 6:13 PM, Christian Völker wrote: [snip] ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Localization inside config files?
Hello, I just found this: dspace/config/modules/curate.cfg needs to be stored in ISO-Latin-1 (ISO 8859-1) encoding to display properly, as opposed to all the other files, which are stored as UTF-8 nowadays. Try localizing ui.tasknames, which in German requires umlauts, to see what I mean. You can't write the words as they should be displayed, the way you do in messages.xml: each umlaut gets displayed as two weird glyphs. You can't use a UTF-8 hex encoding either; it gets displayed as-is instead of being interpreted. I finally found that storing the file in the encoding mentioned above works. This is on Debian, Java 1.6, Tomcat 6, if that is of any relevance. Need I mention that I think this strengthens the case for streamlining the localization process? Bye, Christian
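A plausible explanation, assuming curate.cfg is read through java.util.Properties.load(InputStream) (I have not checked how DSpace actually loads module configuration): that method always decodes the file as ISO 8859-1, so the two bytes of a UTF-8 umlaut arrive as two separate characters, and the only portable way to write other characters is a backslash-u escape such as \u00fc. Properties.load(Reader), available since Java 1.6, lets the caller choose UTF-8 instead. The class name below is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class CfgEncodingDemo {

    // Properties.load(InputStream) decodes the bytes as ISO 8859-1,
    // no matter which encoding the file was saved in.
    static String loadAsLatin1(byte[] fileBytes) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("checklinks");
    }

    // Properties.load(Reader) uses whatever decoder the Reader was built with, here UTF-8.
    static String loadAsUtf8(byte[] fileBytes) {
        Properties p = new Properties();
        try {
            p.load(new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("checklinks");
    }

    public static void main(String[] args) {
        String line = "checklinks = Links in Metadaten \u00fcberpr\u00fcfen\n";
        byte[] savedAsUtf8 = line.getBytes(StandardCharsets.UTF_8);
        byte[] savedAsLatin1 = line.getBytes(StandardCharsets.ISO_8859_1);
        // Plain ASCII file that spells the umlauts as Java escape sequences.
        byte[] savedEscaped = "checklinks = Links in Metadaten \\u00fcberpr\\u00fcfen\n"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(loadAsLatin1(savedAsUtf8));   // each umlaut becomes two characters
        System.out.println(loadAsLatin1(savedAsLatin1)); // correct value
        System.out.println(loadAsLatin1(savedEscaped));  // correct value
        System.out.println(loadAsUtf8(savedAsUtf8));     // correct value
    }
}
```

If DSpace does read these files with load(InputStream), then Latin-1 files (Christian's workaround) or backslash-u escapes, for example as produced by the native2ascii tool shipped with JDKs of that era, are the two options that work without code changes.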
Re: [Dspace-tech] Localization inside config files?
Hello, in the Mirage theme (or dri2xhtml-alt, I have not verified that so far), there are three pairs of button labels, "Add" and "Remove Selected". I found them to be hard-coded in the file themes/Mirage/lib/xsl/core/forms.xsl. You can find the lines by searching for 'input type=submit value='. Bye, Christian
Re: [Dspace-tech] Localization inside config files?
Hello, I just came across this page on Wikipedia: http://en.wikipedia.org/wiki/Computer_Assisted_Translation There is a list of tools (and, even better, technologies). Among other apps they mention http://omegat.org/en/omegat.html which is meant to support .properties files and is a Java app itself. Web-based systems are mentioned as well: http://www.globalsight.com/ http://sourceforge.net/projects/ote/ I did not try any of them… Bye, Christian
Re: [Dspace-tech] Localization inside config files?
Thanks for tracking that down. On Tue, Apr 10, 2012 at 11:03:49AM -0700, Mark Diggory wrote: [snip] I speculate heavily that reimplementing these 300 lines of code to support Java ResourceBundles (either properties-based or XML-based) rather than Cocoon's messages.xml format would get us the ability to use the same i18n message tags in XMLUI on the messages.properties files in dspace-api and so on. Sounds good. But, considering that there is interest in an enhanced-for-translation format and XML is so malleable, what if we went the other way: create an XMLResourceBundle that stock Java can just use without touching existing code? Properties.loadFromXML() could do most of the work, if we define the format to conform with the Properties DTD. That is: find or invent a namespace containing elements that translation support facilities can use for all the things that translators want to do but that aren't relevant to the code that uses their work, and overlay it on the Cocoon format or something like it. If we do it right, we can give that to the translation community and not have to maintain it. Or there is probably already something we can readily transform to/from the forms we need. -- Mark H. Wood, Lead System Programmer mw...@iupui.edu Asking whether markets are efficient is like asking whether people are smart.
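A minimal sketch of the XMLResourceBundle idea, assuming catalogs written in the format that Properties.loadFromXML() accepts. The class name and the message key are illustrative, not existing DSpace code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Properties;
import java.util.ResourceBundle;

// A ResourceBundle whose entries come from a file in the java.util.Properties XML format.
public class XMLResourceBundle extends ResourceBundle {
    private final Properties props = new Properties();

    public XMLResourceBundle(InputStream in) throws IOException {
        // Parses <properties><entry key="...">value</entry></properties>; closes the stream.
        props.loadFromXML(in);
    }

    @Override
    protected Object handleGetObject(String key) {
        return props.getProperty(key); // null lets ResourceBundle consult the parent bundle
    }

    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(props.stringPropertyNames());
    }

    // Convenience for the example: build a bundle from an XML string.
    static XMLResourceBundle fromString(String xml) {
        try {
            return new XMLResourceBundle(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>German catalog (illustrative key)</comment>\n"
                + "  <entry key=\"xmlui.general.add\">Hinzuf\u00fcgen</entry>\n"
                + "</properties>\n";
        ResourceBundle rb = fromString(xml);
        System.out.println(rb.getString("xmlui.general.add"));
    }
}
```

Hooking this into ResourceBundle.getBundle() for locale lookup would additionally need a ResourceBundle.Control whose getFormats() reports an "xml" format and whose newBundle() opens the file; the JDK's ResourceBundle.Control documentation shows essentially this pattern. Note that the Properties DTD permits only an optional comment and key/value entry elements, so any translator-only markup would have to be stripped out before loading.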
Re: [Dspace-tech] Localization inside config files?
On Sat, Apr 07, 2012 at 12:17:56AM +0200, Christian Völker wrote: Hello, On 05.04.2012 at 22:21, Tim Donohue wrote: To this end, I've attempted to summarize what I've read in this thread into a new Proposal page on our wiki: https://wiki.duraspace.org/display/DSPACE/i18n+Improvements+Proposal The goal of this wiki page is *not* to take discussion away from this listserv. I just found this page and I'd like to comment on it. We don't need to discuss messages.properties and messages.xml files together. The two types of files cannot be mapped to each other because of differences in content, not only differences in format. Thus, translating the two user interfaces is a separate effort anyway. With ongoing modularization, they will diverge even more, so there is no chance of keeping them in sync. In the days when there was only a JSPUI, translation did not seem that troublesome, so I would concentrate on the new situation with Cocoon .xml files. Furthermore, it was said that there are tools available for the .properties files, so that is one more reason to deal with the .xml files for the moment. messages.properties has another problem: it's a mixture of texts used by JSPUI and texts used by dspace-api. I feel that we need to pull these apart. Some thoughts on the messages.xml format: .ts files, for which tools exist, are XML files too. With the extremely simple structure of messages.xml catalogues it should not be much of a deal to create e.g. an XSLT transformation for translating back and forth. The simple structure of message catalogues by itself seems limiting, however. I read over this page to get an idea of whether xmlns:i18n=http://apache.org/cocoon/i18n/2.1 contains some more useful attributes or such to store more info, but I could not identify any. I think a translation file format should support at least an attribute untranslated or deprecated for each message string, with a translator's comment being another desirable feature.
I haven't tried to feed Cocoon a messages.xml file that was simply extended with no regard to the schema definition, but it would be worth creating our own messages.xml format as long as we don't break existing functionality. I had not thought about the preferred source form for translation being enriched with attributes that are only used in the translation process. It would make sense to ship the catalogs in such a format, to support translators, and then convert it to the preferred form for runtime while packaging the applications. Maven has conventions for dealing with sources generated by the build process. Once the conversion tools have been identified, we just have to plumb them into the right build stage. We could even convert the same sources to different forms of generated-sources in different components if that seems useful. I feel that we should strongly resist any temptation to just create our own DSpace-specific format. If there is an existing one that we can agree on, we should adopt that. I feel that it is worth some effort to avoid adding to the cacophony of competing formats if we can. If we decide to augment messages.xml then the changes need to be in their own namespace (and that namespace might make a valuable contribution to the translation community beyond DSpace). Then Cocoon should not even see our additions, and anyway they'd be easy to transform out of what Cocoon is given. [snip] For emails, it would already be an improvement if the scheme of adding the language to the file name were applicable, e.g. feedback and feedback_de. That said, this might not work in all cases. Most emails are triggered by user interaction in the web interface; the proper locale could then be derived from browser settings. However, some notifications are sent just to make the user aware of new items or such, and that he should visit the site again. In this case, the eperson record would need to store the preferred locale.
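Christian's feedback / feedback_de scheme could fall back the same way ResourceBundle does: try the most specific name first and end with the unsuffixed default. A sketch, assuming templates are plain files named like feedback_de_AT and that the eperson language column holds a locale name such as fr_CA; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EmailTemplateResolver {

    // Candidate template names from most to least specific, e.g. for "feedback" and de_AT:
    // feedback_de_AT, feedback_de, feedback.
    static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<>();
        if (!locale.getLanguage().isEmpty()) {
            if (!locale.getCountry().isEmpty()) {
                names.add(baseName + "_" + locale.getLanguage() + "_" + locale.getCountry());
            }
            names.add(baseName + "_" + locale.getLanguage());
        }
        names.add(baseName);
        return names;
    }

    // Parse a stored locale name such as "fr_CA" or "de" (hypothetical column format).
    static Locale parseLocale(String stored) {
        if (stored == null || stored.isEmpty()) {
            return Locale.ROOT;
        }
        return Locale.forLanguageTag(stored.replace('_', '-'));
    }

    // Return the first candidate that exists among the available template names, or null.
    static String resolve(String baseName, String storedLanguage, Set<String> available) {
        for (String name : candidates(baseName, parseLocale(storedLanguage))) {
            if (available.contains(name)) {
                return name;
            }
        }
        return null; // not even the default template exists
    }
}
```

For a user whose record says de_AT, with only feedback and feedback_de installed, resolve("feedback", "de_AT", ...) picks feedback_de. ResourceBundle.Control.getCandidateLocales implements a more thorough version of this fallback, including scripts and variants.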
I don't know whether this is the case already. There is already a language column. It is a VARCHAR(64) and probably ought to be recast as some short datum that can be used by the applications for making decisions, such as locale names (fr_CA and the like). I don't know how it's used at the moment. I am not sure whether it would be a good idea to turn emails into just a single message in the catalog file. First, which one: the JSPUI .properties or the XMLUI messages.xml? Or a precedence rule, such that the version associated with the interface in use is the one that gets read? No, not really. What if, one day, the JSPUI gets removed? There is another reason not to put emails into the catalog: people might want to use formatted mail. I doubt that this will work easily within a message catalog, if only because of the amount of text. Email templates belong to the dspace-api, not to any dspace-XXXui. This is another reason why we need to work out a clear
Re: [Dspace-tech] Localization inside config files?
Absolutely. Likewise, we need to watch out that we don't end up creating a Rube Goldberg machine as a dependency, just so the translators can work with one giant file instead of 4 or 5. There are three reasons there is not just one file: 1.) So that i18n can be locally customized without needing to merge back the differences every time you upgrade. 2.) We have different technologies for each webapp, email, controlled vocabularies/taxonomies, etc. 3.) Addon i18n (especially XMLUI addons) is maintained separately for each addon so that you do not need to merge files to upgrade. We have the following areas/resources that contain some form of i18n: a. one file for api / jspui (ResourceBundle) b. one or more files for xmlui and addons (Cocoon Messages) d. individual separate files for emails (Custom DSpace wackiness) e. individual taxonomies and controlled vocabs (Custom DSpace wackiness) We need to address requirements for a common design guideline: 1.) A common i18n format would be beneficial 2.) Easily overriding keys in that format via the database would be beneficial 3.) Relying on a more common/ubiquitous technology would be beneficial Given the above, my recommendation for a technical project roadmap is as follows (these can be done in stages, but a common design and agreement is essential from the beginning, so a/b/c are critical): a.) Implement a standard Java ResourceBundle-based i18nTransformer in Cocoon rather than the current format. b.) Decide if we want to use Properties- or XML-based resource bundles (or allow both), since Java 1.6 or greater is now required. c.) Write a JDBC ResourceBundle implementation that can be chained such that the above RBs are used as defaults and key/values can be overridden. d.) Re-implement email and taxonomies in the future to use a more ubiquitous templating language that provides support for using ResourceBundles. e.) Write user interfaces in xmlui to support overriding i18n keys/values in the database.
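Item c.) can lean on ResourceBundle's parent chain: a bundle returns null for keys it does not hold, and the lookup then continues in its parent. A minimal sketch, in which a Map stands in for rows read from a hypothetical database table and the JDBC loading itself is left out; the class name is illustrative:

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.Set;

// A bundle holding locally overridden values, chained in front of the shipped defaults.
public class OverridingBundle extends ResourceBundle {
    private final Map<String, String> overrides;

    public OverridingBundle(Map<String, String> overrides, ResourceBundle defaults) {
        this.overrides = new HashMap<>(overrides);
        setParent(defaults); // lookups that return null here fall through to the defaults
    }

    @Override
    protected Object handleGetObject(String key) {
        return overrides.get(key);
    }

    // Union of overridden keys and all keys known to the parent chain.
    @Override
    public Enumeration<String> getKeys() {
        Set<String> keys = new HashSet<>(overrides.keySet());
        if (parent != null) {
            keys.addAll(Collections.list(parent.getKeys()));
        }
        return Collections.enumeration(keys);
    }
}
```

Because the override layer is just another ResourceBundle, code that calls getString() does not need to know whether a value came from the shipped file or from the database.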
I would look closely at how other projects that leverage Spring WebMVC (and/or other frameworks) approach i18n. I expect that in most cases they all use ResourceBundles. Finally, I will add that the current i18n projects do not really provide a benefit given our yearly release cycle. It would be better to consolidate these back into the DSpace/DSpace master and have i18n maintained in one of these services, with additional Release Manager responsibilities for updating files during maintenance releases. I know this is contrary to my past position; my current goal in doing so is to reflect that, though we want addons, it would be better to ensure that addons carry their own i18n resources instead of multiplying the number of projects necessary to manage an addon and, conversely, the complexity of DSpace itself. Mark On Tue, Apr 3, 2012 at 1:51 AM, helix84 heli...@centrum.sk wrote: [snip]
Re: [Dspace-tech] Localization inside config files?
Hello, On 05.04.2012 at 22:21, Tim Donohue wrote: To this end, I've attempted to summarize what I've read in this thread into a new Proposal page on our wiki: https://wiki.duraspace.org/display/DSPACE/i18n+Improvements+Proposal The goal of this wiki page is *not* to take discussion away from this listserv. I just found this page and I'd like to comment on it. We don't need to discuss messages.properties and messages.xml files together. The two types of files cannot be mapped to each other because of differences in content, not only differences in format. Thus, translating the two user interfaces is a separate effort anyway. With ongoing modularization, they will diverge even more, so there is no chance of keeping them in sync. In the days when there was only a JSPUI, translation did not seem that troublesome, so I would concentrate on the new situation with Cocoon .xml files. Furthermore, it was said that there are tools available for the .properties files, so that is one more reason to deal with the .xml files for the moment. Some thoughts on the messages.xml format: .ts files, for which tools exist, are XML files too. With the extremely simple structure of messages.xml catalogues it should not be much of a deal to create e.g. an XSLT transformation for translating back and forth. The simple structure of message catalogues by itself seems limiting, however. I read over this page to get an idea of whether xmlns:i18n=http://apache.org/cocoon/i18n/2.1 contains some more useful attributes or such to store more info, but I could not identify any. I think a translation file format should support at least an attribute untranslated or deprecated for each message string, with a translator's comment being another desirable feature. I haven't tried to feed Cocoon a messages.xml file that was simply extended with no regard to the schema definition, but it would be worth creating our own messages.xml format as long as we don't break existing functionality.
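Christian's untranslated/deprecated flags and translator comments, combined with Mark's suggestion elsewhere in the thread to keep such additions in their own namespace, might look like the sketch below. The namespace URI, the tr:status attribute and the tr:comment element are all hypothetical. The code lists keys still marked untranslated, and strips every trace of the namespace to produce the plain catalogue that Cocoon would receive at build time:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TranslatorAnnotations {
    // Hypothetical namespace for translator-only metadata; not defined by Cocoon or DSpace.
    static final String TR_NS = "urn:example:dspace-translation";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse catalogue", e);
        }
    }

    // Keys of <message> elements marked tr:status="untranslated" (a translator work list).
    static List<String> untranslatedKeys(String xml) {
        List<String> keys = new ArrayList<>();
        NodeList messages = parse(xml).getElementsByTagName("message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element m = (Element) messages.item(i);
            if ("untranslated".equals(m.getAttributeNS(TR_NS, "status"))) {
                keys.add(m.getAttribute("key"));
            }
        }
        return keys;
    }

    // Remove every TR_NS element, attribute and namespace declaration on and below node.
    static void strip(Node node) {
        if (node instanceof Element) {
            NamedNodeMap attrs = node.getAttributes();
            for (int i = attrs.getLength() - 1; i >= 0; i--) {
                Attr a = (Attr) attrs.item(i);
                boolean inTrNs = TR_NS.equals(a.getNamespaceURI());
                boolean declaresTrNs = "xmlns".equals(a.getPrefix()) && TR_NS.equals(a.getValue());
                if (inTrNs || declaresTrNs) {
                    ((Element) node).removeAttributeNode(a);
                }
            }
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child instanceof Element && TR_NS.equals(child.getNamespaceURI())) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Produce the runtime catalogue that would be handed to Cocoon.
    static String toRuntimeCatalog(String annotatedXml) {
        try {
            Document doc = parse(annotatedXml);
            strip(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot write catalogue", e);
        }
    }
}
```

Running the stripping step as part of the Maven build, as Mark suggests, would mean translators edit the annotated file while Cocoon only ever sees elements it already understands.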
It should be possible to compare two versions of a messages.xml file based on the key values only. That way the English and a localized version could be compared to quickly find out the coverage of the localized version. On the Mac, there is nothing similar to Altova's tools, so I know only of quite simple XML editors, and I prefer diff for now. The aforementioned way of working is not possible with diff. I am sure this is just a lack of knowledge of existing tools on my side, so if anyone has suggestions for great XML tools to check out, please tell. Scattering messages.xml files all over the file tree is not much of an issue. Inconsistency in using overlays is. Huh, did I say that? Well, as soon as one knows that there are several places where messages.xml files go, as long as they are named exactly the same, it is one simple search to find them all. However, overlays do not seem to be handled consistently for all modules. For now, I have learned that the proper path to put them in starts with dspace/config/modules and ends with src/main/webapp/i18n/. For xmlui, the missing middle part is just xmlui. Remember, as a user, I don't just have to drop the file into the right place; I have to create the path. This requires better understanding. Now, equipped with this knowledge, I try to derive the proper overlay path for modules from their source paths: dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/i18n; dspace-xmlui/dspace-xmlui-api/src/main/resources/aspects/XMLWorkflow/i18n; dspace-discovery/dspace-discovery-xmlui-api/src/main/resources/aspects/Discovery/i18n. Well, dspace-xmlui/dspace-xmlui-webapp gets shortened to xmlui. Should dspace-xmlui/dspace-xmlui-api be shortened to the same xmlui, given that the first part of the path minus dspace- is the same? Or will there be a future name clash? And for dspace-discovery/dspace-discovery-xmlui-api, is it simply discovery then?
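A key-only comparison along the lines Christian describes needs little more than a namespace-aware XML parser. The sketch below assumes Cocoon-style catalogues with message elements carrying a key attribute; it reports keys missing from the translation, keys the translation has but the reference no longer does, and a coverage percentage. The class name and command-line usage are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CatalogCoverage {

    // Collect the key attribute of every <message> in a Cocoon-style messages.xml.
    // Parsing from bytes lets the XML parser honour the file's declared encoding.
    static Set<String> keys(byte[] xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            NodeList messages = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagName("message");
            Set<String> keys = new TreeSet<>();
            for (int i = 0; i < messages.getLength(); i++) {
                keys.add(((Element) messages.item(i)).getAttribute("key"));
            }
            return keys;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse catalogue", e);
        }
    }

    // Keys in the first set that are absent from the second, in sorted order.
    static Set<String> missing(Set<String> from, Set<String> in) {
        Set<String> result = new TreeSet<>(from);
        result.removeAll(in);
        return result;
    }

    // Percentage of reference keys that the translation covers.
    static double coverage(Set<String> reference, Set<String> translated) {
        if (reference.isEmpty()) {
            return 100.0;
        }
        int covered = reference.size() - missing(reference, translated).size();
        return 100.0 * covered / reference.size();
    }

    // Usage: java CatalogCoverage messages.xml messages_de.xml
    public static void main(String[] args) throws Exception {
        Set<String> reference = keys(Files.readAllBytes(Paths.get(args[0])));
        Set<String> translated = keys(Files.readAllBytes(Paths.get(args[1])));
        System.out.printf("coverage: %.1f%%%n", coverage(reference, translated));
        System.out.println("untranslated: " + missing(reference, translated));
        System.out.println("obsolete: " + missing(translated, reference));
    }
}
```

Comparing sorted key sets rather than file text is what makes this insensitive to element order and whitespace, which is exactly where a line-based diff falls down.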
Ok, then for the second part: it looks as if they can't all be thrown into src/main/resources/i18n, nor into src/main/resources/i18n/$modulename/. I could not even figure out whether overlays need to be supported actively by the programmer, reading files from an agreed-upon place, or whether they are picked up by the Maven process automagically. Finally, I just put them in the source path, which means I compromise on the build process. Not that much of a deal for now, but you should be aware of the consequences of not explaining the mechanics from a user's perspective. To sum up, a separate section Localization under Customization would be a welcome addition as a one-stop place to gather this kind of information. Alternatively, there should be a minimum documentation requirement for all new modules to mention this kind of stuff in a standardized place. Whose convenience counts most? In this thread, programmers' versus localizers' convenience was discussed. But localizers are still more engaged and
Re: [Dspace-tech] Localization inside config files?
Hi all, This i18n discussion has been bouncing around in my head for a while, so I felt it's time to chime in. First off, it is very important to keep this discussion rolling. Most, if not all, of the developers/committers will readily admit that DSpace i18n processes are currently a bit broken. The main issue here is not that we don't want to fix them. Rather, it's that we are a bit stuck on how to improve the processes for the translators while keeping development moving forward. So, these discussions are extremely helpful! To this end, I've attempted to summarize what I've read in this thread into a new Proposal page on our wiki: https://wiki.duraspace.org/display/DSPACE/i18n+Improvements+Proposal It's highly possible that I've misrepresented a major point (or missed something along the way). So, I'd recommend others read through this, enhance it, and add your own comments or suggestions. You can add brainstorms directly to the Wiki page (with your name after it), or express them on this list (and we can eventually 'capture it' on the wiki page later as needed). The goal of this wiki page is *not* to take discussion away from this listserv. Rather, it's to provide a summary of this discussion, so that we can begin brainstorming from that summary and look towards developing (or trying out) possible solutions to each of these frustrations. I've added some brainstorms of my own to the wiki page itself on many of these key points/frustrations that have been expressed by Christian and helix84. I definitely don't claim to have the answers here though. These are merely brainstorms. All in all, I feel strongly that we need to find ways to ease the job of translators, simplify the installation of i18n files, and avoid complicating development processes. This may be a delicate balancing act, but it is an important one. - Tim
Re: [Dspace-tech] Localization inside config files?
Pretty please, don't reinvent the wheel poorly, just give us translators .po files (preferably one). Then there are existing tools like Pootle and a bazillion others for things like online translation and reusing old translations. That's all I have to say, but I can't stress it enough. Regards, ~~helix84 On Tue, Apr 3, 2012 at 05:53, Mark Diggory mdigg...@atmire.com wrote: I agree that it shouldn't be exclusively in the UI layer. I do support the idea of, if not a centralized catalog where all values reside, then at least the ability to define a catalog that allows Repository Admins to easily override the existing values found in the files from the UI. In fact, IMO, it should really be in the database in a manner that, similar to configuration modularization, would be separated up into contexts based on context. localization_catalog { context key value lang } Individual email templates (which are glorified parameterized messages), messages.xml and messages.properties all residing within one database table would shift the design of internationalization away from a developer activity. Simple tooling can be written to dump/restore the database from properties files (xml or properties formats) as needed. Simple interfaces can be crafted in the admin area of the user interface to introduce simple editing of the field values, and caching can be employed in the XMLUI to optimize performance and reduce db queries for i18n tags. *Mark Diggory (Schedule a Meeting http://bit.ly/xNePTl)* *2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010* *Esperantolaan 4, Heverlee 3001, Belgium* http://www.atmire.com On Monday, April 2, 2012 at 11:02 AM, Richard Rodgers wrote: I think Mark makes a number of good points here - esp.
regarding modularity - and it's worth emphasizing that the net effect should be *less* localization effort, even if there are potentially more files, since one would only need to worry about the locally deployed modules - but I'm a bit puzzled about the 'single catalog scheme' as a desired future state. Without much thought, I can come up with 4-5 quite distinct sites (places, files, ways) where localization occurs in DSpace: * in email templates (config/email) * in dspace.cfg and many other config files (starting with the 'dspace.name' property) * in input_forms.xml * messages.xml and that ilk and I'm sure there are others; the curation stuff does not introduce a new locus of localization: localizability permeates the application already. It's also worth noting that localized strings occur not just in the UI proper - they can appear in RSS feeds, OAI-PMH harvests, etc. So I'd be leery of a plan to shoehorn all localization into any single 'catalog scheme', esp. one that is explicitly tied to a UI presentation layer. Having said all this, I sympathize with Christian's plight, and affirm with Mark that we can do a better job of managing it. Richard R. On Apr 2, 2012, at 9:21 AM, Mark H.
Wood wrote: On Sat, Mar 31, 2012 at 02:05:34PM +0200, Christian Völker wrote: [snip] Now I just found a new flavour of localization in the dspace/config/modules/curate.cfg file:
#ui.tasknames = \
#    profileformats = Profile Bitstream Formats, \
#    requiredmetadata = Check for Required Metadata, \
#    checklinks = Check Links in Metadata
ui.tasknames = \
    profileformats = Dateityp angehängter Dateien untersuchen, \
    requiredmetadata = Pflichtfelder auf Inhalt überprüfen, \
    checklinks = Links in Metadaten überprüfen
# general = General Purpose Tasks,
general = Allgemeine Aufgaben,
#ui.statusmessages = \
#    -3 = Unknown Task, \
#    -2 = No Status Set, \
#    -1 = Error, \
#    0 = Success, \
#    1 = Fail, \
#    2 = Skip, \
#    other = Invalid Status
ui.statusmessages = \
    -3 = Unbekannte Aufgabe, \
    -2 = Kein Zustand definiert, \
    -1 = Fehlerhaft, \
    0 = Erfolgreich, \
    1 = Fehlgeschlagen, \
    2 = Übersprungen, \
    other = Ungültiger Zustand
Honestly, is this the way to go? Clearly not. We already have two different message catalog schemes, which IMHO is one too many. Configurable message texts should at least be confined to those two. It would be good to get every component to use a single scheme. Besides the monstrous messages.xml file in modules/xmlui/src/main/webapp/i18n/ with more than 2,100 messages by now, we already have numerous other places where messages.xml files have to be kept up to date, such as dspace-xmlui/dspace-xmlui-api/src/main/resources/aspects/XMLWorkflow/i18n or dspace-discovery/dspace-discovery-xmlui-api/src/main/resources/aspects/Discovery/i18n Message catalogs will proliferate, because DSpace is becoming modular. Each module needs its own catalog, because it might be released on a different schedule, and because separable components shouldn't depend on each others' catalogs. Indeed it might be good to break up the modules/xmlui/src/main/webapp/i18n/messages.xml into more
Re: [Dspace-tech] Localization inside config files?
On Tue, Apr 03, 2012 at 10:51:08AM +0200, helix84 wrote: Pretty please, don't reinvent the wheel poorly, just give us translators .po files (preferably one). Then there are existing tools like Pootle and a bazillion others for things like online translation and reusing old translations. Too bad we can't go back to the 1990s and explain this to the people who built java.util.ResourceBundle and the people who reinvented *that* wheel in XML for Cocoon. Otherwise we'd have to reinvent a lot of what they did. Reinvention seems to be the order of the day, in internationalization of code. Others have already reinvented this wheel pretty well, and bolted it firmly into the rest of their work. If you want to work in .po format, there are tools which can convert formats. Trying to recreate gettext in Java, making it work seamlessly with the *other* I18N/L10N bits of Java (DateFormat, for example), and then grafting it all onto Cocoon sounds like a bit much, given that message translation is not what DSpace is *for*. That effort, it seems to me, would be better spent in building any missing format conversions, which can be used by lots of people in addition to DSpace sites. I note that the site mentioned in the thread over on -devel handles Java property files *and* .po files (and a number of other formats). For that matter, so does Pootle. It should be fairly simple to work in the format you like best and translate to/from the format(s) that DSpace wants. Regardless of format it all comes down to bundles of mappings of one set of keys onto multiple sets of values. I don't think it's *possible* to have One Catalog to Rule Them All. Nobody controls all of the localizable code in DSpace. DSpace is supposed to be extensible by sites who see needs that others have overlooked. And a well-made extension that gets shared will probably need a message catalog. Of its own, because it isn't developed as part of The Core. 
I've written tiny add-on aspects to drop into XMLUI that had their own tiny catalogs, because those messages didn't *belong* in DSpace proper; they belong to the add-on and need to be maintained with it. Besides, the main catalogs already seem large and unwieldy. -- Mark H. Wood, Lead System Programmer mw...@iupui.edu
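As a rough illustration of how small a format converter can be, the sketch below writes a key-to-English map as a gettext template (.pot). It records each key both as a "#:" reference comment and as msgctxt, so that two keys sharing the same English text remain separate entries. That layout is an assumption of this sketch; established converters such as translate-toolkit's prop2po make their own choices, and a real exporter would also fill in the usual header fields. The keys in main are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PotWriter {

    // Escape a string for use inside a double-quoted PO string.
    static String escape(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': b.append("\\\\"); break;
                case '"':  b.append("\\\""); break;
                case '\n': b.append("\\n"); break;
                case '\t': b.append("\\t"); break;
                default:   b.append(c);
            }
        }
        return b.toString();
    }

    // One template entry per key: reference comment, context, English source, empty translation.
    static String toPot(Map<String, String> english) {
        StringBuilder out = new StringBuilder();
        // Minimal header entry declaring the charset.
        out.append("msgid \"\"\n")
           .append("msgstr \"\"\n")
           .append("\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n");
        for (Map.Entry<String, String> e : english.entrySet()) {
            out.append("#: ").append(e.getKey()).append('\n')
               .append("msgctxt \"").append(escape(e.getKey())).append("\"\n")
               .append("msgid \"").append(escape(e.getValue())).append("\"\n")
               .append("msgstr \"\"\n\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> english = new LinkedHashMap<>();
        english.put("xmlui.general.add", "Add");
        english.put("xmlui.general.remove", "Remove Selected");
        System.out.print(toPot(english));
    }
}
```

The reverse direction would read msgctxt/msgstr pairs back into a key-to-translation map, which either of the existing catalog formats can then be written from.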
Re: [Dspace-tech] Localization inside config files?
Let me start over, constructively. I realize the limitations: * I know that Cocoon works with .xml and ResourceBundle with .properties internally * I know you're not going to rebuild Cocoon or any other underlying DSpace components * I know about the ongoing modularization and what it means for message catalogs Regardless of format it all comes down to bundles of mappings of one set of keys onto multiple sets of values. In principle, I disagree. Think of support for plural forms or ngettext. Those cannot be emulated with such a simple mapping. But as I said, I understand the limitations and that these features won't be available. I can live with that. Let's work on from there. This is all done, I'm not asking you to change that. What I ask of you is not to deal with localization in DSpace, the application. Don't import strings into the DSpace database. Instead, provide the infrastructure for translators to do their work easily. I'm asking you to do _less_ work, not more. You have the developer hammer in hand and the translation issue seems like a nail to you. It's not. My proposal of the infrastructure is as follows: 1) convert messages.xml, Messages.properties and email templates to .pot 2) pull all the message catalogs of different modules into one place (git repo or website) 3) give translators write access to this place 4) optionally, push these catalogs to a web-based translation system 5) automatically pull the translated .po files from the translators, convert them back to localized source files (messages.xml, Messages.properties, email templates) and push them to the official dspace repo In more detail: 1) gettext is a standard. There's a whole ecosystem that enables translators to work with it easily, as they see fit. It's hard to update a .properties or .xml file reusing previous translations. It's easy with .po files. There already are tools to do the work. translate-toolkit converts .properties. xml2po or po4a convert xml files.
Each email template is just one string to include in the .po file. 2) The proliferation of independent modules is a good thing. It's only hard to find files that need to be localized. 3) Make it easy. Make a simple git repo, with .pot files updated daily, which would hold the .po files. Generously give write access to this repo, there's nothing to here that would break DSpace. 4) Once you're using a standard format, you have access to a lot of existing tools that work with it. Web-based translation systems are easy to use for inexperienced translators. There's a plethora to choose from: Pootle, Transifex, Launchpad, translationproject.org just to mention a few. 5) You have a repository with translations in .po format. It's now easy to automate their conversion. By pushing them to the official repo, you're giving access to up-to-date translations to users. You want DSpace to be easy to translate. Make it easy for translators, give them a familiar format that they can work with with tools familiar to them. Lower the barriers of contributing regularly by generously giving them write access and by automating imports. Regards, ~~helix84 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
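Step 1 of the proposal could look roughly like the following minimal sketch, which turns a Java .properties catalog into a gettext .pot template. It is an illustration under stated assumptions, not the output format of translate-toolkit's prop2po: here each property key is stored as `msgctxt` (so two keys with the same English text stay separate entries) and the English value becomes the `msgid`. The class name `PropertiesToPot` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical converter from a Java .properties catalog to a gettext .pot template. */
final class PropertiesToPot {

    /** Escape text for a double-quoted PO string (backslash, quote, newline, tab). */
    static String poEscape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
    }

    /** One entry per key: the key becomes msgctxt, the English text becomes msgid. */
    static String toPot(Properties english) {
        StringBuilder out = new StringBuilder();
        // Minimal header entry; a real template would carry more metadata.
        out.append("msgid \"\"\n")
           .append("msgstr \"\"\n")
           .append("\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n");
        // Sort keys so that regenerated templates produce stable diffs.
        for (String key : new TreeSet<>(english.stringPropertyNames())) {
            out.append("msgctxt \"").append(poEscape(key)).append("\"\n")
               .append("msgid \"").append(poEscape(english.getProperty(key))).append("\"\n")
               .append("msgstr \"\"\n\n");
        }
        return out.toString();
    }

    /** Parse .properties text through a Reader (so non-Latin-1 text survives) and convert. */
    static String toPot(String propertiesText) {
        Properties english = new Properties();
        try {
            english.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toPot(english);
    }
}
```

The resulting template could then be handed to standard gettext tooling such as msgmerge, which is where the reuse of earlier translations (marked fuzzy when the English changed) comes from.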
Re: [Dspace-tech] Localization inside config files?
On Tue, Apr 03, 2012 at 04:30:43PM +0200, helix84 wrote: Let me start over, constructively.

I'll try to follow suit.

[snip]
What I ask of you is not to deal with localization in DSpace, the application. Don't import strings into the DSpace database. Instead, provide the infrastructure for translators to do their work easily. I'm asking you to do _less_ work, not more. You have the developer hammer in hand and the translation issue seems like a nail to you. It's not.

Actually I agree with all this.

My proposal of the infrastructure is as follows: 1) convert messages.xml, Messages.properties and email templates to .pot

Here's where I begin to disagree, at least in the case of Messages.properties. I don't see how converting from a format understood by DSpace *and* common translation tools to one not understood by DSpace, or by any other Java code that I know of, helps us.

messages.xml is specific to Cocoon, so maybe just supplying an appropriate XSL transform to some other format, and some kind of program to convert back, would be simplest. And yet, surely there are other Cocoon projects that would benefit from having common tools understand its catalog format. The effort of producing a converter might be better spent developing format handlers for the common tools and contributing them to those projects.

I think that email templates require a bit more thought. Just for example, we might want to replace DSpace's own code with an existing templating package. At any rate, I'd like to hear some more discussion before we cram them into *any* other format.
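For messages.xml, the XSL-transform idea could equally be done in a few lines of plain Java. The sketch below swaps in the JDK's DOM parser rather than XSLT, and assumes the catalogue layout used by the XMLUI files, i.e. a `<catalogue>` root with `<message key="...">` children; any markup nested inside a message is flattened to its text. The class name `CatalogueToProperties` is hypothetical, and the conversion back is not shown.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical one-way converter: Cocoon-style messages.xml catalogue -> java.util.Properties. */
final class CatalogueToProperties {

    static Properties convert(String catalogueXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(catalogueXml)));
            Properties props = new Properties();
            // Assumed layout: <catalogue><message key="...">text</message>...</catalogue>
            NodeList messages = doc.getElementsByTagName("message");
            for (int i = 0; i < messages.getLength(); i++) {
                Element message = (Element) messages.item(i);
                String key = message.getAttribute("key");
                if (!key.isEmpty()) {
                    // getTextContent() drops any nested markup and keeps only the text.
                    props.setProperty(key, message.getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read catalogue", e);
        }
    }
}
```

`Properties.store(Writer, String)` could then write the result out as a .properties file for tools that understand property bundles.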
2) pull all the message catalogs of different modules in one place (git repo or website) 3) give translators write access to this place 4) optionally, push these catalogs to a web-based translation system 5) automatically pull the translated .po files from the translators, convert them back to localized source files (messages.xml, Messages.properties, email templates) and push them to the official dspace repo

Most of this was just proposed over on -devel. What do you think of it? http://sourceforge.net/mailarchive/message.php?msg_id=29028526

In more detail: 1) gettext is a standard. There's a whole ecosystem that enables translators to work with them easily, as they see fit. It's hard to update a .properties or .xml file reusing previous translations. It's easy with .po files.

Java property bundles are a standard, and the ecosystems that understand gettext seem to understand them too. It would seem that in this case, ease of update is a tools problem, not a format problem, if it is a problem at all.

[snip]
2) The proliferation of independent modules is a good thing. It's only hard to find files that need to be localized.

That is something we need to fix. Anyone doing global translations should be able to refer to a well-maintained document which says: these are all the places you need to look. Each separable component should document its localization needs. Guides for developing in DSpace should always ask for internationalization and suggest how to go about it.

3) Make it easy. Make a simple git repo, with .pot files updated daily, which would hold the .po files. Generously give write access to this repo, there's nothing in here that would break DSpace.

Reasonable, give or take my quibble about formats.

4) Once you're using a standard format, you have access to a lot of existing tools that work with it. Web-based translation systems are easy to use for inexperienced translators.
There's a plethora to choose from: Pootle, Transifex, Launchpad, translationproject.org just to mention a few.

Transifex was proposed over on -devel. It understands property bundles.

[snip]
You want DSpace to be easy to translate. Make it easy for translators, give them a familiar format that they can work with using tools familiar to them. Lower the barriers of contributing regularly by generously giving them write access and by automating imports.

Perhaps I misunderstand how the tools work. It seems to me that a translator shouldn't be seeing file formats at all underneath the tooling. The translator should be working with keys and values, and the tooling should accept and produce whatever formats are required at the beginning and end of the process.

--
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Asking whether markets are efficient is like asking whether people are smart.
Re: [Dspace-tech] Localization inside config files?
I understand that as a developer, you don't want to introduce any unnecessary transformations to other formats. KISS.

Basically, there are two groups of translators: those who prefer to work with web-based translation systems and those who prefer to work directly with files. Both have good reasons. Let's call them web translators and file translators. (I happen to be the latter.)

If you want to keep the different formats, that would be transparent to web translators (assuming the web translation system understands the formats). It would, however, be more complicated for file translators. This is in fact the status quo.

In particular, my biggest concern is that I don't know of a tool that properly lets you update .properties files while reusing previous translations as fuzzy. Part of the problem here is that .properties files do not recognize fuzzy strings, although this could be bolted on somehow, perhaps in the form of comments. (What I usually do is either convert .properties to .po files - which is a very roundabout way of doing it - or do a 3-way merge, which probably no other sane translator does.)

You mentioned that Cocoon's XML format is not the only option; I didn't know that. OTOH, I didn't mention that while gettext is the most prominent, there are also other well-supported formats, XLIFF and Qt .ts files, which happen to be XML-based. Perhaps XLIFF could be used in place of Cocoon's XML; its advantage is proper support for fuzzy strings.

You suggested that instead of a localization repo, the locations of all localizable files should just be listed in the documentation. I agree that's a solution to part of the problem, but again, that preserves the status quo and doesn't make it easy to contribute back regularly. It's easier for translators to do their work in a web interface (with no other steps) or to just commit their work than to submit a patch and wait for it to be committed. Believe me, I know, I do it all the time.
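The missing piece described above, keeping old translations but flagging them as fuzzy when the English source changed, can be bolted onto .properties with comments, as suggested. Below is a minimal sketch of such a merge under that assumption; the `# fuzzy` and `# untranslated` comment markers and the class name `FuzzyMerge` are hypothetical conventions, not something existing tools are known to read. Values are only lightly escaped (backslashes and newlines), and keys are assumed to be plain dotted identifiers like DSpace's. Because `Properties` ignores comment lines, the merged file still loads normally.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical merge of an old translated catalog against a new English catalog. */
final class FuzzyMerge {

    /** Minimal .properties value escaping: backslashes and newlines only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    /**
     * For each key of the new English catalog (sorted for stable output):
     *  - translated and English unchanged: keep the translation as-is
     *  - translated but English changed: keep the translation, flag it "# fuzzy"
     *  - never translated: emit the English text, flagged "# untranslated"
     * Keys that vanished from the English catalog are dropped.
     */
    static String merge(Map<String, String> oldEnglish,
                        Map<String, String> newEnglish,
                        Map<String, String> oldTranslation) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(newEnglish).entrySet()) {
            String key = e.getKey();
            String english = e.getValue();
            String translated = oldTranslation.get(key);
            if (translated == null) {
                out.append("# untranslated\n");
                translated = english;
            } else if (!english.equals(oldEnglish.get(key))) {
                // Record the previous source text so the translator can see what changed.
                out.append("# fuzzy, English was: ")
                   .append(escape(Objects.toString(oldEnglish.get(key), "(unknown)")))
                   .append('\n');
            }
            out.append(key).append(" = ").append(escape(translated)).append('\n');
        }
        return out.toString();
    }
}
```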
Imagine your productivity as a developer if you had to submit every one of your patches via the tracker. Yes, it is possible to do so, but productivity suffers and you're being treated as a second-class citizen. Do you want to treat translators as first-class citizens? You may be rewarded with better translations.

I forgot to mention one more concern that Christian raised: local overriding. There should be a way of having content localized into your language and then overriding it with customizations for your institution. I.e., for each catalog there should be fallback in the following order: English file, localized file, localized file for my installation. (You also get fallback with gettext for free.)

As you can see, although translation can be seen as assigning a value to a key, there are many more concerns, like overriding, plural handling, fuzzy matching and certainly others I didn't think of. (That's why I started by venting my frustration about reinventing the wheel.)

I agree with you that dspace.name would be better off in a catalog if proper catalog overriding is in place. Regarding community and collection descriptions, I disagree. These are content rather than interface and should be handled inside DSpace. See the issue I filed: https://jira.duraspace.org/browse/DS-1134

Regards,
~~helix84
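The three-level fallback described above (English file, localized file, local customizations) maps fairly directly onto `java.util.Properties`, whose constructor accepts a defaults table that `getProperty()` consults when a key is missing. The sketch below assumes each level is available as .properties text; the class and method names are hypothetical, and this is not how DSpace currently loads its catalogs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical three-level catalog: local overrides -> localized file -> English file. */
final class LayeredCatalog {

    /** Load one level; lookups that miss here fall through to {@code fallback} (may be null). */
    static Properties layer(String propertiesText, Properties fallback) {
        Properties level = new Properties(fallback);
        try {
            level.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return level;
    }

    /** Stack the three levels; the returned table answers lookups top-down. */
    static Properties build(String english, String localized, String localOverrides) {
        Properties en = layer(english, null);
        Properties l10n = layer(localized, en);
        return layer(localOverrides, l10n);
    }
}
```

`stringPropertyNames()` on the top level also lists keys inherited from the lower levels, which could help an administrator see everything a local override file is able to customize.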
Re: [Dspace-tech] Localization inside config files?
On Sat, Mar 31, 2012 at 02:05:34PM +0200, Christian Völker wrote:
[snip]
Now I just found a new flavour of localization in the dspace/config/modules/curate.cfg file:

#ui.tasknames = \
#    profileformats = Profile Bitstream Formats, \
#    requiredmetadata = Check for Required Metadata, \
#    checklinks = Check Links in Metadata
ui.tasknames = \
    profileformats = Dateityp angehängter Dateien untersuchen, \
    requiredmetadata = Pflichtfelder auf Inhalt überprüfen, \
    checklinks = Links in Metadaten überprüfen
# general = General Purpose Tasks,
general = Allgemeine Aufgaben,
#ui.statusmessages = \
#    -3 = Unknown Task, \
#    -2 = No Status Set, \
#    -1 = Error, \
#     0 = Success, \
#     1 = Fail, \
#     2 = Skip, \
#    other = Invalid Status
ui.statusmessages = \
    -3 = Unbekannte Aufgabe, \
    -2 = Kein Zustand definiert, \
    -1 = Fehlerhaft, \
     0 = Erfolgreich, \
     1 = Fehlgeschlagen, \
     2 = Übersprungen, \
    other = Ungültiger Zustand

Honestly, is this the way to go?

Clearly not. We already have two different message catalog schemes, which IMHO is one too many. Configurable message texts should at least be confined to those two. It would be good to get every component to use a single scheme.

Besides the monstrous messages.xml file in modules/xmlui/src/main/webapp/i18n/, which meanwhile has more than 2,100, we already have numerous other places where messages.xml files must be kept updated, such as dspace-xmlui/dspace-xmlui-api/src/main/resources/aspects/XMLWorkflow/i18n or dspace-discovery/dspace-discovery-xmlui-api/src/main/resources/aspects/Discovery/i18n

Message catalogs will proliferate, because DSpace is becoming modular. Each module needs its own catalog, because it might be released on a different schedule, and because separable components shouldn't depend on each others' catalogs. Indeed it might be good to break up the modules/xmlui/src/main/webapp/i18n/messages.xml into more manageable chunks kept closer to the code that uses them, if there is a good and sensible way to do it.
We do need to be careful to maintain consistency across modules, and to document as well as we can where to find localizable texts.

[snip]
I sorely miss the tool support I know from other programming environments, such as AppleGlot or Qt Linguist. Qt Linguist supports loading several files to compare and copy from and to each other, enabling something like a visual diff. You can also create your own dictionaries and load them. You get suggestions based on translations already finished, which helps keep translations consistent. The file structure cannot be damaged accidentally. Comments with alternative translations or reminders can be added for each message string. And you get an overview of the progress made, with checkmarks in the sidebar for translations entered and translations reviewed. The only thing worse in this tool compared to our files is that it is a bit more complicated to find the place where a translation appears on the finished site, depending on the way programmers structured their work.

I agree that good tooling would help. Localization requires a lot of comparison and systematic record-keeping, which are hard for humans but easy for machines. There is a proposal right now over on dspace-devel to use web-based localization tooling and services. I would invite anyone interested in localization to look it over and discuss. See the thread starting at Message-ID: CAGO4j2mtQ8Zp4fXA2WYJLinEi_aJDP17UU_hgDUMnw6=rqg...@mail.gmail.com, 24-Mar-2012, Chandan Kumar, Introduction.

I would really like to see a system which stores all strings inside the database, including all translations and adapted translations which override the original translations. If one could start translating through the admin interface, this would be a tremendous advantage over the current situation.

I think this falls under the heading of "having a tool is better than not."
Before we spend a lot of effort building our own localization tools, I think we should look closely at what is already available, so that DSpace can concentrate on what it is already good at. ('emacs' already exists; we don't need to create another one.)

Storing the catalogs in the database brings its own set of problems:

o Existing texts must be loaded into the database. So we still need an external form, at least for loading and exporting texts.

o Currently, when code changes cause significant reorganization of the message texts, a new release can simply replace the old catalogs with new ones. If we put the texts into the database, then we will need to find and clean out obsolete material, or provide code to clear the tables and reload them.

o We'll need to write a new catalog provider for java.util.ResourceBundle and a replacement for org.apache.cocoon.transformation.I18nTransformer, which look in the database
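To give a sense of the size of the third point: a `ResourceBundle` provider only has to implement `handleGetObject()` and `getKeys()`, and fallback to a parent bundle is handled by `ResourceBundle` itself once `setParent()` is called. In the minimal sketch below a `Map` stands in for rows fetched from a hypothetical catalog table, so no database access is shown, and `MapBackedBundle` is a hypothetical name. The Cocoon `I18nTransformer` replacement is a separate matter and is not attempted here.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.Set;

/** Hypothetical ResourceBundle whose entries come from a Map (standing in for a DB query). */
final class MapBackedBundle extends ResourceBundle {
    private final Map<String, String> entries;

    MapBackedBundle(Map<String, String> entries, ResourceBundle parentBundle) {
        this.entries = new HashMap<>(entries);
        if (parentBundle != null) {
            setParent(parentBundle); // getObject()/getString() fall back to the parent on a miss
        }
    }

    @Override
    protected Object handleGetObject(String key) {
        return entries.get(key); // null means "not here", so ResourceBundle asks the parent
    }

    @Override
    public Enumeration<String> getKeys() {
        // Report this bundle's keys plus the parent's, as the JDK's own bundles do.
        Set<String> keys = new HashSet<>(entries.keySet());
        if (parent != null) {
            keys.addAll(Collections.list(parent.getKeys()));
        }
        return Collections.enumeration(keys);
    }
}
```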
Re: [Dspace-tech] Localization inside config files?
I think Mark makes a number of good points here - esp. regarding modularity - and it's worth emphasizing that the net effect should be *less* localization effort, even if there are potentially more files, since one would only need to worry about the locally deployed modules. But I'm a bit puzzled about the 'single catalog scheme' as a desired future state.

Without much thought, I can come up with 4-5 quite distinct sites (places, files, ways) where localization occurs in DSpace:
* in email templates (config/email)
* in dspace.cfg and many other config files (starting with the 'dspace.name' property)
* in input_forms.xml
* messages.xml and that ilk

and I'm sure there are others; the curation stuff does not introduce a new locus of localization: localizability permeates the application already. It's also worth noting that localized strings occur not just in the UI proper - they can appear in RSS feeds, OAI-PMH harvests, etc.

So I'd be leery of a plan to shoehorn all localization into any single 'catalog scheme', esp. one that is explicitly tied to a UI presentation layer.

Having said all this, I sympathize with Christian's plight, and affirm with Mark that we can do a better job of managing it.

Richard R.

On Apr 2, 2012, at 9:21 AM, Mark H. Wood wrote:
[snip]
Re: [Dspace-tech] Localization inside config files?
I agree that it shouldn't be exclusively in the UI layer. I do support the idea of, if not a centralized catalog where all values reside, then at least the ability to define a catalog that allows Repository Admins to easily override, from the UI, the existing values found in the files.

In fact, IMO, it should really be in the database, separated into contexts in a manner similar to the configuration modularization:

localization_catalog {
  context
  key
  value
  lang
}

Individual email templates (which are glorified parameterized messages), messages.xml and messages.properties all residing within one database table would shift the design of internationalization away from being a developer activity. Simple tooling can be written to dump/restore the database from files (XML or properties formats) as needed. Simple interfaces can be crafted in the admin area of the user interface to allow simple editing of the field values, and caching can be employed in the XMLUI to optimize performance and reduce DB queries for i18n tags.

Mark Diggory
2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
http://www.atmire.com

On Monday, April 2, 2012 at 11:02 AM, Richard Rodgers wrote:
[snip]
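The "simple tooling" to dump and restore such a table from catalog files could start out like the sketch below. It assumes the four-column localization_catalog layout proposed above, models a row as a small Java class, and uses an in-memory `List` in place of real JDBC access; `CatalogRow`, `CatalogDumpRestore` and any context names are hypothetical. Loading ("restore") tags every key of one .properties catalog with a context and language; dumping selects one context and language back out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** One row of the hypothetical localization_catalog table: context, key, value, lang. */
final class CatalogRow {
    final String context;
    final String key;
    final String value;
    final String lang;

    CatalogRow(String context, String key, String value, String lang) {
        this.context = context;
        this.key = key;
        this.value = value;
        this.lang = lang;
    }
}

/** Hypothetical dump/restore helpers; a List stands in for the database table. */
final class CatalogDumpRestore {

    /** "Restore": turn one .properties catalog into rows for a given context and language. */
    static List<CatalogRow> fromProperties(String context, String lang, Properties props) {
        List<CatalogRow> rows = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            rows.add(new CatalogRow(context, key, props.getProperty(key), lang));
        }
        return rows;
    }

    /** "Dump": select the rows for one context and language back into a Properties object. */
    static Properties toProperties(List<CatalogRow> rows, String context, String lang) {
        Properties props = new Properties();
        for (CatalogRow row : rows) {
            if (row.context.equals(context) && row.lang.equals(lang)) {
                props.setProperty(row.key, row.value);
            }
        }
        return props;
    }
}
```

A round trip like this is roughly what the earlier objection about still needing an external form for loading and exporting texts would amount to in practice.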