On 03/05/2012 08:07 AM, Peter von Kaehne wrote:
On 05/03/12 17:33, Greg Hellings wrote:
On Mon, Mar 5, 2012 at 11:28 AM, Kahunapule Michael Johnson <[email protected]> wrote:
On 03/05/2012 03:20 AM, Greg Hellings wrote:

You seem quite taken with USFM, but remember that CrossWire and SWORD do not support USFM as an import or display format. Therefore information beyond just how to convert USFM into OSIS or ThML or GBF, which are supported, is not really of importance.

USFM is the format that literally hundreds of minority-language Bible translations exist in. Are you saying that the Sword Project is not interested in importing those?

I am not entirely clear what you are aiming at and I must say I do get somewhat irritated with your tone. I do have a feeling over the last few days that you are itching for a fight. Why is that? Is this simply a misunderstanding?

It is most likely a misunderstanding. Perhaps I have also been misunderstanding some of the messages that seem to be opposed to USFM. I'm not trying to suggest that USFM be made an additional internal format for Sword for Bible search and display, like GBF and OSIS.

Please let me be clear about what my goals, agenda, and purpose really are. I have many USFM Bible texts in many languages. I will soon have access to many more. I would like to convert them to various formats for distribution and use, publishing them in ways that maximize their usefulness and accessibility and study by many people in their own languages. My primary focus is with minority languages, although I have a few translations in languages that have many more speakers that I will be converting. Sword is one of many possible outputs for these Scriptures.

Because of the large number of translations involved, and frequent updates in the case of translations in progress, I'm not interested in manual processes. I am only interested in automated processes that are reasonably efficient and very reliable.

As far as I'm concerned, it doesn't matter to me what formats you store or display Bibles in.
It can be the current Sword format set defined by your API. It can be COBOL code and structured Latin if you can make it work. What I do care about is that when I convert a Bible (or portion) translation into one of your import formats, and you import it and display it, that:
I don't care if the Sword Project ever supports USFM in any way except to import it, directly or indirectly through OSIS or another format, into Sword. I never suggested using USFM or its XML kin in any other way within the Sword project. I don't care how you display USFM on your web sites, wiki or otherwise, or what formats you use internally to the Sword project, as long as it works end to end without losing a single jot or tittle.

However, I do think it is important that you document the best ways to convert USFM to a format you can import. I think you do, too, really. I am aware that you have some tools to import a small subset of USFM to a form of OSIS that works with osis2mod, and have created some modules with it. I'm also aware of the OSIS manual section that contains a list of OSIS near equivalents for most (but not all) of the current USFM tags that actually appear in the Bibles I'm working with. My tests using those tools so far have found them wanting. I'm going to try to fix that by doing my own conversion from USFM to OSIS.

Please forgive what may have appeared to be criticism without a constructive purpose. I'm trying to convert Scripture files on a scale and with speed that is apparently unprecedented. I intend to write a USFX-to-OSIS converter that produces output that should validate against the current OSIS schema, and which will import correctly into Sword modules. (GBF might be an option, too, but I think that if the difficulties with OSIS can be overcome, it would be better to use OSIS.) At least that is what I'm going to try. If I succeed, you need not deal with USFM and its XML kin directly ever at any time. You can just send people to a different open source project for that piece of important functionality.

There are some things that I will do that may not fit the way some members of the committee that designed OSIS envisioned things.
For example, in the OSIS files that I generate, all of the quotation punctuation will be left as part of the Bible text, and never included in a <q> marker, either implicitly or explicitly with a "marker" attribute. If I need to mark direct quotes of Jesus Christ in a particular translation, I will do so by converting USFM \wj ...\wj* markers directly into <q who="Jesus" marker="" sID=""/>...<q marker="" eID=""/>, where the marker attribute is always empty. This should, according to http://crosswire.org/wiki/OSIS_Bibles#Marking_Quotations, result in lossless display of the proper quotation punctuation in all front ends that comply with that same interpretation. I don't plan to use <q> for anything other than direct quotes of Jesus.

This usage is philosophically compatible with USFM and OXES. It is also actually easier to render, since the Paratext interpretation of USFM does not allow \wj ...\wj* markers to cross verse boundaries. Therefore, you don't have to process beyond the beginning of the current verse to determine if you should turn on an optional red attribute or not, even in an extended quotation like The Sermon on the Mount.

Another thing I will do is convert legacy (deprecated) "display" markup for bold and italics directly from USFM to <hi type="bold"> and <hi type="italic"> markers. The reason for that is that I have translations where I have tried to replace "display" markup with the appropriate "semantic" markup, only to find that USFM does not have a suitable replacement for the way certain translators have chosen to use these text attributes. Fidelity to the translation and deference to the translation committees win out over abstract arguments about separation of semantics from presentation forms. In essence, these attributes that are considered in some languages to be mostly a presentation issue are actually a semantic issue in other languages.
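The two inline conversions described above could be sketched roughly as follows. This is only an illustration in Python, not the converter discussed in this thread: the function name `usfm_inline_to_osis`, the `wj.N` ID scheme used to fill in sID/eID, and the choice of `\bd` and `\it` as the USFM bold and italic markers handled are all assumptions made for the example. It also assumes markers are properly paired and not nested across each other.

```python
import re
from xml.sax.saxutils import escape

# USFM character markers handled by this sketch -> OSIS <hi> type values.
HI_TYPES = {"bd": "bold", "it": "italic"}

# Matches an opening marker plus its single trailing space (e.g. "\wj ")
# or a closing marker (e.g. "\wj*"). Group 2 is "*" only for closers.
_MARKER = re.compile(r"\\(wj|bd|it)(?:(\*)| )")


def usfm_inline_to_osis(text, id_prefix="wj"):
    """Convert \\wj, \\bd and \\it spans in one verse of USFM text to OSIS.

    Quotation punctuation is left untouched in the text; the <q> milestones
    always carry an empty marker attribute.
    """
    counter = 0
    open_ids = []  # stack of sID values awaiting their matching eID

    def replace(match):
        nonlocal counter
        marker, closing = match.group(1), match.group(2)
        if marker in HI_TYPES:
            return "</hi>" if closing else f'<hi type="{HI_TYPES[marker]}">'
        if closing:
            # Raises IndexError on an unmatched \wj*, which flags bad input.
            return f'<q marker="" eID="{open_ids.pop()}"/>'
        counter += 1
        open_ids.append(f"{id_prefix}.{counter}")  # hypothetical ID scheme
        return f'<q who="Jesus" marker="" sID="{open_ids[-1]}"/>'

    # Escape &, < and > first so the verse text is safe inside XML.
    return _MARKER.sub(replace, escape(text))


if __name__ == "__main__":
    print(usfm_inline_to_osis(r"\wj “Follow me,”\wj* he said \bd loudly\bd*."))
```

Because the quotation marks are never moved into the marker attribute, whatever punctuation the translators chose passes through unchanged, and a front end only needs the milestone pair to decide where optional red lettering starts and stops.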
The semantics-versus-presentation question is not a winnable argument, so I just perpetuate the use of this kind of markup and hope that front ends will honor that markup. The consequence of not doing so is presentation of writing that is less clear and uglier in the subject languages. There may also be cases where I preserve the bold and italic markup just because it is too time-consuming to try to figure out what it should have been in each case, based on where it is, but in a language I can't read.

I hope this helps...

Shalom,
Michael
_______________________________________________ sword-devel mailing list: [email protected] http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page
