Hello Brother

I am impressed that you achieve 80% accuracy with this system.

I suppose the line "Ioann=Ioann" must be "Ioann=Ioánn"

I also suppose "hosti==hósti" has to be corrected into "hosti=hósti"

In the recent Vatican editions, "Míchaël" is written "Míchael", but in that case, after "ae" is replaced by "æ" at the end of the processing, some corrections will have to follow, such as changing "Míchæl" back into "Míchael".

I see your program foresees the "j" as well as the "i", like in "eius" and "ejus". So it could work on the Vulgata and the Nova Vulgata. Is that right?

Do you think it is possible to improve the accuracy of the system significantly, or do you think you have reached the limit? Anyway, I am ready to help as far as I can. Some cases will always require human intervention, like "ténere" (tenderly) and "tenére" (to hold). You also pointed out "advenit".

I wonder what will happen with "coegit", for example, where "oe" cannot be changed into "œ".
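
One way the two ligature cases raised in this message ("Míchael" and "coegit") might be handled is an exception list that is checked before the "ae"/"oe" replacement, instead of corrections applied afterwards. The following is only a minimal Python sketch under that assumption; the names NO_LIGATURE and apply_ligatures are hypothetical, the exception list holds nothing but the two examples from this thread, and it is not how the program under discussion works.

```python
import re
import unicodedata

# Hypothetical exception list: words in which "ae"/"oe" are two separate
# vowels. Only the two examples mentioned in this thread are included.
NO_LIGATURE = {"michael", "míchael", "coegit", "coégit"}

LIGATURES = {"ae": "æ", "oe": "œ", "Ae": "Æ", "Oe": "Œ"}

def apply_ligatures(text):
    """Replace ae/oe by ligatures, word by word, skipping listed exceptions."""
    # Assumption: accents are stored as precomposed characters (NFC),
    # so that an accented word is matched as a single run of letters.
    text = unicodedata.normalize("NFC", text)

    def fix_word(match):
        word = match.group(0)
        if word.lower() in NO_LIGATURE:
            return word  # leave the exception untouched
        for plain, ligature in LIGATURES.items():
            word = word.replace(plain, ligature)
        return word

    # [^\W\d_]+ matches a run of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", fix_word, text)
```

For instance, apply_ligatures("Míchael coegit caelum") leaves the first two words alone and changes only "caelum". A real exception list would of course have to be much longer.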

Anyway, we have to keep in touch for this matter. Kind regards.

Fr. Pierre

On 02/22/2014 03:13 AM, Brother Gabriel-Marie wrote:
Hello, y'all.

I've dabbled in this, and have an effective method that is about 80% accurate.

It does a sequential find and replace, replacing certain combinations of letters first, then other combinations afterward. The list of Latin words/particles is in a very particular order, and still needs a good bit of tweaking.
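
As a rough illustration of the sequential find and replace described above, here is a minimal Python sketch. It assumes rules written one per line as "plain=accented", like the "Ioann=Ioánn" and "hosti=hósti" entries quoted elsewhere in this thread; the real layout of Latin.ini is not shown here, and parse_rules and accentuate are hypothetical names.

```python
def parse_rules(lines):
    """Read 'plain=accented' lines into an ordered list of pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not a rule
        plain, accented = line.split("=", 1)
        rules.append((plain, accented))
    return rules

def accentuate(text, rules):
    """Apply every rule in list order; earlier rules run first."""
    for plain, accented in rules:
        text = text.replace(plain, accented)
    return text

# The two rules quoted in this thread, in their corrected form.
RULES = parse_rules(["Ioann=Ioánn", "hosti=hósti"])
```

Because each replacement works on the output of the previous one, the order of the list matters: once "hosti" has become "hósti", a later rule that looks for the unaccented "hosti" will no longer match.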

It is actually part of a program I have been writing in my free time. Since it is set up in an ini file, you should be able to easily reproduce the search and replace in whatever language you like. If you improve it, however, I want to be involved, please! I have attached the file: Latin.ini

-BGM


On 2/19/2014 8:23 AM, Benjamin Bloomfield wrote:
For some words, it is easy to tell that the penultimate syllable is long, and should therefore be accented (e.g., adventus, because -ven- ends in a consonant; and if the penultimate vowel were a diphthong (au, æ, œ), that would make the syllable long as well). The real trick would be to have a list of words whose penultimate syllable is never long, and one of words that always have a long vowel in the penultimate syllable (advenit, for example, is ambiguous because it has a long e if it is in the perfect tense, and a short e in the present tense). If anyone could get such lists of Latin words together, I could write a script to add accents to all the words whose accent is unambiguous, and then list all the 3+ syllable words whose accent would need to be determined by the context.
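
The "easy" cases above (penultimate vowel followed by two consonants, or a diphthong in the penultimate syllable) could be detected roughly as follows. This is only a Python sketch under several assumptions: the input is unaccented and uses "v" and "j" or "i" as in the text at hand, "ae"/"oe"/"au" are always read as diphthongs, and a stop followed by l or r is not counted as making the syllable long (that last point is an extra assumption, not something stated in this thread). The function name is hypothetical.

```python
import re

# A vowel nucleus is a diphthong or a single vowel.
NUCLEUS = re.compile(r"ae|oe|au|[aeiouyæœ]")
DIPHTHONGS = {"ae", "oe", "au", "æ", "œ"}
STOPS = "bcdfgpt"
LIQUIDS = "lr"

def penult_is_clearly_long(word):
    """True only when the penultimate syllable is long by spelling alone."""
    # Treat "qu" as one consonant so its "u" is not counted as a vowel.
    w = word.lower().replace("qu", "q")
    nuclei = [m.span() for m in NUCLEUS.finditer(w)]
    if len(nuclei) < 3:
        return False  # only words of 3+ syllables are in question
    start, end = nuclei[-2]
    if w[start:end] in DIPHTHONGS:
        return True
    # Consonants between the penultimate and the last vowel; "h" is ignored.
    between = w[end:nuclei[-1][0]].replace("h", "")
    if "x" in between or "z" in between:
        return True  # double consonants
    if len(between) == 2 and between[0] in STOPS and between[1] in LIQUIDS:
        return False  # stop + liquid: not decided by spelling
    return len(between) >= 2
```

A False result does not mean the syllable is short, only that spelling does not settle it: "adventus" gives True, while "advenit" gives False and would still need a word list or a human decision.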

Does anyone have an accented Latin word list of any kind, though?  Even if it were just a list of every Latin word with accent marked, or with vowel lengths marked, I could write a script to extract the 3+ syllable words into their proper lists when they are not ambiguously accented words like advenit.
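
If such an accented word list were available, the split into unambiguous and ambiguous words could look something like the sketch below. It is written in Python, assumes accents are marked with the acute accent only, and uses made-up function names; the sample words are the ones mentioned in this thread, not a real data set. Filtering down to 3+ syllable words is left out.

```python
import unicodedata
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent

def strip_accents(word):
    """Remove acute accents only, leaving letters such as æ and œ intact."""
    decomposed = unicodedata.normalize("NFD", word)
    plain = "".join(ch for ch in decomposed if ch != ACUTE)
    return unicodedata.normalize("NFC", plain)

def split_by_ambiguity(accented_words):
    """Group accented forms by their unaccented spelling."""
    forms = defaultdict(set)
    for word in accented_words:
        forms[strip_accents(word)].add(word)
    # One accented form per spelling: safe to accent automatically.
    unambiguous = {p: next(iter(f)) for p, f in forms.items() if len(f) == 1}
    # Several accented forms: the context has to decide.
    ambiguous = {p: sorted(f) for p, f in forms.items() if len(f) > 1}
    return unambiguous, ambiguous
```

With the input ["advéntus", "ádvenit", "advénit"], "adventus" ends up in the first dictionary and "advenit" in the second, with both of its accented forms listed.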

I could probably figure out a way to download a list of all the Latin words contained in Wiktionary, but I'm not sure how accurate or complete that would be.

Benjamin Bloomfield


On Wed, Feb 19, 2014 at 6:40 AM, Innocent Smith <[email protected]> wrote:
Dear Gregorio Users,

I'm experimenting with using an OCR program to extract liturgical texts from a PDF of a Latin Missal, for various purposes including setting texts with Gregorio. With the software I have available, I am having difficulty doing the OCR in a way that preserves the accents accurately.

Is anyone aware of automated ways to take a Latin text that does not have accents and to add them in?

Yours,

bro. Innocent, op

_______________________________________________
Gregorio-users mailing list
[email protected]
https://mail.gna.org/listinfo/gregorio-users






--
Father Pierre FRANÇOIS (http://www.romanliturgy.org)
Bosmanslei 16
B-2018 Antwerpen (Belgium)
mobile: +32 474 719 131
phone: +32 3 237 63 96

