Re: [lingu-dev] [SoC] Grammar checker API

Marcin Miłkowski Thu, 01 Jun 2006 05:18:27 -0700

Bruno Sant'Anna wrote:

I know splitting the paragraph into sentences is not trivial, but I sincerely think that this way is better than sending the full paragraph when we are dealing with more than one language.

Why not use the language attribute to decide which grammar checker should receive the text and the span of the text? As I mentioned before, single words are not really important here, because they are not really in another language, and the only reason to mark them up as being in another language is spell checking, which is not the same as grammar checking.


So what you could simply do is to implement the following behavior:

1. If the paragraph is in one language, send it to the grammar checker.

2. If the paragraph contains foreign chunks, send them to the appropriate grammar checker, if any, possibly setting the API flag "this_is_interspersed_with_another_language".


This should also be quite fast.
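
To make the two cases above concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the Run type, the dispatch function, and the checker signature (text, interspersed) are hypothetical and not part of any proposed API. It also assumes that in a mixed paragraph the main-language chunks are sent to their checker with the flag set as well, which the steps above do not spell out.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List, Tuple


@dataclass
class Run:
    """A run of text carrying one language attribute (hypothetical type)."""
    text: str
    lang: str


# Hypothetical checker signature: (text, interspersed) -> list of messages.
Checker = Callable[[str, bool], List[str]]


def dispatch(paragraph: List[Run],
             checkers: Dict[str, Checker]) -> List[Tuple[str, str, bool]]:
    """Send each language chunk to its checker; return the calls made."""
    # Merge adjacent runs that share a language, so that a paragraph split
    # only by formatting still counts as one chunk.
    chunks = [
        (lang, "".join(run.text for run in group))
        for lang, group in groupby(paragraph, key=lambda run: run.lang)
    ]
    # Case 1: one chunk, the whole paragraph goes to one checker, flag off.
    # Case 2: several chunks, each goes to its own checker, flag on.
    interspersed = len(chunks) > 1
    calls = []
    for lang, text in chunks:
        checker = checkers.get(lang)
        if checker is None:
            continue  # no grammar checker for this language: skip the chunk
        checker(text, interspersed)
        calls.append((lang, text, interspersed))
    return calls
```

The work is one pass over the runs and one dictionary lookup per chunk, so the dispatch itself should cost little next to the checking.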

An additional reason is that a grammar checker could *really* need information about paragraph length (in many languages, overly long paragraphs are considered bad writing style) and paragraph content (in many languages, rhymes in consecutive sentences should be avoided unless the text is poetry; in Polish, repeating the same word in several sentences in a row is considered very bad style). Grammatik for WordPerfect already detects paragraphs which are too short. I'm currently thinking about implementing a detector for the "do not repeat the same word" rule in Polish, and your proposed approach would make this impossible. So this is not theory; this is how real-world grammar checkers work.
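
As a rough illustration of why such a rule needs the whole paragraph, here is a minimal Python sketch of a repeated-word check. It is not an existing detector: the function names, the naive sentence splitter, the threshold of three sentences, and the ignore list are all assumptions, and it compares surface forms only, whereas a real checker for an inflected language like Polish would need to compare lemmas.

```python
import re
from typing import FrozenSet, List, Set


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def repeated_words(paragraph: str,
                   min_run: int = 3,
                   ignore: FrozenSet[str] = frozenset()) -> Set[str]:
    """Return words that occur in at least min_run consecutive sentences."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # One set of lowercased word forms per sentence, minus ignored words
    # (function words would normally go in the ignore list).
    sentences = [
        set(re.findall(r"\w+", s.lower())) - ignore
        for s in split_sentences(paragraph)
    ]
    found: Set[str] = set()
    # Slide a window of min_run sentences and keep the words common to all.
    for i in range(len(sentences) - min_run + 1):
        found |= set.intersection(*sentences[i:i + min_run])
    return found
```

Called on a single sentence, this finds nothing, because no window of three sentences exists. That is the problem with handing the checker one sentence at a time.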

BTW, multilingual documents are really less common, believe me, try Google ;)


Regards,
Marcin
