Re: [lingu-dev] [SoC] Grammar checker API

Marcin Miłkowski Mon, 05 Jun 2006 08:38:50 -0700

Hi Thomas,

Hi Marcin,
The additional reason is that a grammar checker could *really* need the information about paragraph length (in many languages, overly long paragraphs are considered bad writing style) and paragraph content (in many languages, rhymes in consecutive sentences should be avoided if it's not poetry; in Polish, repeating the same word in several sentences in a row is considered very bad writing style). Grammatik for WordPerfect already detects paragraphs which are too short. I'm currently thinking about implementing a detector for the "do not repeat the same word" rule in Polish; your proposed approach would make this really impossible. So this is not theory, this is how real-world grammar checkers work.
That would be perfectly in line with my latest suggestion to pass on
paragraphs but to specify the sentence (or whatever unit to be checked)
with an index to the start and end.

However in the above Polish scenario it seems to be likely that to check
the 8th sentence the paragraph needs to be checked from the beginning
again. :-/
Or we have to allow for some state information to be maintained by the
grammar checker (which in itself would be no problem at all because we
need not know about it). But I wonder how to make sure that this state
does not get disrupted by a call to the same grammar checker from a second
document (this could always happen because it can be accessed by anyone
at any time using its API).
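
One way to picture the concern about state being disrupted by a second document is to key the checker's state by a document identifier. The following is only a minimal sketch under that assumption; the class, the `doc_id` parameter, and the toy rule are all hypothetical and not part of any proposed API. It also shows the "pass the paragraph plus start and end index" idea.

```python
from collections import defaultdict


class StatefulChecker:
    """Hypothetical checker that keeps its state per document, not globally."""

    def __init__(self):
        # doc_id -> list of sentences already checked for that document
        self._seen = defaultdict(list)

    def check(self, doc_id, paragraph, start, end):
        """Check paragraph[start:end]; the whole paragraph stays available as context."""
        sentence = paragraph[start:end]
        history = self._seen[doc_id]
        # Toy rule: flag a sentence identical to the previous one in this document.
        repeated = bool(history) and history[-1] == sentence
        history.append(sentence)
        return repeated

    def reset(self, doc_id):
        """Forget everything remembered for one document."""
        self._seen.pop(doc_id, None)
```

Because every call names its document, a call made on behalf of a second document reads and writes a separate history and cannot disturb the first one.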

Well, I think this kind of check should be run only interactively, and selected with a special option in the checker. I think there are languages and style registers where it's very important how many different words are used (in technical writing, you are sometimes required to use only approved wordings and a very limited vocabulary; or in texts used for language education, etc.). Internal rhymes and frequently repeated words or phrases belong to this class of problems. Yet they cannot be detected using simple regular expressions on a single sentence level. This group of tests is mostly statistical, and it should be run on a completed version of the document (it makes no sense to check only the first sentences). So it can and should be implemented using internal grammar checker state. And probably the user should be warned that it will take some time.
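
To illustrate why such a test needs more than one sentence at a time, here is a rough sketch of a "same word in several sentences in a row" detector. The function name, the `run` threshold, and the `min_len` filter for short function words are assumptions made for the example, not an actual rule from any checker.

```python
import re


def repeated_words(sentences, run=3, min_len=4):
    """Return words that occur in `run` or more consecutive sentences."""
    streak = {}      # word -> number of consecutive sentences ending at the current one
    flagged = set()
    for sentence in sentences:
        # Unique lower-cased words of this sentence, ignoring very short ones.
        words = {w for w in re.findall(r"\w+", sentence.lower()) if len(w) >= min_len}
        # Words missing from this sentence drop out, which resets their streak.
        streak = {w: streak.get(w, 0) + 1 for w in words}
        flagged.update(w for w, n in streak.items() if n >= run)
    return sorted(flagged)
```

The streak table is exactly the kind of internal state discussed above: it only makes sense when the sentences of one document are fed in order.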

So I think, after all, that read/write access to a single sentence at a time and read access to the whole document is quite enough.

You should, however, make very clear to the implementers of grammar checkers how the API uses the abbreviations list to segment sentences. BTW, you should break the sentence at the end of the paragraph as well (think of titles: in most languages, they are not followed by any punctuation).
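
A minimal sketch of the segmentation behaviour described here, assuming a simple whitespace tokenizer and a lower-cased abbreviations set (both are simplifications for illustration, and the function name is hypothetical):

```python
def split_sentences(paragraph, abbreviations):
    """Split one paragraph into sentences; the paragraph end always closes a sentence."""
    sentences, current = [], []
    for token in paragraph.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        # A token such as "Dr." ends with a full stop but does not end the sentence.
        if ends_sentence and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Unterminated text (for example a title) still counts as a sentence.
        sentences.append(" ".join(current))
    return sentences
```

Note that this sketch normalizes whitespace, so a real API working with start and end indexes would have to track character offsets instead of joined tokens.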


Best,
Marcin
