Hi Thomas,
Hi Marcin,
An additional reason is that a grammar checker could *really* need the
information about paragraph length (in many languages, too lengthy
paragraphs are considered bad writing style) and paragraph content (in
many languages, rhymes in the sentences that follow should be avoided if
it's not poetry; in Polish, repeating the same word in several sentences
in a row is considered a very bad writing style). Grammatik for
WordPerfect already detects paragraphs which are too short. I'm
currently thinking about implementing a detector for the "do not repeat
the same word" rule in Polish; your proposed approach would make this
impossible. So this is not theory, this is how real-world grammar
checkers work.
That would be perfectly in line with my latest suggestion to pass on
paragraphs but to specify the sentence (or whatever unit is to be checked)
with an index to its start and end.
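
To make the "paragraph plus indices" idea concrete, here is a rough
sketch in Python. The name CheckRequest and its fields are made up for
illustration and are not part of any existing API; the point is only
that the checker receives the whole paragraph and can still tell which
span it is asked to check.

```python
from dataclasses import dataclass


@dataclass
class CheckRequest:
    """Hypothetical request: the whole paragraph plus the span to check."""
    paragraph: str
    start: int  # index of the first character of the unit to check
    end: int    # index one past the last character of that unit

    def unit(self) -> str:
        """The sentence (or other unit) the checker is asked to check."""
        return self.paragraph[self.start:self.end]

    def preceding_context(self) -> str:
        """Everything in the paragraph before the unit, read-only context."""
        return self.paragraph[:self.start]


paragraph = "Ala ma kota. Kot ma Ale."
start = paragraph.index("Kot")
req = CheckRequest(paragraph, start, len(paragraph))
```

With this shape the checker can look back at the earlier sentences of
the paragraph without the caller having to know that it does so.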
However, in the above Polish scenario it seems likely that, to check
the 8th sentence, the paragraph needs to be checked from the beginning
again. :-/
Or we have to allow for some state information to be maintained by the
grammar checker (which in itself would be no problem at all because we
need not know about it). But I wonder how to make sure that this state
is not disrupted by a call to the same grammar checker from a second
document (this could always happen because it can be accessed by anyone
at any time using its API).
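
One possible way around the second-document problem, sketched in Python
under the assumption that every call carries some document identifier
(the doc_id parameter and the StatefulChecker class are hypothetical,
nothing like them has been agreed on): the checker keys its internal
state by document, so interleaved calls cannot disturb each other.

```python
from collections import defaultdict


class StatefulChecker:
    """Hypothetical checker that keeps separate state per document."""

    def __init__(self):
        # document id -> words seen so far in that document
        self._seen = defaultdict(list)

    def check_sentence(self, doc_id, sentence):
        """Return the words of `sentence` already seen in the same document."""
        words = [w.strip('.,;:!?"').lower() for w in sentence.split()]
        words = [w for w in words if w]
        seen = self._seen[doc_id]
        repeated = [w for w in words if w in seen]
        seen.extend(words)
        return repeated

    def reset(self, doc_id):
        """Forget everything about one document, e.g. when it is closed."""
        self._seen.pop(doc_id, None)


checker = StatefulChecker()
first = checker.check_sentence("doc-a", "The parser reads input.")
other = checker.check_sentence("doc-b", "Parser errors are rare.")
again = checker.check_sentence("doc-a", "Then the parser stops.")
```

Here the call for "doc-b" in the middle leaves the state of "doc-a"
untouched, so the third call still reports "the" and "parser".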
Well, I think this kind of check should be run only interactively, and
selected with a special option in the checker. I think there are
languages and style registers where it's very important how many
different words are used (in technical writing, you are sometimes
required to use only approved wordings and a very limited vocabulary; or
in texts used for language education, etc.). Internal rhymes, words or
phrases repeated often belong to this class of problems. Yet they cannot
be detected using simple regular expressions on a single sentence level.
This group of tests is mostly statistical, and it should be run on a
completed version of the document (it makes no sense to check first
sentences). So it can and should be implemented using internal grammar
checker state. And probably the user should be prompted that it will
take some time.
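
As an illustration of such a document-level test, here is a minimal
sketch in Python of the "same word in several sentences in a row" check.
The function name and both thresholds are my assumptions, and it compares
surface forms only; for Polish a real implementation would have to
compare lemmas because of inflection.

```python
import re


def repeated_in_consecutive(sentences, min_run=3, min_len=4):
    """Return {word: longest run} for words that occur in at least
    `min_run` consecutive sentences. Thresholds are arbitrary assumptions."""
    current = {}  # word -> length of the run ending at the previous sentence
    longest = {}  # word -> longest run seen so far
    for sentence in sentences:
        words = {w for w in re.findall(r"\w+", sentence.lower())
                 if len(w) >= min_len}
        # A word keeps its run only if it also occurs in this sentence.
        current = {w: current.get(w, 0) + 1 for w in words}
        for w, run in current.items():
            if run > longest.get(w, 0):
                longest[w] = run
    return {w: run for w, run in longest.items() if run >= min_run}


sentences = [
    "The checker reads the paragraph.",
    "The checker splits it.",
    "Then the checker reports.",
    "Nothing else happens.",
]
result = repeated_in_consecutive(sentences)
```

Note that the function needs all sentences at once, which is exactly why
this kind of test cannot work on a single sentence in isolation.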
So I think, after all, that read/write access to a single sentence at a
time and read access to the whole document are quite enough.
You should however make very clear to the implementers of grammar
checkers how the API uses the abbreviations list to segment sentences.
BTW, you should break the sentence as well at the end of the paragraph
(think of titles - in most languages, they are not followed by any
punctuation).
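
To show what I mean, here is a deliberately naive sketch in Python of
segmentation with an abbreviations list. It is not how the API actually
segments (I don't know that, which is the point); split_sentences and
its rules are assumptions for illustration only.

```python
def split_sentences(paragraph, abbreviations):
    """Naive segmentation: break after '.', '!' or '?' when followed by
    whitespace or the end of the paragraph, unless the token ending there
    is a listed abbreviation. The end of the paragraph always closes a
    sentence, even without punctuation."""
    sentences = []
    start = 0
    for i, ch in enumerate(paragraph):
        if ch not in ".!?":
            continue
        at_end = i + 1 == len(paragraph)
        if not at_end and not paragraph[i + 1].isspace():
            continue  # dot inside a token, e.g. the first dot of "e.g."
        token = paragraph[start:i + 1].split()[-1]
        if ch == "." and token.lower() in abbreviations:
            continue  # abbreviation, not a sentence end
        sentences.append(paragraph[start:i + 1].strip())
        start = i + 1
    rest = paragraph[start:].strip()
    if rest:
        sentences.append(rest)  # title or otherwise unterminated text
    return sentences


body = split_sentences("See e.g. the manual. It helps", {"e.g."})
title = split_sentences("Release notes", set())
```

A grammar checker that assumes different rules (for instance, that
"e.g." always ends a sentence) would see different units than the ones
the API hands over, hence the need to document this.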
Best,
Marcin