[lingu-dev] lost postings[2/5]: [SoC] Grammar checker API

Thomas Lange Thu, 01 Jun 2006 00:24:21 -0700

Hi,

> > I agree that determining the ends of sentences is non-trivial.
> > However, I think that this is a good reason to do it once in OOo
> > instead of each grammar checker having to figure it out manually. OOo
> > already maintains a list of abbreviations (ending with .), so
> > presumably this could be used. If the user adds custom abbreviations,
> > these would then automatically be picked up by the sentence splitter,
> > which wouldn't happen if each grammar checker implemented its own.


Just curious, because I do not know about this:
Where would the user add custom abbreviations so that they are
recognized by the breakiterator?
Does something like that already exist, or is this a proposal of yours?
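
To make the idea under discussion concrete, here is a minimal sketch of a
shared splitter that consults a single abbreviation list. It is plain
Python for brevity, not OOo code: `split_sentences` and the
`abbreviations` set are hypothetical names, and the real breakiterator
may well work differently.

```python
import re

def split_sentences(text, abbreviations):
    """Split at '.', '!' or '?' followed by whitespace, unless the token
    ending there is in the (hypothetical) shared abbreviation list."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        # Text from the current sentence start up to and including the punctuation.
        candidate = text[start:match.end()].rstrip()
        last_token = candidate.split()[-1].lower()
        if last_token in abbreviations:
            continue  # an abbreviation such as "Dr." does not end the sentence
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

abbrevs = {"dr.", "e.g.", "etc."}
print(split_sentences("Dr. Smith arrived. He was late, e.g. by an hour. Fine!", abbrevs))
# ['Dr. Smith arrived.', 'He was late, e.g. by an hour.', 'Fine!']
```

A user-added abbreviation would simply be one more entry in the set, so
every grammar checker that receives sentences from this splitter would
benefit from it.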


>> >>    5. OpenOffice should be able to replace the wrong sentences.
>> >>    6. I think we should create an unified User Interface, for any
>> >>    grammar checker use it.
> >
> > I vote for the unified UI. If necessary, have an Options button to
> > open a dialog box generated by the current grammar checker, but I
> > think the day-to-day operations of grammar checking should look the
> > same regardless of the back-end tools.

Having read the replies in the thread, and given the very different
sets of options I've already seen in spell checkers, I think a button
that lets the grammar checker show an options dialog of its own is
likely the way to go here.


> > I haven't seen this mentioned explicitly, but I think there should be
> > a menu option Tools, Grammar Check to launch the grammar checker to
> > check through the whole document from beginning to end (as the spell
> > checker can do).

Just to be a little more precise:
It should start with the first paragraph at the top of the current view,
then continue from there and wrap around if necessary.
If you happen to be in the middle of a large document, you'd like to see
some immediate results.
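
As a small sketch of that ordering (plain Python, with `check_order` and
its parameters being names made up for illustration, not part of any
existing API):

```python
def check_order(paragraph_count, first_visible):
    """Return paragraph indices starting at the first visible paragraph,
    continuing to the end of the document, then wrapping around to the top."""
    if paragraph_count <= 0:
        return []
    start = first_visible % paragraph_count
    return list(range(start, paragraph_count)) + list(range(0, start))

print(check_order(6, 4))  # [4, 5, 0, 1, 2, 3]
```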

> > Some people would prefer to turn off the wavy lines
> > and just use this mechanism to do the checks, so they are not
> > distracted when they are writing.

Agreed. Similar to what can be done for spell checking now.

>> >>    8. The API should provide a paragraph (for example) to grammar
>> >>    checker and this one should return a list. If there is no mistake
>> >>    in this paragraph, the list should be empty, else the list should
>> >>    contain:
>> >>       1. Where is the mistake in the paragraph (initial index + final
>> >>       index).
>> >>       2. A list of suggestions to correct that mistake (this list can
>> >>       be empty if checker is not prepared to guess).
>> >>       3. A comment about mistake, e.g. what a grammar book should say
>> >>       about it.
> > It might be a good idea to have two levels of comments -- one brief
> > and one detailed. The view of the detailed portion could then be
> > toggled on and off in the UI. Users could then see at a glance a
> > single sentence describing the problem. If they needed more
> > information, they could expand the view to show the detailed
> > explanation, which might include a more detailed explanation of why
> > the selected text is thought to be an error, examples of correct and
> > incorrect use, and references for further information.

This could also be done by showing the short description as something
like a header, visually set apart from the full description.
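
A minimal sketch of what such a result list could look like, assuming
one entry per mistake with a short and a full comment. All names here
(`GrammarError`, `check_paragraph`) are hypothetical, and the
doubled-word rule is only a stand-in for a real grammar checker:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GrammarError:
    start: int               # initial index within the paragraph
    end: int                 # final index (exclusive in this sketch)
    suggestions: List[str] = field(default_factory=list)  # may be empty
    short_comment: str = ""  # one-line header shown at a glance
    full_comment: str = ""   # detailed explanation, shown on demand

def check_paragraph(paragraph: str) -> List[GrammarError]:
    """Toy checker: flags a word repeated twice in a row.
    Returns an empty list if no mistake is found."""
    errors = []
    words = paragraph.split(" ")
    pos = 0  # index of the current word within the paragraph
    for i, word in enumerate(words):
        if i > 0 and word and word.lower() == words[i - 1].lower():
            prev_start = pos - len(words[i - 1]) - 1
            errors.append(GrammarError(
                start=prev_start,
                end=pos + len(word),
                suggestions=[word],
                short_comment="Repeated word",
                full_comment=f'The word "{word}" appears twice in a row.',
            ))
        pos += len(word) + 1
    return errors

for err in check_paragraph("This is is fine."):
    print(err.start, err.end, err.suggestions, err.short_comment)
# 5 10 ['is'] Repeated word
```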

BTW: Do we all agree that it is sufficient to have that comment only in
the language of the sentence being checked?
That is, if we have for example a French UI and check some English text,
the comment would be in English only. And if the next sentence is
German, its comment would be in German.

Otherwise it will probably get rather complicated here...

>> >>  2. Grammar Checker API, future:
>> >>
>> >>    1. Let's suppose it's possible to manage several languages in a
>> >>    text and there is a Language Guessing API. Then, when OpenOffice
>> >>    discover language of a sentence, it automatically loads grammar
>> >>    checker to correspondent language.
> >
> > As people have already mentioned, there may be more than one grammar
> > checker loaded for any given language. This is particularly likely
> > for my project: graviax. This tool is much simpler than most of the
> > others -- it uses regular expressions and doesn't attempt to parse
> > the sentences. However, because of this, it is easy for users to
> > update the rules to match their own preferences (for example,
> > publishers could create a rule set for their particular house style).
> >
> > Therefore, I would expect to have two English-language grammar
> > checkers running at the same time: a heavyweight checker that can
> > catch errors like "a red apples" (which is impossible to do reliably
> > in graviax) and then graviax running as well (for example, to
> > highlight cliches).

Just as info:
Currently the spell checker can have more than one implementation
available per language.
A word is considered OK if it is accepted by any of those spell
checkers. (The order in which they are called can also be defined in
the UI.)


> > I'm not sure how this would work in terms of squiggly lines. It would
> > be nice if the user could set the colour of the line associated with
> > each tool independently, but that might be overkill.

I think the idea of chaining grammar checkers should be OK as well.
Or am I missing something?
Of course, doing so is likely to have a much larger negative impact on
performance than it has for spell checking, where only a single word
needs to be checked twice.

So I somewhat wonder if we should allow it.
Several spell checkers won't be that bad. But if a user installs, say,
5 grammar checkers because he/she wants the best possible grammar
checking, this may result in serious performance problems when running
in the background. Thus I'm somewhat unsure here.

Also, what would be the rule for accepting or rejecting a sentence?
Should all grammar checkers have to report the sentence as correct, or
would it be sufficient if only one does so? The former is likely to be
trouble for performance.
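
The two rules can be sketched as follows. This is an illustration only:
checkers are modelled as plain Python callables returning True when they
accept a sentence, and `make_checker` with its banned phrases is a toy
stand-in, not any real checker.

```python
def accepted_by_any(sentence, checkers):
    """Spell-checker style rule: accept as soon as one checker accepts.
    any() stops at the first True, so later checkers are skipped."""
    return any(check(sentence) for check in checkers)

def accepted_by_all(sentence, checkers):
    """Stricter rule: every checker must accept.
    all() stops at the first False, but a correct sentence is seen by all."""
    return all(check(sentence) for check in checkers)

calls = []  # records which checker ran, to make the cost visible

def make_checker(name, banned_phrase):
    """Toy checker that rejects sentences containing banned_phrase."""
    def check(sentence):
        calls.append(name)
        return banned_phrase not in sentence
    return check

checkers = [
    make_checker("heavy", "a red apples"),
    make_checker("style", "at the end of the day"),
]

accepted_by_all("I ate a red apple.", checkers)
print(calls)  # ['heavy', 'style']
```

With the "all must accept" rule, every correct sentence costs one call
per installed checker, which is the common case in normal text. With the
"any accepts" rule, a correct sentence usually costs a single call.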


>> >>    2. Optimize memory allocation, input/output and processing.  For
>> >>    example, OOo's spell checker uses a special dictionary, CoGrOO
>> >>    (which is the only grammar checker that I know) uses another kind
>> >>    of dictionary; we can create a unified dictionary but this implies
>> >>    modification in spell checker.
>> >>    3. (Marcin Milkowski) Extend the user interface to provide more
>> >>    parameters to grammar checker, like colloquial, official and etc.
> >
> > This could be dealt with by allowing the user to disable rules (see
> > next point).

In the options UI that each grammar checker comes along with...



> > I hope we can create a grammar checker system that hands control back
> > to the users (so they can turn rules on and off) and actually helps
> > them to improve their writing by showing them where they have gone
> > wrong.

Agreed. The UI however should at least explicitly warn about possible
negative effects.


> > My suggestion would be to create the absolute simplest API to start
> > with. Use full stops to determine sentence breaks, even though some
> > of these will be wrong. Just add a menu item and dialog box for
> > working through the whole document -- don't do real-time checking
> > yet. Check whole document against a single grammar checker in a
> > single language.

I think the API should be designed for the complete task and be defined
in the coming weeks if possible. Otherwise we'll have much the same
discussion next year, and may then have to modify the API and perhaps
several grammar checkers existing by that time as well.

Of course, the actual first-step integration of already existing grammar
checkers may use those simplifications. It will just be a first
approximation on the way to the final implementation.



One other question:
I once noticed when toying with the MS grammar checker that if there are
too many errors in a text (e.g. because the language attribute is
inappropriate) it displays a message like "too many errors encountered.
Maybe it is foreign text...", stops grammar checking and turns the
display of errors off.
I don't know the reason for this. Is it a possible performance problem,
or do they just not want to have that much text marked as wrong?
The question is: Do we need something like this as well?
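
If we wanted something similar, a simple ratio test might be enough. The
sketch below is only a guess at how such a cut-off could work: I do not
know how the MS checker decides this, and the function name and both
threshold values are invented for illustration.

```python
def should_stop_checking(error_count, sentence_count,
                         max_ratio=0.5, min_sentences=20):
    """Return True if so many sentences contain errors that the text is
    probably in a different language than its attribute claims.
    Both thresholds are arbitrary example values."""
    if sentence_count < min_sentences:
        return False  # too little text to judge
    return error_count / sentence_count > max_ratio

print(should_stop_checking(18, 20))  # True
print(should_stop_checking(3, 20))   # False
```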


Regards,
Thomas


