Re: [lingu-dev] [SoC] Grammar checker API

Thomas Lange Fri, 26 May 2006 06:23:14 -0700

Hello Bruno,

Well, first things first:
Congratulations on being accepted as one of the projects for the Google
Summer of Code! :-)


->Lacci: Hi, Lacci. I'm not sure if you have already noticed that we
have started a discussion about grammar checking, an API and, last but
not least, the integration of grammar checkers in OOo.
The focus should currently be on the integration (i.e. how will it look
to the user in the end?), especially if more than one grammar checker
is available. I think this should be the first topic
because we need to make clear where we want to go and identify the
problems on the way before deciding on an API.
So if you have time I would be glad if you could share your thoughts.


> 1. Grammar Checker API, now:
> 
>    1. It makes sense working with just one language now; so, foreign
>       words in the text should be ignored.

From the API view: agreed!

From the UI view I'm a bit unsure here. Since spell checking currently
works for different languages within one sentence, it would look a bit
like a regression from the user's point of view if such text were just
skipped.


>    2. The grammar checker should run in a different thread to not block
>       OpenOffice.

You mean when grammar checking is done automatically (in the background
like automatic spell checking) only?
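
For the background case, a minimal sketch of what "a different thread"
could look like: paragraphs are queued and a single worker thread checks
them, so the caller never waits. The function name, the queue protocol
(None as a stop marker) and the toy check are all assumptions made for
illustration, not an existing OOo API.

```python
import queue
import threading


def start_background_checker(check, results):
    """Start one worker thread that checks queued paragraphs.

    check   -- hypothetical callable: paragraph text -> check result
    results -- list that receives (paragraph, result) tuples
    """
    jobs = queue.Queue()

    def worker():
        while True:
            paragraph = jobs.get()
            if paragraph is None:      # stop marker (an assumption of this sketch)
                break
            results.append((paragraph, check(paragraph)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


# Toy check: flags paragraphs containing the typo "teh".
results = []
jobs, thread = start_background_checker(lambda p: "teh" in p, results)
jobs.put("This is teh end.")   # returns immediately, the caller is not blocked
jobs.put("All good.")
jobs.put(None)                 # ask the worker to stop
thread.join()                  # only needed here to inspect the results
```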

>    3. The grammar checker should be able to check inside table cells,
>       text headers and footers, enumerations and text boxes (Drawing
>       Objects).

Sure.
The question is: should it be able to do so because it knows of the
existence of such objects and is able to retrieve/modify them on its
own? Or should the existence of such objects be completely hidden from
the grammar checker? For example by means of an abstract API to iterate
through and modify the text of a document.
And pushing that question one step further:
Is the grammar checker's implementation to iterate through the text, or
should there be a different object that iterates through the text and
calls the grammar checker to process it?
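
A minimal sketch of the second variant, where the application iterates
and the checker only ever sees plain text. Everything here (TextPortion,
iterate_document, check_document, the origin labels and the toy checker)
is hypothetical and only meant to illustrate the split of
responsibilities.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class TextPortion:
    """One block of text plus a label saying where it came from."""
    origin: str   # e.g. "body", "table cell", "footer" (hypothetical labels)
    text: str


# Hypothetical checker signature: (start, end) of the first error, or None.
Checker = Callable[[str], Optional[Tuple[int, int]]]


def iterate_document(containers: Iterable[Tuple[str, List[str]]]) -> Iterator[TextPortion]:
    """Application side: flatten all text containers into one stream."""
    for origin, paragraphs in containers:
        for paragraph in paragraphs:
            yield TextPortion(origin, paragraph)


def check_document(portions: Iterable[TextPortion], checker: Checker):
    """Driver object: walks the text and calls the checker on each portion."""
    findings = []
    for portion in portions:
        span = checker(portion.text)
        if span is not None:
            findings.append((portion.origin, portion.text[span[0]:span[1]]))
    return findings


def toy_checker(text: str) -> Optional[Tuple[int, int]]:
    """Stand-in checker: flags the first occurrence of 'the the'."""
    pos = text.find("the the")
    return None if pos < 0 else (pos, pos + len("the the"))


document = [
    ("body", ["This is fine.", "Look at the the cat."]),
    ("table cell", ["Also fine."]),
    ("footer", ["See the the appendix."]),
]
findings = check_document(iterate_document(document), toy_checker)
```

In this arrangement the checker never learns that tables or footers
exist; only the iterating side has to know about them.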

>    4. The grammar checker should determine end of the sentences, because
>       it is not so trivial (e.g., abbreviations). So, OpenOffice should
>       just provide to the grammar checker an entire block of text, like
>       a paragraph.

Doing it this way would of course be easiest from the application's
view. First, it does not need to determine the end of a sentence, and
second, paragraphs are the easiest units to access.

But I somewhat doubt the ability of a grammar checker to identify the
end of a sentence in a mixed-language text. For example, if an English
grammar checker encounters the upside-down question mark following the
Spanish word at the end. Thus I'm wondering if the API should allow for
a suggested end of sentence when calling the grammar checker. Then if
the implementation encounters unknown characters it has at least a hint.
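
A minimal sketch of such a hint, under heavy simplification: the checker
looks for . ! ? itself, and only falls back to the caller's suggestion
when it runs into a character it does not know (here: anything
non-ASCII). The function name and the suggested_end parameter are
assumptions for illustration.

```python
from typing import Optional


def find_sentence_end(text: str, start: int,
                      suggested_end: Optional[int] = None) -> int:
    """Return the index one past the end of the sentence starting at start.

    Toy rules: a sentence ends after '.', '!' or '?'. If an unknown
    (non-ASCII) character shows up first and the caller passed a valid
    hint, the hint is trusted instead.
    """
    hint_ok = suggested_end is not None and start < suggested_end <= len(text)
    for i in range(start, len(text)):
        ch = text[i]
        if ch in ".!?":
            return i + 1
        if not ch.isascii() and hint_ok:
            return suggested_end
    return len(text)


text = "I asked him hola ¿Dónde está? He left."
hint = text.index("¿")   # e.g. what the application's break iterator suggests

without_hint = find_sentence_end(text, 0)      # runs on to the Spanish '?'
with_hint = find_sentence_end(text, 0, hint)   # stops where the hint says
```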

BTW: The I18N break-iterator is not that bad with abbreviations. I think
it has a list of those. But citations and similar things might pose a
huge problem to it.

And another question would be:
Having the grammar checker called with sentences, does it mean that
when an error is found the whole paragraph is presented to the user
(could be really large!), or does the UI only display the sentence
where the error occurred?

Displaying less than a sentence seems somewhat bad to me because
sometimes the user may want to solve an error by rearranging
the sentence. And having to quit the UI because only the wrong word was
displayed seems annoying. And allowing the original document to be
modified in parallel while the dialog is being displayed may be somewhat
troublesome to implement.


>    5. OpenOffice should be able to replace the wrong sentences.

;-)

>    6. I think we should create an unified User Interface, for any
>       grammar checker use it.

+1.

Of course this will not prevent someone's grammar checker from coming
along with its own UI.
It only makes the implementation easier if the UI is already there, and
to the user all the grammar checkers will look the same, thus avoiding
a possible source of confusion.

>    7. Automatic checking should run in background and marking the wrong
>       sentences with a wavy line. It could be enabled and disabled, like
>       Spell Checker.

+1.
Someone once mentioned the idea of at least two different kinds of
lines: one for what the grammar checker knows for sure is wrong, and
the other for "this is probably wrong" (e.g. outdated words like "thy"
or "thee" in English). This of course goes along with an option that
allows the user to specify whether he likes to have both types
displayed or only the I'm-100%-sure-it-is-wrong parts.
The reasoning was AFAIR that it is most annoying to the user to get
errors reported that are not errors.
I found that idea quite compelling...
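
A minimal sketch of how that could be modelled: each finding carries a
certainty level, the user option filters the list, and the level picks
the line style. The names and the two line styles are made up for
illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Certainty(Enum):
    DEFINITE = "definite"   # the checker is sure this is wrong
    POSSIBLE = "possible"   # "this is probably wrong"


@dataclass
class Finding:
    start: int
    end: int
    certainty: Certainty


def visible_findings(findings: List[Finding], show_possible: bool) -> List[Finding]:
    """Apply the user option: hide 'possible' findings unless requested."""
    return [f for f in findings
            if show_possible or f.certainty is Certainty.DEFINITE]


def line_style(finding: Finding) -> str:
    """Pick a (hypothetical) underline style per certainty level."""
    if finding.certainty is Certainty.DEFINITE:
        return "solid wavy"
    return "dotted wavy"


findings = [
    Finding(0, 3, Certainty.POSSIBLE),    # e.g. an outdated word like "thy"
    Finding(10, 17, Certainty.DEFINITE),
]
strict_view = visible_findings(findings, show_possible=False)
full_view = visible_findings(findings, show_possible=True)
```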


>    8. The API should provide a paragraph (for example) to grammar
>       checker and this one should return a list. If there is no mistake
>       in this paragraph, the list should be empty,  else the list should
>       contain:

A list of what?
Suggestions on how to correct the first encountered error?

Or did you mean a list of all errors? Or even something else?


>          1. Where is the mistake in the paragraph (initial index + final
>             index).
>          2. A list of suggestions to correct that mistake (this list can
>             be empty if checker is not prepared to guess).
>          3. A comment about mistake, e.g. what a grammar book should say
>             about it.

Having point 1. listed here as part of the list seems to suggest that a
list of all errors was meant to be returned...
When I talked about this to people implementing grammar checkers last
year, all of them said to stop at the first error, since once that
error is corrected the whole sentence will have to be checked again
anyway. Thus there would be no need to report further errors.
Also (as sometimes happens with compilers) consider one single error
triggering reports of several errors following it. If that one gets
fixed, all the other ones will vanish as well. Thus the list may
already be obsolete once the first error has been fixed.
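
A minimal sketch of the stop-at-the-first-error approach: the checker
returns at most one error (position, suggestions, comment), the caller
applies a fix and then has the whole sentence checked again. The record
layout follows the three quoted points; the names and the toy
doubled-word rule are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GrammarError:
    start: int                      # initial index of the mistake
    end: int                        # final index (exclusive)
    suggestions: List[str] = field(default_factory=list)
    comment: str = ""


def check_first_error(sentence: str) -> Optional[GrammarError]:
    """Toy checker: reports only the first doubled word, then stops."""
    match = re.search(r"\b(\w+) \1\b", sentence)
    if match is None:
        return None
    return GrammarError(match.start(), match.end(),
                        [match.group(1)], "Repeated word.")


def correct_sentence(sentence: str, max_rounds: int = 10) -> str:
    """Fix one error, then re-check the whole sentence from scratch."""
    for _ in range(max_rounds):
        error = check_first_error(sentence)
        if error is None or not error.suggestions:
            break
        # Take the first suggestion; a real UI would ask the user instead.
        sentence = sentence[:error.start] + error.suggestions[0] + sentence[error.end:]
    return sentence


fixed = correct_sentence("This is is the the end.")
```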


> 2. Grammar Checker API, future:
> 
>    1. Let's suppose it's possible to manage several languages in a text
>       and there is a Language Guessing API. Then, when OpenOffice
>       discover language of a sentence, it automatically loads grammar
>       checker to correspondent language.

Here it is a bit like the snake biting its tail:
How is the language guessing to be presented with a sentence to operate
on (in order to define which grammar checker is to be used), when the
grammar checker is already required to identify the end of the sentence?

Either it is only guessing the language of the paragraph, which may
consist of several complete sentences in various languages. Or we still
need the I18N break-iterator (or something similar) to identify the
sentence.
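
A minimal sketch of the second option, to show the order of the steps:
a language-independent splitter first, then language guessing per
sentence, then picking the checker. The regex splitter merely stands in
for the break-iterator, and the stop-word guesser and all names are toy
assumptions.

```python
import re
from typing import List, Tuple

# Toy stop-word lists; a real language guesser would work quite differently.
STOP_WORDS = {
    "en": {"the", "is", "and"},
    "de": {"der", "ist", "und"},
}


def split_sentences(paragraph: str) -> List[str]:
    """Stand-in for the break-iterator: split after . ! ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def guess_language(sentence: str) -> str:
    """Pick the language whose stop words occur most often in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    scores = {lang: len(words & stops) for lang, stops in STOP_WORDS.items()}
    return max(scores, key=scores.get)


def route_paragraph(paragraph: str) -> List[Tuple[str, str]]:
    """Return (language, sentence) pairs, i.e. which checker gets what."""
    return [(guess_language(s), s) for s in split_sentences(paragraph)]


routed = route_paragraph("The cat is here. Der Hund ist da. The end.")
```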


Regards,
Thomas
