Re: [lingu-dev] [SoC] Grammar checker API

Bruno Sant'Anna Tue, 13 Jun 2006 05:43:22 -0700

Hi, ^^

On 6/12/06, Thomas Lange <[EMAIL PROTECTED]> wrote:

Hi Bruno,

> I send an e-mail today for this list, talking about it, I think
> santiago's idea was pretty good about enabling an option for enabling or
> disabling the sentence division.

I still think sentence division is a thing every grammar check should
be able to do. (Or if not he has to accept the provided suggestion.)

Yes, I think so too. When I mentioned this option I was not thinking about the grammar checker's ability to split sentences, but about sentence division in the API. It would be great if in the future we had a language guesser and the API could send each sentence directly to the right checker. For me it is a nice-to-have.

Thus I'm still not sure about the advantage of such an option.
But if you think it is good just move on...

> Another thing I'm thinking about, should automatic checking always use
> sentence method? I think yes since the user don't need to finish a
> paragraph for autochecking start, what do you think?

Would be acceptable to me.
But keep in mind that the single most important thing the user wants is that the
very same text parts an interactive grammar check (starting with that
paragraph) would find will get marked by the automatic grammar checking.
Everything else will be quite irritating and probably result in
automatic grammar checking being thought of as more or less useless.

If the user asks for interactive checking of a paragraph, we could reset all errors in that paragraph (found by automatic checking) and recheck it with interactive checking (sending the whole paragraph to the grammar checker). What do you think?
Normally auto-checking runs on its own while the user types, and the user starts interactive checking when he finishes his text. So auto-checking would be part of text development (since it helps the user while he is writing) and interactive checking would be part of text revision (since it provides detailed descriptions of the rules). For this reason I think both of them are useful.
Correct me if I'm wrong, ok?
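
To make the reset idea concrete, here is a minimal C++ sketch. All names in it (Mark, CheckFn, ParagraphMarks) are hypothetical and are not part of any existing UNO or CoGrOO API. It assumes error marks are stored per paragraph, and the checker is passed in as a plain callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical error mark: a character range inside one paragraph.
struct Mark {
    std::size_t start;
    std::size_t length;
};

// Hypothetical checker callback: takes text, returns marks relative to it.
using CheckFn = std::function<std::vector<Mark>(const std::string&)>;

class ParagraphMarks {
public:
    // Automatic checking: append the marks found for one sentence.
    // sentenceOffset is where that sentence starts inside the paragraph.
    void addAutoMarks(int paragraphId, std::size_t sentenceOffset,
                      const std::vector<Mark>& sentenceMarks) {
        std::vector<Mark>& slot = marks_[paragraphId];
        for (const Mark& m : sentenceMarks)
            slot.push_back(Mark{m.start + sentenceOffset, m.length});
    }

    // Interactive checking: drop whatever automatic checking stored for
    // this paragraph and recheck the whole paragraph in one call.
    const std::vector<Mark>& recheckInteractive(int paragraphId,
                                                const std::string& text,
                                                const CheckFn& check) {
        assert(check && "a checker callback is required");
        marks_[paragraphId] = check(text);
        return marks_[paragraphId];
    }

    // Number of marks currently stored for a paragraph.
    std::size_t count(int paragraphId) const {
        auto it = marks_.find(paragraphId);
        return it == marks_.end() ? 0 : it->second.size();
    }

private:
    std::map<int, std::vector<Mark>> marks_;
};
```

Since the interactive pass replaces the stored marks instead of merging them, the user only ever sees one set of results per paragraph.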

> I still think that managing multi language now is too fine grained,
> would you and Mathias mind if we implement it in future? (changing the
> goals you told me about...)

How about this:
- assume that at least every single sentence has a single or
primary language and that grammar checking takes place for
that language.
(If there are any foreign words i.e. words that have a different
language attribute(!) those will only get spellchecked by the
regular spell checking process.)
- This requires that for each sentence with mixed language
attributes somehow that primary language is determined.
- Within a paragraph it should still be possible to mix sentences
of different languages (meaning actually having a different language
attribute) and sentences with more than one language.
But each sentence gets grammar checked only in the primary language
- To make compromises with this approach one should probably have the
option in the UI that the sentence currently being checked should
be now checked by the grammar checker of a specific language.

How does this sound?
This may of course not be the final-model but I think we can
accept this as the suggested limited model for this SoC.
Later on when we have actual implementations for this we can
actually try them out and look for the more specific problems
of multi languages in one sentence.
Please comment!

It sounds better and feasible now, but some points are unclear to me:

- How will we define the language attribute of a word? Does it already exist in UNO?

- How will we determine the "main language" of a sentence? Are we assuming a language guesser?
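
On the second question, one option that needs no language guesser is to pick the language attribute that covers most of the sentence text. The C++ sketch below shows that rule only. It assumes every word already carries a language attribute, and the names TaggedWord and primaryLanguage are hypothetical, not UNO API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a word plus the language attribute the
// document stores for it (for example "pt-BR" or "en-US").
struct TaggedWord {
    std::string text;
    std::string language;
};

// Returns the language whose words cover the most text in the sentence,
// measured in bytes of the stored strings. On a tie the language that
// sorts first wins. Returns an empty string for an empty sentence.
std::string primaryLanguage(const std::vector<TaggedWord>& sentence) {
    std::map<std::string, std::size_t> weight;
    for (const TaggedWord& w : sentence) {
        assert(!w.language.empty() && "every word needs a language attribute");
        weight[w.language] += w.text.size();
    }

    std::string best;
    std::size_t bestWeight = 0;
    for (const auto& entry : weight) {
        if (entry.second > bestWeight) {
            best = entry.first;
            bestWeight = entry.second;
        }
    }
    return best;
}
```

For a sentence tagged mostly "pt-BR" with one "en-US" word, this would send the sentence to the pt-BR grammar checker and leave the foreign word to the regular spell checker.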

Thomas->Bruno:
Mathias and I have talked about the model to use and some other details.
The results are as following:
- Your dummy implementation should use C or C++ to avoid the overhead
of involving a UNO bridge for a different language binding.

Fine, it should be faster.

- Since there was no discussion taking place for the pros and cons
of the actual models of iteration to use which were
   1) have it done by each applications core
      similar to current spell checking
   2) have everything done by the component that comes along
      with the grammar checker (as is currently done by CoGrOO)
and
   3) having a mediating object that takes care of iterating through
      the document, having it check the text by the grammar checker,
      raising a dialog to edit the text if necessary and writing the
      modified text back to the document.
we took this in our hands.
We agreed to use the model with the separate object that calls the
actual grammar checker and obtains the paragraphs to be checked
from the document. Also for the dialog to modify the text: it should
be a different implementation with an API of its own in order
to have the UI properly separated from the grammar checker and
iteration object.

YES! Great, we (Menezes and I) were discussing this before. We divided the whole process into 3 components:

- An API that provides text blocks and handles the user interface.

- A grammar checker that receives plain strings, checks them and returns an object.

- A middleware we called "driver", similar to what is done with databases. Its function will be to connect the API to the grammar checker (or grammar checkers). This is especially useful since each developer can create his own driver to work with his own grammar checker, without needing to rewrite the grammar checker. What do you think about it?
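
A rough C++ sketch of this split, just to show where the driver sits. Every name here (GrammarError, GrammarDriver, ToyChecker, ToyDriver, CheckerApi) is hypothetical. The "checker" is a toy that only finds double spaces, standing in for a real engine with its own native interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result object returned to the API for each problem found.
struct GrammarError {
    std::size_t start;
    std::size_t length;
    std::string message;
    std::string suggestion;
};

// The "driver" contract the API talks to, one implementation per checker.
class GrammarDriver {
public:
    virtual ~GrammarDriver() = default;
    virtual std::vector<GrammarError> check(const std::string& text) = 0;
};

// Toy checker with its own native interface: it knows nothing about the API.
class ToyChecker {
public:
    std::size_t findDoubleSpace(const std::string& text, std::size_t from) const {
        return text.find("  ", from);
    }
};

// Driver that adapts ToyChecker to the GrammarDriver contract.
class ToyDriver : public GrammarDriver {
public:
    std::vector<GrammarError> check(const std::string& text) override {
        std::vector<GrammarError> errors;
        std::size_t pos = checker_.findDoubleSpace(text, 0);
        while (pos != std::string::npos) {
            errors.push_back(GrammarError{pos, 2, "double space", " "});
            pos = checker_.findDoubleSpace(text, pos + 2);
        }
        return errors;
    }

private:
    ToyChecker checker_;
};

// API side: hands a text block to every registered driver.
class CheckerApi {
public:
    void registerDriver(std::unique_ptr<GrammarDriver> driver) {
        assert(driver && "a driver is required");
        drivers_.push_back(std::move(driver));
    }

    std::vector<GrammarError> checkBlock(const std::string& text) {
        std::vector<GrammarError> all;
        for (const auto& driver : drivers_) {
            std::vector<GrammarError> found = driver->check(text);
            all.insert(all.end(), found.begin(), found.end());
        }
        return all;
    }

private:
    std::vector<std::unique_ptr<GrammarDriver>> drivers_;
};
```

Supporting another grammar checker would then mean writing one more GrammarDriver subclass, with no change to the checker or to the API.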

We think the initial sequence should be something like this:
-- The document should notify the iterating object that there is
    sth. to be done by providing the paragraph to check (e.g. via
    an XTextRange or some other UNO interface that allows text access).
    Having that very paragraph processed by means of calling the
    grammar checker API and maybe the dialog. The iterating object
    (let's call it for short Iterator from now on) should ask the
    document for the next paragraph and so on until the document is
    processed.
    (The inter paragraph sentence iteration should be done by the
    Iterator, of course having the respective grammar checker
    determine the end of sentence where ever possible).
    By having the Iterator asking for the next paragraph instead of
    the document pushing all the paragraphs to the Iterator we limit
    the possibility of piling up paragraphs to be checked and possibly
    being already deleted/moved etc. when their turn comes up.
    Of course the problem is potentially still there but it
    should be a bit less likely to actually happen.

You mean creating a spool of paragraphs to be checked? It sounds ok. It should be possible since we are dealing with interactive checking and we can manage the current paragraph completely; it is a state machine from my point of view. Let me complement this with a pseudo algorithm:

send a paragraph for checking and wait a while for the results
(we have to manage exceptions here).

create a list of error objects (if it finds errors)

while (there are errors left in the list)
{
    show the wrong sentence to the user;
    show a suggestion if possible;
    show a detailed rule comment if possible;
    ask the user whether to correct or ignore;

    if the user chooses to correct
    {
        if a suggestion exists
            { replace the sentence with the suggestion }
        else
            { ask the user for a new sentence and replace the previous sentence with it }
    }
    else (the user chooses to ignore)
    {
        // We somehow have to flag this sentence in case of rechecking:
        // it would probably annoy the user if he asks to ignore an error
        // and it pops up again when he rechecks later. My idea is to keep
        // a list of error objects, each with a boolean that shows whether
        // it was already checked.
        ignore and go to the next error;
    }
} // this loop applies to a single paragraph

when this loop finishes, go on to the next paragraph.
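
The loop for one paragraph could look roughly like this in C++. It is only a sketch: the names (ErrorItem, UserReply, AskUser, reviewParagraph) are hypothetical, the dialog is replaced by a callback, and the errors are assumed to be sorted by position and not to overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

enum class Choice { Change, Ignore };

// Hypothetical error object, including the "already handled" flag.
struct ErrorItem {
    std::size_t start;       // offset in the paragraph as it was sent for checking
    std::size_t length;
    std::string suggestion;  // empty when the checker has no suggestion
    bool ignored;            // set when the user chose to ignore this error
};

struct UserReply {
    Choice choice;
    std::string replacement; // only used when there is no suggestion
};

// Stand-in for the dialog: receives the wrong text and the suggestion.
using AskUser = std::function<UserReply(const std::string& wrongText,
                                        const std::string& suggestion)>;

// Walks the errors of one paragraph in order and returns the edited text.
// Errors already flagged as ignored are skipped without asking again.
std::string reviewParagraph(std::string paragraph,
                            std::vector<ErrorItem>& errors,
                            const AskUser& ask) {
    std::ptrdiff_t shift = 0; // how far earlier replacements moved the text
    for (ErrorItem& e : errors) {
        if (e.ignored)
            continue;
        const std::size_t pos = static_cast<std::size_t>(
            static_cast<std::ptrdiff_t>(e.start) + shift);
        assert(pos + e.length <= paragraph.size());

        const UserReply reply = ask(paragraph.substr(pos, e.length), e.suggestion);
        if (reply.choice == Choice::Ignore) {
            e.ignored = true;
            continue;
        }
        const std::string fixed =
            e.suggestion.empty() ? reply.replacement : e.suggestion;
        paragraph.replace(pos, e.length, fixed);
        shift += static_cast<std::ptrdiff_t>(fixed.size()) -
                 static_cast<std::ptrdiff_t>(e.length);
    }
    return paragraph;
}
```

The offsets in the list refer to the paragraph as it was sent to the checker, so once a correction has been applied the paragraph would need a fresh check before the same list is used again.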

What do you think?

Side note for the UI: Please have the API that way that the dialog
gets invoked only once at start of interactive checking and stays open
until the document is processed or it is manually closed.
The case to avoid is to have it pop open and be closed for each paragraph.

ok...

Please think about the above and comment about it.
If you agree please start to think about the API for this kind of setting.

comments above

BTW how are you doing with the API proposal for the actual grammar checker?

I studied the examples a lot and was playing with XTextCursor and XTextProperties. I also tried to create a component based on the OpenOffice wiki (hackers guide), but unfortunately it broke my source tree (I was working with OOA680_m1). So on Saturday I downloaded a fresh copy of the source code and tried to compile it. It was painful and I could not get it to compile (still trying). In the worst case I have a backup of my old tree at USP (CoGrOO lab), which I can restore on Thursday.

As far as I see nothing of the above will limit the suggested
abilities of the grammar checker. Everything should still be fine.
->everyone else: Please feel free to comment!

Thanks!
Thomas


Thanks a lot

Bruno Sant'Anna
