Re: [lingu-dev] [SoC] Grammar checker API

Joan Moratinos Tue, 06 Jun 2006 12:05:10 -0700

Bruno Sant'Anna wrote:

On 5/24/06, *Joan Moratinos* <[EMAIL PROTECTED]> wrote:


    Bruno Sant'Anna wrote:

     > Hi,
     >
     > Talking about this program with Carlos Menezes, we started to think
     > about summarizing the topics open in previous e-mails. Feel free to
     > comment ok?
     >
     > 1. Grammar Checker API, now:
     >
     >    1. It makes sense to work with just one language for now, so
     >       foreign words in the text should be ignored.
     >    2. The grammar checker should run in a separate thread so that it
     >       does not block OpenOffice.
     >    3. The grammar checker should be able to check inside table cells,
     >       text headers and footers, enumerations and text boxes (Drawing
     >       Objects).
     >    4. The grammar checker should determine the ends of sentences,
     >       because that is not trivial (e.g., abbreviations). So
     >       OpenOffice should just hand the grammar checker an entire
     >       block of text, such as a paragraph.

    For the automatic checking in the background:
    I have noticed that the Spanish grammar checker for MSWord tries to
    check every time the user types a character that is a "candidate" for
    ending a sentence (for example, a period). If the user goes on typing
    in the same paragraph, some fragments are eventually checked again. It
    seems that there are "hard" ends, which cannot be changed by the
    following text, and "soft" ends, which depend on the text that follows
    (for example, an abbreviation can appear at the end of a sentence or
    in the middle). I think that we should check the grammar as soon as
    possible, not only once the whole paragraph has been typed.

As we discussed before, letting OpenOffice determine the end of
sentences is difficult. I think the right time to start checking is
after every Return key press. Letting the grammar checker analyse whole
blocks is safer; the checker makes fewer mistakes that way.


I think that we should try to design a state-of-the-art API. If MSWord
checks grammar after each period or question mark, why should we wait
until the end of the paragraph? The API shouldn't rule out "smart"
checkers that may exist in the future. Checking a paragraph that is
being edited can be difficult, but not much more so than spell checking
it. The spell checker does it now, and does it well.
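
To make the "hard" and "soft" ends concrete, here is a rough Python
sketch of how candidates could be classified. It is only an illustration
under my own assumptions: the abbreviation list, the function name and
the rule that a trailing period is "soft" are all hypothetical, and a
real checker would use a proper per-language lexicon.

```python
import re

# Hypothetical abbreviation list; a real checker would use a lexicon.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}


def sentence_end_candidates(text):
    """Return (index, kind) pairs for characters that may end a sentence.

    kind is "hard" when later typing is assumed not to change the decision
    and "soft" when the decision depends on the text that follows.
    """
    candidates = []
    # Only punctuation followed by whitespace or the end of the text counts,
    # so the inner period of "e.g." is never a candidate.
    for match in re.finditer(r"[.?!](?=\s|$)", text):
        i = match.start()
        # The whitespace-delimited token that ends at this character.
        token = text[: i + 1].split()[-1]
        if text[i] in "?!":
            kind = "hard"
        elif token in ABBREVIATIONS or i == len(text) - 1:
            # An abbreviation may sit mid-sentence, and a trailing period
            # may still be extended by what the user types next.
            kind = "soft"
        else:
            kind = "hard"
        candidates.append((i, kind))
    return candidates
```

OpenOffice could send the text up to each "hard" end right away and
re-send fragments that finish on a "soft" end once more text arrives.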

Sending sentence "candidates" to the checker is not incompatible with
also sending full paragraphs, or even the whole document. There could
be API calls for these different cases. The calls could be made (from
OpenOffice) at different times and priorities. For example, the whole
document could be checked at a very low priority. The checkers could
then try to do a style check (which requires all the document content).
OpenOffice should know the capabilities of the grammar checkers.
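
As a rough illustration of calls at different scopes and priorities, the
following Python sketch queues check requests and serves the smallest
scope first. All names here (Scope, Capabilities, CheckQueue) are
hypothetical and not part of any existing OpenOffice API; the fixed
priority order is also just an assumption.

```python
import heapq
from dataclasses import dataclass
from enum import IntEnum
from itertools import count


class Scope(IntEnum):
    # Lower value is served first (assumed priority order).
    SENTENCE = 0
    PARAGRAPH = 1
    DOCUMENT = 2


@dataclass(frozen=True)
class Capabilities:
    # The scopes a given checker declares that it can handle.
    scopes: frozenset


class CheckQueue:
    """Holds pending check requests for one checker, ordered by scope."""

    def __init__(self, capabilities):
        self._capabilities = capabilities
        self._heap = []
        self._order = count()  # tie-breaker: FIFO within the same scope

    def submit(self, scope, text):
        """Queue a request; return False if the checker lacks the scope."""
        if scope not in self._capabilities.scopes:
            return False
        heapq.heappush(self._heap, (scope, next(self._order), text))
        return True

    def next_request(self):
        """Return the most urgent (scope, text) pair, or None if idle."""
        if not self._heap:
            return None
        scope, _, text = heapq.heappop(self._heap)
        return scope, text
```

A checker that only declares SENTENCE and DOCUMENT would simply never
receive paragraph requests, and a whole-document style check would wait
until no sentence is pending.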

     >    5. OpenOffice should be able to replace the wrong sentences.

    The checker should preserve formatting, footnotes, etc. Ideally these
    things should not be passed to the checker (the footnotes and the like
    could be passed once the paragraph or sentence that includes them has
    been checked, for example), but if the user chooses to accept a
    suggestion, the formatting (e.g. italics), the footnotes, etc. should
    remain in their original places. Perhaps we could pass "markers"
    embedded in the paragraph text and then return them in the corrected
    text to "align" the original and the checked sentences.

Hmm... I think the API can deal with it. My idea is not to let grammar
checkers deal with these details, only analyse the text and suggest
corrections. It could be difficult to make a grammar checker deal with
indexes, text positions, underlining, etc.


The checker can have a function to check a (clean) sentence. If the user
wants to replace the original text with a corrected one, the API can ask
the checker to divide the corrected sentence into fragments
corresponding to fragments of the original sentence. The checker is
allowed to ignore the request, and OpenOffice might not even use this
function, but it should exist. No one knows the correspondence between
the original and the corrected sentences better than the checker. It's
not difficult to think of cases that would be very hard for the API to
solve, because it doesn't have enough information.

For example, my Catalan checker can correct the sentence "El tamany del
fitxer" ("The size of the file"; "tamany" is a barbarism) to "La mida
del fitxer". The corrected words can be very different from the original
ones; there can be more or fewer words, or a change of order. Here is an
example of my proposal:

- The user has written "El *tamany* del fitxer", with "tamany" italicized.
- OpenOffice submits "El tamany del fitxer" (clean text).
- The checker reports a mistake and says that "El tamany" should be
replaced by "La mida".
- The user accepts the change.
- OpenOffice then asks the checker to divide "La mida del fitxer"
into portions matching 3/6/11, the lengths of the three original
portions ("El ", "tamany" and " del fitxer").
- The checker responds with 3/4/11, the lengths of "La ", "mida" and
" del fitxer".
- OpenOffice can then apply the change without breaking the formatting.
- If the checker is not capable of "aligning" the sentences,
OpenOffice can proceed by "trial and error".
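
The alignment step above could look roughly like this in Python. This is
a sketch under my own assumptions: the function name is hypothetical, and
I assume the checker keeps its correction as a list of (original piece,
corrected piece) pairs covering the whole sentence. When a format
boundary falls inside a piece, the sketch just interpolates
proportionally, which a real checker could do better.

```python
def align_fragments(segments, orig_lengths):
    """Map fragment lengths of the original onto the corrected sentence.

    segments: (original_piece, corrected_piece) pairs known to the checker.
    orig_lengths: lengths of the formatted portions in the original text.
    """
    # Cumulative piece boundaries in the original and corrected sentences.
    orig_bounds, new_bounds = [0], [0]
    for original, corrected in segments:
        orig_bounds.append(orig_bounds[-1] + len(original))
        new_bounds.append(new_bounds[-1] + len(corrected))
    if sum(orig_lengths) != orig_bounds[-1]:
        raise ValueError("fragment lengths do not cover the original text")

    def map_position(pos):
        for i in range(len(segments)):
            lo, hi = orig_bounds[i], orig_bounds[i + 1]
            if lo <= pos <= hi:
                if pos == lo:
                    return new_bounds[i]
                if pos == hi:
                    return new_bounds[i + 1]
                # Boundary inside a piece: interpolate proportionally.
                width = new_bounds[i + 1] - new_bounds[i]
                return new_bounds[i] + round((pos - lo) / (hi - lo) * width)
        raise ValueError("position outside the sentence")

    new_lengths, previous, position = [], 0, 0
    for length in orig_lengths:
        position += length
        mapped = map_position(position)
        new_lengths.append(mapped - previous)
        previous = mapped
    return new_lengths


# The example from this message: 3/6/11 in the original sentence.
segments = [("El ", "La "), ("tamany", "mida"), (" del fitxer", " del fitxer")]
print(align_fragments(segments, [3, 6, 11]))  # prints [3, 4, 11]
```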

     >    6. I think we should create a unified user interface that any
     >       grammar checker can use.

    I think that this user interface should be optional. A grammar checker
    can become very complex, and we should not be constrained to a
    predefined UI. For example, the grammar checker I'm developing
    (http://www.einescat.org) uses its own UI, and could eventually be used
    from clients other than OOo. In my particular case, it would be
    better not to be bound to any user interface.

We have discussed this before. There is a problem: today every grammar
checker uses its own user interface. Now imagine that you want to use two
or more grammar checkers at the same time. Should each grammar checker
have its own UI? I think that's not good. I know that a single user
interface cannot allow fine tuning of each grammar checker, but I'm
proposing a unified UI with the most common options. We are open to
discussing it here, OK?


I agree that there has to be a UI, but we could leave an "escape hatch"
for checkers that cannot (or do not want to) use it. I think there are
good reasons for this:
- The checkers can be "application neutral". They can offer services to
several programs (OpenOffice, Word, Abiword, Thunderbird...). If the UI
is part of the checker, integration with the clients is easier.
- The checkers can be very complex. It's impossible to foresee every
need. For example, my checker displays the parse it has produced as a
graphic. Will the UI permit drawing a graphic? If it does, I will
have to rewrite part of my program. If not, the users will miss an
important part of my work.

JMo

     >    7. Automatic checking should run in the background and mark the
     >       wrong sentences with a wavy line. It could be enabled and
     >       disabled, like the spell checker.

    We should consider different colors for different usages (grammar
    mistakes, style recommendations, etc.).


That can be for a future API =). I'll remember this...

     >    8. The API should provide a paragraph (for example) to the grammar
     >       checker, which should return a list. If there is no mistake
     >       in the paragraph, the list should be empty; otherwise the list
     >       should contain:
     >          1. Where the mistake is in the paragraph (initial index +
     >             final index).
     >          2. A list of suggestions to correct that mistake (this list
     >             can be empty if the checker is not prepared to guess).
     >          3. A comment about the mistake, e.g. what a grammar book
     >             would say about it.

    A paragraph can contain several mistakes. We should proceed as in the
    spell checker. First the checker could return only the limits of the
    mistakes, so that OOo can mark them. Only when the user asks for
    suggestions or explanations should the checker provide them. Often the
    user will correct the mistakes without asking for suggestions or
    explanations.

Yes, you are correct: users will often just correct the sentences
themselves. But the paragraph is analysed just once per change (after a
change or a Return key press, as I said before), and a single check
should provide all the information about the analysed block. That does
not mean that everything will be shown to the user; it will just be
stored somewhere (an object in memory) for the user interface to deal
with.
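
The result list from item 8, stored in memory as described above, could
be sketched in Python as follows. The class and function names are
hypothetical, the end index is assumed to be exclusive, and the single
hard-coded rule only stands in for a real checker.

```python
from dataclasses import dataclass, field


@dataclass
class GrammarError:
    start: int  # initial index of the mistake in the paragraph
    end: int    # final index (assumed exclusive in this sketch)
    suggestions: list = field(default_factory=list)  # may be empty
    comment: str = ""  # e.g. what a grammar book would say


# Toy rule table standing in for a real checker (assumption).
RULES = {
    "El tamany": (["La mida"], '"tamany" is a barbarism; use "mida".'),
}


def check_paragraph(paragraph):
    """Return every mistake found; an empty list means no mistakes."""
    errors = []
    for wrong, (suggestions, comment) in RULES.items():
        pos = paragraph.find(wrong)
        while pos != -1:
            end = pos + len(wrong)
            errors.append(GrammarError(pos, end, list(suggestions), comment))
            pos = paragraph.find(wrong, end)
    return sorted(errors, key=lambda error: error.start)


def ranges_to_underline(errors):
    """What the UI needs at first: only the limits of each mistake."""
    return [(error.start, error.end) for error in errors]
```

With this shape, the UI can draw the wavy lines from
ranges_to_underline() alone and read the suggestions and comment from
the stored objects only when the user asks for them.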


     >
     >
     > 2. Grammar Checker API, future:
     >
     >    1. Let's suppose it's possible to manage several languages in a
     >       text and there is a Language Guessing API. Then, when
     >       OpenOffice discovers the language of a sentence, it
     >       automatically loads the grammar checker for that language.
     >    2. Optimize memory allocation, input/output and processing.
     >    3. Correct possible bugs.
     >
     >
     > Bruno Sant'Anna
     >

    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>
    For additional commands, e-mail:
    [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>




