[aspell-devel] Re: ASpell

Barry Cavanaugh Thu, 11 Jan 2007 10:17:51 -0800

On 1/10/07, Kevin Atkinson <[EMAIL PROTECTED]> wrote:



Please post this to [EMAIL PROTECTED]  And I will respond there so
others can benefit from my response.

If you are not subscribed I will approve your mail within a day.


Edited and refined for a public audience and for clarity, as this was in no
way an attempt to show superiority. I made the suggestion in the hope that it
might help you break out of a mold, if you happened to be trapped in one. Since
you feel it is worth repeating, I will elaborate and make my thoughts
clearer so that there is no misunderstanding between us, or anyone else
for that matter.

I was downloading ASpell and looking at your project page. I have done some
coding in higher-level languages and am good at figuring things out. I'm not
bragging; I'm just saying that, though I can be a bit of a doofus in many
ways, I have found that I can solve problems others can't, or can sometimes
offer a fresh perspective. I am certain I do not have the experience to speak
as an authority.

I was thinking about the desire to scan multiple languages somewhat
concurrently. Your statements showed you felt that the code was getting very
complicated for the task at hand. As I thought about it, what I have seen in
such cases is the need to simplify and modularize the code. Doing so makes
the desired results more fathomable, and the seemingly impossible becomes
possible.

I'm shooting in the dark here, but please bear with me. Your processes can be
grouped as follows: identifying misspelled words, finding suggestions, and
presenting the results. The presentation code needs to be handled completely
separately from your two main processes, checking for misspellings and
offering suggestions. Yes, I know I am oversimplifying, but this is necessary
to rethink the processes.
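
To make the three-way split concrete, here is a rough sketch in Python. It is
not how ASpell is written; the toy word list, the function names and the use
of Python's difflib for suggestions are all my own assumptions, chosen only
to show the stages kept apart.

```python
import difflib

# Toy word list standing in for a real dictionary (assumption for illustration).
WORDS = {"the", "quick", "brown", "fox", "jumps"}

def find_misspelled(tokens, words):
    """Stage 1: return the indexes of tokens not found in the word list."""
    return [i for i, tok in enumerate(tokens) if tok.lower() not in words]

def suggest(token, words, limit=3):
    """Stage 2: return close matches for one misspelled token."""
    return difflib.get_close_matches(token.lower(), sorted(words), n=limit)

def present(tokens, bad_indexes, suggestions):
    """Stage 3: format the results; knows nothing about how they were found."""
    lines = []
    for i in bad_indexes:
        choices = ", ".join(suggestions[i]) or "(none)"
        lines.append(f"{i}: {tokens[i]} -> {choices}")
    return "\n".join(lines)

tokens = "the quik brown fox".split()
bad = find_misspelled(tokens, WORDS)
sugg = {i: suggest(tokens[i], WORDS) for i in bad}
report = present(tokens, bad, sugg)
print(report)
```

Because present() only receives indexes and lists, it could be swapped for a
GUI or a different report format without touching the other two stages.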

The text to be checked goes through the process with a "header" identifying
the language, which makes it possible for the sending process to reclaim it
and correctly handle its position, as well as the results with respect to the
submitted language. So the presentation layer is holding additional
information and needs to be restructured accordingly. Although this
presentation structure by necessity becomes more complicated, it in fact also
becomes somewhat more trivial to handle multiple languages at the same time.

In other words, the spell check and suggestion processes, even if combined,
get the data stream and the language concerned at the same time, and then
expect to find the optimized, processed data already in place. That is, the
data storage structure is also tagged with identifying information, and the
optimized data tables are accessed or queried with that tagging information
taken into account.
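
As a sketch of what I mean by a tagged "header" and tagged data tables,
something like the following Python would do. The Chunk fields, the TABLES
dictionary and the language codes are hypothetical names of mine, not
anything in ASpell; the point is only that the lookup is chosen by the tag
and that positions come back in the caller's own coordinates.

```python
from dataclasses import dataclass

# Hypothetical per-language word tables; a real checker would load its own
# optimized data here instead.
TABLES = {
    "en": {"hello", "world"},
    "ru": {"привет", "мир"},
}

@dataclass
class Chunk:
    lang: str    # language tag carried in the "header"
    offset: int  # where this text starts in the caller's document
    text: str

def check_chunk(chunk):
    """Return (lang, absolute_position, word) for each unknown word."""
    table = TABLES[chunk.lang]  # the table is selected by the tag alone
    results = []
    pos = 0
    for word in chunk.text.split():
        start = chunk.text.index(word, pos)
        pos = start + len(word)
        if word.lower() not in table:
            results.append((chunk.lang, chunk.offset + start, word))
    return results

hits = check_chunk(Chunk(lang="en", offset=100, text="hello wrold"))
print(hits)
```

The sender can use the returned tag and position to put each result back in
the right place, or drop the tag if it has no use for it.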

The idea is to streamline and dumb down the lookup procedure, in a sense. I am
guessing by the tone of your reply that you feel these processes can't be
separated, and come to think of it, the extra information would not be
understood by the calling application. That of course does not include your
back-end tagging, and really then calls for additional tags that the calling
application may not understand.

Hence this is an advanced mode that can be dropped, and the smart application
would call the process once per language. The language tag is included in
the response or dropped as necessary. This way the receiving application
can then table the data appropriately. Your data optimization is language
specific, and hence your working data for the most part needs to be split per
language to be efficient and full featured. Your accomplishing this in the
back end makes it reasonable and possible for the calling application to call
ASpell on a per-language basis and hence edit a multi-language document. Of
course, if the ASpell end gave consideration to primary language and
secondary language, then the application could look much smarter and approach
the feature set of a single-language document.

The spell check process should be fed the language, file and other
preference flags, and it then marks misspellings. The spell check process is
then ready for the next file. Now here is why I want to break it apart from
the suggestion listings. The page could then be submitted for the secondary
language. You now have two masks for the same file. Remove all misspellings
that are correct in either one of the two languages. In the presentation you
highlight the language by color, so that if the writer accidentally slips
into Russian in an English sentence, or a misspelled English word mimics a
Russian one, he can identify it because the highlighting color changes. This
also leaves open the possibility of selecting the word and choosing "Check
Selection in English | Russian".
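
The two-mask idea might look roughly like this in Python. The word lists are
toy assumptions and the label names are mine; a word stays marked as
misspelled only if it is unknown in both languages, and the label for the
rest is what the presentation would turn into a highlight color.

```python
# Toy word lists; assumptions for illustration only.
ENGLISH = {"i", "said", "to", "him"}
RUSSIAN = {"привет", "спасибо"}

def misspelling_mask(tokens, words):
    """One mask per language: True where the token is unknown in it."""
    return [tok.lower() not in words for tok in tokens]

def classify(tokens, primary, secondary):
    """Label each token 'primary', 'secondary' or 'misspelled'."""
    mask_a = misspelling_mask(tokens, primary)
    mask_b = misspelling_mask(tokens, secondary)
    labels = []
    for bad_a, bad_b in zip(mask_a, mask_b):
        if not bad_a:
            labels.append("primary")
        elif not bad_b:
            labels.append("secondary")
        else:
            labels.append("misspelled")  # unknown in both languages
    return labels

tokens = "I said привет to hmi".split()
labels = classify(tokens, ENGLISH, RUSSIAN)
for tok, label in zip(tokens, labels):
    print(tok, label)
```

Each mask comes from an independent pass, so the checker itself never has to
know that a second language is involved.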

Now take all your misspelled words and get the suggested replacements in the
primary language, do the same with the secondary language, and combine them
without care as to which is which, as that was never determined. You could,
though, in your presentation data highlight the language of the surrounding,
preceding or following text. In other words, stage one is to leave the
misspelled words without the language-of-origin highlighting. Next you could
make a few rules as a best guess at the language, and someday write a new
algorithm that syntactically considers the context.
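
Combining the suggestions without caring which language they came from could
be as simple as the Python sketch below. I have used tiny English and French
lists rather than Russian so that one misspelling can plausibly match both;
the lists, the function name and the use of difflib are assumptions of mine
and have nothing to do with how ASpell ranks suggestions.

```python
import difflib

# Toy word lists; assumptions for illustration only.
ENGLISH = ["color", "collar", "cold"]
FRENCH = ["couleur", "col", "colle"]

def merged_suggestions(word, *word_lists, limit=3):
    """Collect suggestions from every list without recording their origin."""
    merged = []
    for words in word_lists:
        for candidate in difflib.get_close_matches(word, words, n=limit):
            if candidate not in merged:  # skip words shared by both lists
                merged.append(candidate)
    return merged

result = merged_suggestions("colour", ENGLISH, FRENCH)
print(result)
```

Primary-language suggestions come first here only because that list is
passed first; a later rule set could reorder them by context.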

Anyway it was neat to consider and I hope you found it entertaining,
Barry

_______________________________________________
Aspell-devel mailing list
Aspell-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-devel
