Re: [lingu-dev] Proofreader: intercept Ignore

2010-09-10 Thread Marcin Miłkowski
Hi William, On 2010-09-10 07:43, Thomas Lange wrote: Hi William, On 09.09.2010 18:53, William Colen wrote: Hi, Is there a way to intercept when the user clicks the Ignore button? I'm implementing usage feedback in the Cogroo grammar checker, and knowing when the user ignores an error would be a very

Re: [lingu-dev] NUMBERTEXT.org project for ODF and OpenOffice.org

2009-09-01 Thread Marcin Miłkowski
functions in Soros. Regards Marcin Regards, László 2009/8/29 Marcin Miłkowski milek...@o2.pl: Hi Laci, this is indeed very nice and quite easy to fix (even for Polish where it's quite involved). Here's my bug report: https://bugs.launchpad.net/numbertext/+bug/421031 Regards Marcin Németh

Re: [lingu-dev] NUMBERTEXT.org project for ODF and OpenOffice.org

2009-08-29 Thread Marcin Miłkowski
Hi Laci, this is indeed very nice and quite easy to fix (even for Polish where it's quite involved). Here's my bug report: https://bugs.launchpad.net/numbertext/+bug/421031 Regards Marcin Németh László wrote: Hi Sophie and Olivier, I have updated the site, also with Olivier's patches for

Re: [lingu-dev] Wordlists for specific industries

2009-05-27 Thread Marcin Miłkowski
Mathias Bauer wrote: Marcin Miłkowski wrote: Mathias Bauer wrote: Marcin Miłkowski wrote: Mathias Bauer wrote: Russell Butler wrote: Mathias Bauer wrote: Thomas Lange - Sun Germany - ham02 - Hamburg wrote: b) If you list is larger and especially if you want to provide that word-list

[lingu-dev] ANN: LanguageTool 0.9.8

2009-04-28 Thread Marcin Miłkowski
LanguageTool 0.9.8 has just been released at www.languagetool.org. Some of the changes include: * Fixed a crash * New rules for Italian * Many new rules, a rule-based disambiguator and synthesiser for Romanian * Initial support for Slovak * Small fixes and additions for Polish and English The

Re: [lingu-dev] About proofreader and spell checker interaction

2009-04-24 Thread Marcin Miłkowski
Hi all, and Thomas :) I guess nobody else comments, as these things are highly technical, but they're of interest to the list. So let me continue our exchange :) [snip] If what you meant here was chaining of grammar checkers, then that probably will never happen. Currently there is only one

Re: [lingu-dev] misleading New Features document

2009-03-17 Thread Marcin Miłkowski
is the status of the bundle in such a case). Regards Marcin Regards, Carlos Menezes 2009/3/16, Marcin Miłkowski milek...@o2.pl: Andrea Pescetti wrote: Daniel Naber wrote: http://www.openoffice.org/dev_docs/features/3.1/ says: With OpenOffice.org 3.1, the LanguageTool grammar checker is now also

Re: [lingu-dev] Thesaurus question

2009-03-06 Thread Marcin Miłkowski
Thomas Lange - Sun Germany - ham02 - Hamburg wrote: Hi, I just found this in the English-US thesaurus: dark has the antonym light, but light has the antonym heavy Well, of course it isn't wrong. But maybe it is not what one would expect either. Thus, why is it this way? Does the thesaurus

Re: [lingu-dev] How to get list of valid word in hunspell

2009-03-02 Thread Marcin Miłkowski
ge wrote: Hi, Jeje, The munch and unmunch utilities help to get all valid words; you must provide affix and dic file, and they create all valid words. I am not sure, where they are located right now, they were part of myspell, that is now replaced by hunspell. maybe http://hunspell.sf.net

Re: [lingu-dev] Hunspell: about suggesting the right spelling

2009-02-24 Thread Marcin Miłkowski
Olivier R. wrote: Hi, I would like to understand how hunspell tries to suggest the right spelling. Here are some examples of the strange behaviour we get: * example 1 * _déterrer_ is the correct spelling of a verb (to dig up in English) a. If I write: _détérer_ Hunspell suggests:

Re: [lingu-dev] hunspell dictionary extension by Google

2009-02-15 Thread Marcin Miłkowski
Hi, note for other dictionary developers: in case of some languages, they seem to use antiquated versions. For Polish, the dictionary seems to be indeed veeery old (misses some 40 thousand entries from our current release, so delta files are mostly useless). So you might want to ping

Re: [lingu-dev] Problem while upgrading from OOo 3.0.0 to Ooo 3.0.1 with a GC extension

2009-01-29 Thread Marcin Miłkowski
) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClassInternal(Unknown Source) ... 14 more Thanks William On Wed, Jan 28, 2009 at 8:55 PM, Marcin Miłkowski wrote: Mathias Bauer wrote

Re: [lingu-dev] trouble installing grammar checker LanguageTool-0.9.6.oxt

2009-01-29 Thread Marcin Miłkowski
Hi, the error you mention appears when you have some settings for the JRE in OpenOffice. Click the JRE used in OOo, and remove any settings you have, close OOo completely, restart and try again. Best of luck, Marcin On 29 January 2009 15:45 David B Teague davidbtea...@verizon.net

Re: [lingu-dev] trouble installing grammar checker LanguageTool-0.9.6.oxt

2009-01-29 Thread Marcin Miłkowski
: Marcin Miłkowski wrote: Hi, Marcin Hi Marcin I will fetch Language Tool 0.9.6 again just to be sure there isn't something strange about the download. This had happened with other OO.o downloads, so this is not unprecedented. I fetched the Language Tool again, with the same

Re: [lingu-dev] trouble installing grammar checker LanguageTool-0.9.6.oxt

2009-01-29 Thread Marcin Miłkowski
On 29 January 2009 21:55 David B Teague davidbtea...@verizon.net wrote: Marcin Miłkowski wrote: Hi, Try to install some other Java-based extension, like Report Builder: http://extensions.services.openoffice.org/project/reportdesign If you're having problems

Re: [lingu-dev] Problem while upgrading from OOo 3.0.0 to Ooo 3.0.1 with a GC extension

2009-01-28 Thread Marcin Miłkowski
Hi William, I have just made a wiki page with instructions for users: http://languagetool.wikidot.com/removing-languagetool-0-9-5-from-openoffice-3-0-1 But this is not very user-friendly as data loss seems to be inevitable in many cases. I have no easy procedure for saving unaffected settings

Re: [lingu-dev] Problem while upgrading from OOo 3.0.0 to Ooo 3.0.1 with a GC extension

2009-01-28 Thread Marcin Miłkowski
Bauer nospamfor...@gmx.de wrote: Hi, can you explain more deeply what the exact problem is? Is it a general problem in our extensions infrastructure? Regards, Mathias Marcin Miłkowski wrote: Hi William, I have just made a wiki page with instructions for users: http

Re: [lingu-dev] Problem while upgrading from OOo 3.0.0 to Ooo 3.0.1 with a GC extension

2009-01-28 Thread Marcin Miłkowski
Mathias Bauer wrote: Joachim Lingner wrote: Thomas Lange - Sun Germany - ham02 - Hamburg wrote: Hi, Marcin Miłkowski wrote: Hi, the problem is that the Java API has changed. So our extension that used the old API cannot find classes in new jars, and because of this fact you cannot

Re: [lingu-dev] Issue with the new grammar checking framework

2009-01-14 Thread Marcin Miłkowski
Hi, they're not hidden, look again at these grey things on the screen. The best thing to do is to log all paragraph text to a text file and compare two versions. OOo sends standard hard spaces, soft hyphens and field codes (Unicode 0x01, 0x02). These are counted as normal chars in positions.
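The logging advice above could be sketched as follows. This is only an illustration, assuming the hard space is U+00A0, the soft hyphen is U+00AD, and the field codes are U+0001 and U+0002 as the message says; the function names and the placeholder labels are hypothetical and not part of any OOo API.

```python
# Characters that, per the message above, still occupy a position in the
# paragraph text. The labels on the right are made up for readability.
SPECIAL = {
    "\u00a0": "<NBSP>",   # hard (non-breaking) space
    "\u00ad": "<SHY>",    # soft hyphen
    "\u0001": "<0x01>",   # field code
    "\u0002": "<0x02>",   # field code
}


def visible_dump(paragraph):
    """Return the paragraph with special characters made visible,
    so two logged versions can be compared with a plain diff."""
    return "".join(SPECIAL.get(ch, ch) for ch in paragraph)


def strip_with_offsets(paragraph):
    """Drop soft hyphens and field codes, turn hard spaces into plain
    spaces, and return the cleaned text together with a list mapping
    each cleaned index back to its index in the original paragraph."""
    cleaned = []
    offsets = []
    for index, ch in enumerate(paragraph):
        if ch in ("\u00ad", "\u0001", "\u0002"):
            continue  # skipped in the cleaned text, but it used up a position
        cleaned.append(" " if ch == "\u00a0" else ch)
        offsets.append(index)
    return "".join(cleaned), offsets
```

An error found at position `k` in the cleaned text would then be reported at `offsets[k]` in the text that OOo sent.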

Re: [lingu-dev] New grammar checking framework available with OOo300 m14

2009-01-09 Thread Marcin Miłkowski
Hi, let me say it again: never return a null. One more thing: my implementation uses a trivial mechanism to make sure that there is only one instance of the service (I would have to store the state on disk otherwise). If your implementation is stateless, you might have many instances, but in

Re: [lingu-dev] New grammar checking framework available with OOo300 m14

2009-01-08 Thread Marcin Miłkowski
CARLOS EDUARDO DANTAS DE MENEZES wrote: Thomas, We are trying to debug new version of CoGrOO (3.0.2) but it sounds a bit difficult. mene...@possante:~$ /opt/broffice.org3/program/soffice -writer terminate called after throwing an instance of 'com::sun::star::uno::RuntimeException' sh:

Re: [lingu-dev] Writer - Word Frequency?

2009-01-07 Thread Marcin Miłkowski
Jean-Christophe Helary wrote: On Wednesday 07 Jan. 09, at 08:42, Marcin Miłkowski wrote: If you have to create frequency lists very frequently, then maybe it could make some sense to create such an extension that you describe. What would be the use of the frequency list? Glossary creation

Re: [lingu-dev] New grammar checking framework available with OOo300 m14

2009-01-07 Thread Marcin Miłkowski
William Colen wrote: Thank you Thomas, I could figure it out. At that time, my grammar checker wasn't working and I was trying to find what was going wrong. Cogroo is still not working with OOo 3.0.1. The OOo Writer crashes just after I press F7, and the automatic checking is not working. I'm

Re: [lingu-dev] Writer - Word Frequency?

2009-01-06 Thread Marcin Miłkowski
what a word is ... Harold Fuchs London, England Please reply *only* to dev@lingucomponent.openoffice.org On 06/01/2009 02:18, Marcin Miłkowski wrote: Save as text file, and run this awk script on it from command line (gawk -f scriptfile filename.txt): -- # Print list of word

Re: [lingu-dev] Writer - Word Frequency?

2009-01-05 Thread Marcin Miłkowski
Save as text file, and run this awk script on it from command line (gawk -f scriptfile filename.txt): -- # Print list of word frequencies { for (i = 1; i <= NF; i++) freq[$i]++ } END { for (word in freq) printf "%s\t%d\n", word,
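For readers without gawk at hand, the same counting idea can be sketched in Python. This is an illustration under the assumption that a "word" is any whitespace-separated field, as in awk's default field splitting; the function names are hypothetical and the output format simply mirrors the tab-separated layout of the script above.

```python
from collections import Counter


def word_frequencies(text):
    """Count whitespace-separated fields, much as awk does with its
    default field splitting. No case folding or punctuation stripping."""
    return Counter(text.split())


def format_frequencies(freq):
    """Render one 'word<TAB>count' line per word, sorted by word."""
    return "\n".join(f"{word}\t{count}" for word, count in sorted(freq.items()))


if __name__ == "__main__":
    import sys

    # Usage: python wordfreq.py filename.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(format_frequencies(word_frequencies(handle.read())))
```

As the replies in this thread point out, what counts as a word is the hard part; a real list would probably want lowercasing and punctuation handling before counting.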

Re: [lingu-dev] Official en-US spelling dictionary?

2008-11-27 Thread Marcin Miłkowski
Kevin Atkinson wrote: On Tue, 25 Nov 2008, Kevin Atkinson wrote: On Wed, 26 Nov 2008, Marcin Miłkowski wrote: who is responsible now for maintaining the official en_US dictionary for OOo? There is an extension on the extension website but it's a bit older than the version distributed with

Re: [lingu-dev] Official en-US spelling dictionary?

2008-11-27 Thread Marcin Miłkowski
Kevin Atkinson wrote: On Thu, 27 Nov 2008, Marcin Miłkowski wrote: Kevin Atkinson wrote: On Tue, 25 Nov 2008, Kevin Atkinson wrote: On Wed, 26 Nov 2008, Marcin Miłkowski wrote: who is responsible now for maintaining the official en_US dictionary for OOo? There is an extension on the

[lingu-dev] Official en-US spelling dictionary?

2008-11-25 Thread Marcin Miłkowski
Hi, who is responsible now for maintaining the official en_US dictionary for OOo? There is an extension on the extension website but it's a bit older than the version distributed with OOo, so it cannot be considered an update (it also has a wrong version scheme). Ideally, the extension

Re: [lingu-dev] Is the Spelling and Grammar menu doing grammar checking?

2008-10-23 Thread Marcin Miłkowski
Hi, without telling us *what* problems you are experiencing it sounds as if you're telling the doctor: I'm sick, what medicine will you give me?. Please be specific :) Note that the Spelling and Grammar checking dialog box has a special check box for enabling grammar checking in the dialog

[lingu-dev] Possible 3.0 showstopper: Multiple dictionaries = crash

2008-09-30 Thread Marcin Miłkowski
Hi, I've found a bug with 3.0 RC3 - if you install a dictionary extension for a language that is already supported by a bundled dictionary pack (for example, Russian in Polish build), there is an immediate crash, even if the node names for dictionaries are different in .xcu files:

Re: [lingu-dev] Hunspell online editing software

2008-09-04 Thread Marcin Miłkowski
Jancs wrote: Hi, Olivier! Quoting Olivier R. [EMAIL PROTECTED]: I haven't tried it much, but this might be what you're looking for: http://dicollecte.free.fr I am the author of this website. The code is still unavailable, for I want to implement new features and I have to chase after

Re: [lingu-dev] hunspell suggest word by doing 2 operations

2008-07-02 Thread Marcin Miłkowski
uỹ REP ỏa oả REP ỏe oẻ REP ủy uỷ REP ọa oạ REP ọe oẹ REP ụy uỵ REP uo ườ REP uo ướ REP uo ưỡ REP uo ưở REP uo ượ REP ch tr REP d gi REP dz d REP f ph REP g gh REP gh g REP gi d REP ng ngh REP ngh ng REP s x REP tr ch REP x s Many thanks. Ivan Garcia. Marcin Miłkowski wrote: Iván García
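To make the role of the REP entries quoted above more concrete, here is a deliberately simplified sketch of how replacement pairs can produce suggestions. It is not Hunspell's actual suggestion algorithm; the function names and the tiny word set in the usage note are made up for illustration, and only a few pairs from the quoted table (`ch tr`, `s x`, `f ph`) are used.

```python
def parse_rep_lines(lines):
    """Collect (wrong, right) pairs from 'REP wrong right' lines.
    A count header such as 'REP 3' has only two fields and is skipped."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "REP":
            pairs.append((parts[1], parts[2]))
    return pairs


def rep_suggestions(word, pairs, dictionary):
    """Apply each replacement at every position where it matches and
    keep the candidates that are in the dictionary (one operation only)."""
    found = []
    for wrong, right in pairs:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
            start = word.find(wrong, start + 1)
    return found
```

With `pairs = parse_rep_lines(["REP 3", "REP ch tr", "REP s x", "REP f ph"])` and a hypothetical word set `{"pha"}`, calling `rep_suggestions("fa", pairs, {"pha"})` yields `["pha"]`. Each candidate here is exactly one replacement away from the input, which is why the thread asks about suggestions that need two operations.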

Re: [lingu-dev] Slow creating of Hunspell instance

2008-07-02 Thread Marcin Miłkowski
Hi, I don't know the exact answer, but: 1) it might take some time to read a long and complicated dictionary (disk access time + processing, for example Hebrew needs quite a long time) 2) you might instantiate the class at the start of the program, and then keep the instance for later use.

Re: [lingu-dev] hunspell suggest word by doing 2 operations

2008-06-30 Thread Marcin Miłkowski
Iván García wrote: Currently in our Vietnamese hunspell dictionary (for Firefox and OpenOffice), if we misspell đường as đừong, we get three suggestions: đừ ong (adding space, 1 operation) đong (removing ừ, 1 operation) đừng (removing o, 1 operation) actually we'd like the system to

Re: [lingu-dev] Re: Packaging extension dictionaries + fingerprint files?

2008-06-25 Thread Marcin Miłkowski
Michel Weimerskirch wrote: Sorry for sending a second mail, but I forgot to add: same question for the autocorrection data. You can pack autocorrection data with the extension but I wouldn't do this because the user cannot really edit the autocorrect list - on every update, the list will be

Re: [lingu-dev] Grammar Check API questions

2008-05-25 Thread Marcin Miłkowski
Hi Thomas, Thomas Lange - Sun Germany - ham02 - Hamburg wrote: For simple Grammar checker implementation that does not use a new thread to get the result you can look up the file grammarchecker.cxx in linguistic. How do other parts of OOo code know they should add themselves as listeners? I

Re: [lingu-dev] Hunspell morphological analysis and grammar checker

2008-05-25 Thread Marcin Miłkowski
Michel Weimerskirch wrote: Hi I have been playing around with the morphological analysis features of hunspell(*). Has anyone investigated if and how the PoS data associated to a hunspell wordlist could be used for a grammar checker? Yes and no ;) I mean I've built a Polish tagger using a

Re: [lingu-dev] Hunspell morphological analysis and grammar checker

2008-05-25 Thread Marcin Miłkowski
Jancs wrote: Quoting Marcin Miłkowski [EMAIL PROTECTED]: I'm still planning to start a major rewrite of affix flag / tagging rules as the Polish hunspell source has been significantly cleared up (it contained many duplicates in terms of flags creating the same PoS tag and the same affix

Re: [lingu-dev] Hunspell morphological analysis and grammar checker

2008-05-25 Thread Marcin Miłkowski
Michel Weimerskirch wrote: Thank you Marcin and Janis for your comments. I have been rebuilding the wordlist for Luxembourgish from scratch for the last few months. It will be released in a few weeks. Most of the words are arranged in separate lists like adjectives, nouns, etc. and the affix

Re: [lingu-dev] Grammar Check API questions

2008-05-21 Thread Marcin Miłkowski
Hi Mathias, Mathias Bauer wrote: The checker should implement the XGrammarChecker interface, and define its abilities this way. OK. The checking process would be started with doGrammarChecking() method. OK. But here goes what I don't understand. Suppose I found some errors with my code, and

[lingu-dev] Grammar Check API questions

2008-05-20 Thread Marcin Miłkowski
Hi all, I'm trying to understand the new API from the point of view of the grammar checker developer (unfortunately, the docs are written from the opposite point of view). So far I got this (I'm using Java, so Java API names follow): The checker should implement the XGrammarChecker

Re: [lingu-dev] Help needed - bulk extraction of words

2008-02-07 Thread Marcin Miłkowski
Hi, you'd need as well to convert these documents to pure text in order to process them; you can try to spawn OOo for conversion in a batch mode but the easier option is to use unzip in a script, and take content.xml only from the files. Then process the files using awk (define the field
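The unzip-and-take-content.xml approach described above can be sketched in Python instead of a shell script. This is a rough illustration, assuming the input is an ODF file (a zip archive with a `content.xml` member); the function name is hypothetical, and stripping tags with a regular expression is a shortcut that ignores document structure rather than a proper XML parse.

```python
import re
import zipfile
from html import unescape


def odf_plain_text(path):
    """Read content.xml from an ODF file and return its text with the
    XML tags replaced by spaces and basic entities decoded."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # Replace every tag with a space so adjacent paragraphs do not fuse.
    text = re.sub(r"<[^>]+>", " ", xml)
    return unescape(text)


def odf_words(path):
    """Split the extracted text into whitespace-separated words."""
    return odf_plain_text(path).split()
```

Looping `odf_words` over a directory of documents gives the raw word stream for bulk extraction; punctuation and markup residue would still need cleaning before the words are compared against a dictionary.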

Re: [lingu-dev] java spell checking tool for hunspell

2008-01-08 Thread Marcin Miłkowski
ge wrote: My overall impression is, that TM is a toothless helper, it helps in fact very little, and forces usage of additional tools. For an individual translator, it can be of little help if the text doesn't contain many repetitions. For a group, it helps to sustain consistency of

Re: [lingu-dev] Grammar checker for Spanish

2007-11-19 Thread Marcin Miłkowski
Mathias Bauer wrote: I call it a Proof reading tool as the Language Tool at the same time does less than some well known Grammar Checkers but OTOH does also more. I like this term. This way it's more clear to the users what they can expect (and they can expect it from most so-called grammar

Re: [lingu-dev] Grammar checker for Spanish

2007-11-17 Thread Marcin Miłkowski
Lars Aronsson wrote: How can I create a minimal grammar? Is there an independent command-line program (similar to the hunspell program for spell checking) that can be used for testing the grammar? There is no built-in grammar checker in OOo. You can reuse one of the grammar-checking

Re: [lingu-dev] Grammar checker for Spanish

2007-11-17 Thread Marcin Miłkowski
ge wrote: If the rules check position of word types in the sentence (e.g. verb, noun, adj, adv, ...), then besides the rules also a dictionary is necessary containing the word type for each word. To set up such a dictionary is not trivial. True, but Lars was asking only about starting the

Re: [lingu-dev] Grammar checker for Spanish

2007-11-11 Thread Marcin Miłkowski
Hi, there is some preliminary support for Spanish in LanguageTool (www.languagetool.org). You can contact me for any details. Regards, Marcin Kepa R. wrote: Hello, I would like to help in the development of a free grammar checker for Spanish. Are there more people working on that? Best

Re: [lingu-dev] AutoCorrect as Extension?

2007-10-10 Thread Marcin Miłkowski
Mathias Bauer wrote: Marcin Miłkowski wrote: Hi, is it possible to have AutoCorrect items installable as an extension for a particular language? If AutoText is installable, then maybe AutoCorrect as well? I'm not sure about this. You can check this by yourself by adding additional folders

Re: [lingu-dev] AutoCorrect as Extension?

2007-10-10 Thread Marcin Miłkowski
Mathias Bauer wrote: I tried and it works. The entries are changeable but this is a feature for me (after all, you might want to customize the autocorrect). I will post a skeleton to the wiki, and some files for Polish users :) I think changeable entries in an extension are a hairy thing.

[lingu-dev] AutoCorrect as Extension?

2007-10-09 Thread Marcin Miłkowski
Hi, is it possible to have AutoCorrect items installable as an extension for a particular language? If AutoText is installable, then maybe AutoCorrect as well? Thanks for any tip, Marcin - To unsubscribe, e-mail: [EMAIL

Re: [lingu-dev] Dictionaries in lang packages?

2007-10-08 Thread Marcin Miłkowski
Hristo Hristov wrote: On 8.10.2007, Mathias Bauer wrote: Hristo Simenov Hristov wrote: Hi, A lot of people complain that when they install new version of OOo they have to install our dictionaries again. Can something be done about this? For example in language packages (install) to add

Re: [lingu-dev] Dictionaries in lang packages?

2007-10-06 Thread Marcin Miłkowski
Hristo Hristov wrote: On 4.10.2007, Simon Brouwer wrote: The problem with many available dictionaries is that their license is not compatible with OOo, e.g. GPL. That means you can use them with OOo, by separately downloading them, but they cannot be made a part of it. Well, when I install

Re: [lingu-dev] Dictionaries in lang packages?

2007-10-04 Thread Marcin Miłkowski
Simon Brouwer wrote: I did *not* say that it's not a problem to include a dictionary if its license is compatible. If most dictionaries would be LGPL or otherwise compatibly licensed, it would still be undesirable to bundle them all. That would significantly bloat the install package, while

Re: [lingu-dev] Hunspell in other apps than OOo

2007-10-04 Thread Marcin Miłkowski
Hi Ruud, r.j.baars wrote: L.S. Is one of the specialists out here able to point out a location where I can find information on integrating Hunspell in C or Java applications? Use our mailing list archive to find my posts on Java ports of Hunspell/myspell. There are at least three of them.

Re: [lingu-dev] issue 79051

2007-10-02 Thread Marcin Miłkowski
Mathias Bauer wrote: Nicolas Mailhot wrote: That's completely true on the 'no building or packaging' side. On the 'just give access' side, distributors prefer regular versioned archive dumps on an ftp/http: OK, I will put this information into the ongoing discussion about possible changes

Re: [lingu-dev] Spell check dictionary update

2007-10-01 Thread Marcin Miłkowski
Nicolas Mailhot wrote: On Sun, 30 September 2007 20:34, Harri Pitkänen wrote: - We do not have an OOo macro for sending suggestions, but I think it is a great idea. IMHO this is a terrible idea. A web form can be integrated in the translation/i18n/l10n web hubs big distributions like

Re: [lingu-dev] Finnish dictionaries

2007-10-01 Thread Marcin Miłkowski
Harri Pitkänen wrote: On Monday 01 October 2007, Mathias Bauer wrote: So are you planning to ask for replacing hunspell as OOo spell checker by your new spell checker? No, our plan is to continue doing what we already do: ship openoffice.org-voikko as an extension that provides spell checker

Re: [lingu-dev] issue 79051

2007-10-01 Thread Marcin Miłkowski
Nicolas Mailhot wrote: If SUN and/or OO.o wanted to move hunspell dicts to a neutral ground, in the hope of getting people not interested in OO.o to contribute, I'd advise targeting freedesktop.org. But for this move to succeed the contribution should include manpower to make the project

Re: [lingu-dev] issue 79051

2007-09-30 Thread Marcin Miłkowski
Nicolas Mailhot wrote: [for people interested in flames] OK, let's stop the flame, right? Let me snip therefore most of our sophisticated insults ;) A. an authoritative download source (ftp or http directory) There are OOo mirrors with most stuff that is put on the wiki. And

Re: [lingu-dev] Spell check dictionary update

2007-09-30 Thread Marcin Miłkowski
Hi Robert, Robert Ludvik wrote: In just a few words: people can send words that are not yet in the spell check dictionary through a web form or with the help of a macro, which is for now only available for OOo but could be ported to MSO, KOffice(?). Relevant people (linguists) would then review sent

Re: [lingu-dev] issue 79051

2007-09-29 Thread Marcin Miłkowski
Simon Brouwer wrote: Sure, if you have this menu option. But it appears that the OOo flavour bundled in some Linux distributions (at least OpenSuSE) doesn't include the DicOOo wizard and therefore also not this menu option. I've been told that Fedora also hasn't bundled DicOOo into their

Re: [lingu-dev] issue 79051

2007-09-29 Thread Marcin Miłkowski
Nicolas Mailhot wrote: So instead of wasting energy pretending the linguistic project with its few spell-checkers knows better the distribution job that distros which have been at it for years and update systems from the kernel to the UI theme, if OO.o could focus on getting its material in a

Re: [lingu-dev] issue 79051

2007-09-29 Thread Marcin Miłkowski
Nicolas Mailhot wrote: On Saturday 29 September 2007 at 19:07 +0200, Marcin Miłkowski wrote: Nicolas Mailhot wrote: So instead of wasting energy pretending the linguistic project with its few spell-checkers knows better the distribution job that distros which have been at it for years

Re: [lingu-dev] Spellchecker : on letter error not detected

2007-06-19 Thread Marcin Miłkowski
Laurent Godard wrote: Hi Frank any opinion ? where to look in the sources ? sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() 1' thanks a lot any planned impact ? what about punctuations as noticed by thomas Well, this can slow down the spell-checking process a little. I'm not

Re: [lingu-dev] Slow dictionary load

2007-04-30 Thread Marcin Miłkowski
Hi Alan, I don't think it's the reason. Polish dictionary file is about 4 MB and it loads fast (however the affix file is about 200K). Check it yourself. However, it's not UTF-8 - it's ISO-8859-2. Maybe UTF-8 makes it slower? Regards, Marcin Alan Yaniger wrote: Hi Daniel, Thanks for

Re: [lingu-dev] Thesaurus Server

2007-03-31 Thread Marcin Miłkowski
Hi Graham, Warned? about what? If the Wordnet list is Opensource then what is the issue? I understood you want to start from _scratch_. Wordnet was sponsored by a 20-million USD grant, and done by a team of really qualified linguists. And it is one of the biggest achievements in computer

Re: [lingu-dev] Thesaurus Server

2007-03-16 Thread Marcin Miłkowski
Hi Graham, Graham Lauder wrote: 2. Contact the Wordnet people and ask them if there's a way to contribute Commonwealth words in a way they are marked in the data, i.e. they can be filtered out for those who don't want them. 2 is clearly the better solution. Thanks for your offer to host

Re: [lingu-dev] Thesaurus Server

2007-03-14 Thread Marcin Miłkowski
Hi all, there seems to be a big misunderstanding here: My Thesaurus isn't a server software and you really cannot set it up on any web server. What you can do, however, to edit thesauruses, is to install Open Thesaurus (see www.openthesaurus.de) which is used successfully to prepare OOo

[lingu-dev] Italian translation for grammar checker interface needed

2007-03-12 Thread Marcin Miłkowski
messages (or something) to Italian. So please contact me in private if you want to help :) Hopefully, this would attract more attention from people that need some more sophisticated spelling or grammar rules for Italian. Regards, Marcin Miłkowski

Re: [lingu-dev] Hunspell - morphological analysis

2007-03-06 Thread Marcin Miłkowski
[EMAIL PROTECTED] wrote: You speak about your solution. What is it? Is it a morphological analysis tool or a grammar checker? Dictionary-based POS-tagger for LanguageTool, using finite-state automata format for storing data (one of the most efficient dictionary formats, in terms of

Re: [lingu-dev] Hunspell - morphological analysis

2007-03-05 Thread Marcin Miłkowski
Hi William, in its current state, hunspell has some limitations which make this solution imperfect. First, in some languages only all flags of the base form of the word determine the part of speech information (two genders can share the same affixes, for example). Hunspell is unable to

Re: [lingu-dev] Hunspell - morphological analysis

2007-03-05 Thread Marcin Miłkowski
Hi Eleonora, For English position in sentence analysers are the tool of choice (because lots of verbs are substantives, e.g walk play, etc...) For POS-tagging this is the tool of choice but not so for grammar checking. We had a statistical sentence-level POS tagger in LanguageTool but it

Re: [lingu-dev] Invitation: Re-working of path settings for the linguistic

2007-02-11 Thread Marcin Miłkowski
Mathias Bauer wrote: Laurent Godard wrote: One goal would be to use a multi-path for user/wordbook and share/wordbook. Another would be to resolve the conflict of user-dictionaries and downloadable dictionaries in user/wordbook, especially since both of them happen to use the same file

Re: [lingu-dev] licence and copyright issues for hyphenation patterns

2007-02-07 Thread Marcin Miłkowski
Hi Matthias, thanks for pointing this out. Polish hyphenation patterns have a README, and I've just added the missing pointers to the original LeX/MeX patterns: http://pl.openoffice.org/pliki/README_hyph_pl_PL.txt As a leader of Polish NLC project, I get asked personally about three times a

[lingu-dev] French autocorrect dictionaries

2007-01-07 Thread Marcin Miłkowski
Hi, I found out there are requests in the French community to have a hard space before the exclamation mark and the question mark. Now, the French autocorrect rules do not have it, but this is extremely easy to add. So my question is: who maintains the autocorrect rules so that I could send the

[lingu-dev] Re: [l10n-dev] French autocorrect dictionaries

2007-01-07 Thread Marcin Miłkowski
Hi Sophie, Thanks for this. Well, actually, I was too optimistic. My patch inserts a hard space after a space, so it's not really a solution :((( But there is some hope. First of all, I will encode this rule into LanguageTool (we already have some basic support for French). And the vlnka

[lingu-dev] Wikipedia revision history for grammar/spelling changes, was: Re: [lingu-dev] Spell checking metrics

2007-01-06 Thread Marcin Miłkowski
Hi all, some of you could find my recent experiments interesting. I posted them here: http://morfologik.blogspot.com/2007/01/wikipedia-history-diff-as-revision.html In short, it seems that Lars' idea was brilliant, and it is possible to filter out the edit wars using simple metrics. Prepare

Re: [lingu-dev] Status update season!

2006-12-23 Thread Marcin Miłkowski
Hi Eleonora Does Polish support compound words? Yes, there are some. If yes, how intensively? (English also supports them, e.g. rainbow, raincoat, however, it is very reserved in their usage). The only examples of compound words are words like autoformat autoformatowanie (autoformat),

Re: [lingu-dev] Spell checking metrics; was:[native-lang] Status update season!

2006-12-22 Thread Marcin Miłkowski
Jonathan wrote: Lars wrote: One idea for finding stats on errors is to compare changes made to Wikipedia articles. The complete text revision history is That might make a good corpus. Would it be possible to write a script that picks up just the spelling/grammar changes? If not, you'll be

Re: [lingu-dev] Standalone Hunspell

2006-11-13 Thread Marcin Miłkowski
Simon Brouwer wrote: Hi, Is it possible to run Hunspell in a standalone mode, feeding it a list of words and obtaining a file that contains the spell check results and suggestions for these words? Of course. See http://hunspell.sourceforge.net Regards, Marcin

Re: [lingu-dev] hunspell in windows?

2006-09-24 Thread Marcin Miłkowski
Hi Robert, Thanks for your reply. Yes, I've looked at that page, but the instructions at Download and test your affix file and dictionary using the Hunspell stand-alone release don't say much, and when I downloaded the stand-alone release I couldn't see how to get it to work in Windows. I

Re: [lingu-dev] Perl support for the hunspell library

2006-07-10 Thread Marcin Miłkowski
Dear Dmitri, have a look on hunspell website, try to find jmorph (Java version of hunmorph which uses hunspell dictionaries, I guess). It seems to me there are some programs that interface myspell dictionaries in Java. This could be adapted to OmegaT more easily but you'll need to search

Re: [lingu-dev] [SoC] Grammar checker API

2006-06-05 Thread Marcin Miłkowski
Bruno Sant'Anna wrote: So do you think that both methods can be put in the API (provide sentence and paragraphs)? The grammar checker itself chooses the way that suits it best? Do you agree that enabling the paragraph method we have the language problem, right? This is a thing I worry about... Well, I

Re: [lingu-dev] [SoC] Grammar checker API

2006-06-01 Thread Marcin Miłkowski
Bruno Sant'Anna wrote: I know splitting the paragraph into sentences is not trivial but I sincerely think that this way is better than sending the full paragraph when we are dealing with more than one language. Why not use the language attribute for deciding which grammar checker should

Re: [lingu-dev] lost postings[1/5]: [SoC] Grammar checker API

2006-06-01 Thread Marcin Miłkowski
Matthew Strawbridge wrote: I once noticed when toying with the MS grammar checker that if there are too many errors in a text (e.g. because the language attribute is inappropriate) it displays a message like too much errors encountered. Maybe it is foreign text... and stops grammar checking

Re: [lingu-dev] [SoC] Grammar checker API

2006-06-01 Thread Marcin Miłkowski
Daniel Naber wrote: 9-) If the grammar checker find an error the complete sentence is marked as it has an error. (Not just a piece of sentence has errors). Does that mean the whole sentence would be underlined? I guess not, as that wouldn't make sense. +1 I also made a comment directly on

Re: [lingu-dev] [SoC] Grammar checker API #UPDATED#

2006-05-26 Thread Marcin Miłkowski
Jonathon Blake wrote: Bruno wrote: we can create a unified dictionary but this implies modification in spell checker. Au Gramadoir uses a different type of spell checker. Language Tool uses another type of spell checker. Language Tool in its current version (0.8.2) does not use any spell

Re: [lingu-dev] Grammar checker API #UPDATED#

2006-05-26 Thread Marcin Miłkowski
Hi Daniel and Simon, Daniel Naber wrote: On Friday 26 May 2006 18:57, Simon Brouwer wrote: It would be great if the relevant data for the grammar checker would be easy to generate from a list of such word groups. It's easy to write such rules in LanguageTool. For example, this of cause

Re: [lingu-dev] [SoC] Grammar checker API

2006-05-23 Thread Marcin Miłkowski
Checker how many registers and which registers it actually supports). Now, should the register list be defined generally for all languages or set by a grammar checker for a particular language? Hard to say which is best. Best regards, Marcin Miłkowski (developing a Polish checker

Re: [lingu-dev] Please, help (OO macro for devel needs)

2006-04-21 Thread Marcin Miłkowski
Jancs wrote: Hi! Can I ask someone to help me out with the design of the OO macro having the following functionality: it checks opened document (word by word, f.e.) against the dic of document language and writes words, which are not found in the dic, to the separate file. If the file

[lingu-dev] stemmer for thesaurus

2006-03-09 Thread Marcin Miłkowski
can simply implement a best-bet algorithm to show only one form without distracting the user too much. Best regards, Marcin Miłkowski

Re: [lingu-dev] Hunspell OOo version target

2005-08-05 Thread Marcin Miłkowski
Laurent Godard wrote: As hunspell is designed as an addon and there is a need of an intensive and systematic verification of all existing dictionaries, I would propose this: - deliver it as an addon, officially supported by the lingucomponent project or the coming scripting one. One thing

Re: [lingu-dev] Hunspell OOo version target

2005-08-05 Thread Marcin Miłkowski
Laurent Godard wrote: Obviously, Hunspell has to become the default engine as it replies to a need is Hunspell actually ready to be integrated ? Agree, we have to do a testing campaign; the first step of offering it as an addon would be OK: end-users of non-Unicode languages won't

Re: [lingu-dev] Thesaurus for Nepali language in OOo

2005-07-27 Thread Marcin Miłkowski
Hi Subir, check out hunspell.sourceforge.net. Hunspell supports multibyte characters so there should be no problem, and it probably will replace myspell. Contact Laszlo Nemeth at [EMAIL PROTECTED] (the developer of hunspell) for more information. He is supposed to replace Kevin as well.

Re: [lingu-dev] Re: [dev] Contributions

2005-07-27 Thread Marcin Miłkowski
Daniel Naber wrote: The thesaurus would benefit from code that can find the base form for any word. E.g. walked - walk, children - child. This could be plugged into the existing thesaurus code easily, it's basically just one method like getBaseform(String). Of course it would need to support

Re: [lingu-dev] About grammar checker

2005-07-26 Thread Marcin Miłkowski
Laurent Godard wrote: Also, would it be possible to provide style information to the grammar checker (so it gets XML instead of plain text)? Users might want to set up rules that check, for example, that certain keywords are always bold or that latin terms appear in italics. This might also be