Testing ruleEditor2

2014-03-23 Thread Kumara Bhikkhu
Here's a test I ran:

wrong sentence: I'll created it.
corrected sentence: I'll create it.
corrected sentence: I've created it.

Error Pattern
Note: LanguageTool can already detect the following error(s) in your 
first wrong example sentence:
  The verb 'll' requires the base form of the verb: create.
I'll created it.

So, the second corrected sentence is ignored.

Btw, I have trouble understanding the content of the "Token #1" box.

kb


--
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: Finding rules

2014-03-23 Thread Dave Pawson
On 23 March 2014 17:31, Marcin Miłkowski  wrote:
> On 2014-03-23 13:13, Dave Pawson wrote:
>>
>> So if I specify
>> java -jar ${langtools}/languagetool-commandline.jar --language EN-GB
>> --disable $disRules $*
>> there are two grammar files in use?
>>   IMHO it would help the user (or at least annoy him/her less) if I was told
>> which file / rule is being used.
>
> Well, you get 8 rules or something more. In general, it doesn't make
> much difference if you specify a country variant; this is, I think, the
> only combination where it does matter (we don't have too many special
> country-variant rules). In verbose mode, we already display lots of
> info, but we can add this.

Ah! Wait till I try verbose mode, I'd not tried that.
I think this information will only be needed 'rarely', so a bit of
work to find it will not hurt?
  My use case is:
 Note a 'mistake'. Find the rule. Add it to the 'ignore' list from
the command line
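
A minimal sketch of that workflow in Python, not part of LanguageTool: it reads a saved plain-text report, picks out the "Rule ID:" header lines in the form quoted in this thread, and joins the IDs into one value for --disable. The function names are made up for illustration, and it assumes --disable accepts a comma-separated list of rule IDs.

```python
import re
from collections import Counter

# Matches report headers in the form quoted in this thread, e.g.
# "536.) Line 565, column 1, Rule ID: WHITESPACE_RULE"
RULE_LINE = re.compile(r"^\d+\.\) Line (\d+), column (\d+), Rule ID: (\S+)")


def count_rule_ids(report: str) -> Counter:
    """Count how often each rule ID appears in a plain-text report."""
    counts = Counter()
    for line in report.splitlines():
        match = RULE_LINE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


def disable_argument(rule_ids) -> str:
    """Join rule IDs into one comma-separated value (assumed --disable format)."""
    return ",".join(sorted(rule_ids))


sample = """536.) Line 565, column 1, Rule ID: WHITESPACE_RULE
185.) Line 489, column 15, Rule ID: EN_UNPAIRED_BRACKETS
Message: Unpaired bracket or similar symbol
"""
counts = count_rule_ids(sample)
print("--disable", disable_argument(counts))
```

Counting the hits first makes it easy to see which rules produce most of the noise before deciding which ones to ignore.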


>
>>
>> Yes, they are used to generate primary, secondary and tertiary terms
>> in the index.
>>
>> I have asked on the docbook list, I'll provide a stylesheet for docbook
>> expanding includes, removing 'extras' such as indexterms.
>
> Right. Remember, however, that integrating corrections will not be
> trivial then. What I mean is that LT displays the position of the
> mistake (also in its XML output) which can be used to highlight the
> error. If you remove any content with a stylesheet, then the initial
> position may be skewed, and highlights will show in random places
> because LT won't see the markup. This is why a stylesheet is not really
> the way to write an AnnotatedText parser for us. We rather need to parse
> docbook with some special Java code, which might be simple anyway.

Agreed. But as an example I have a 500Kword document, one main file
and 40 xincluded files. So line numbers in the original are 'wrong' in most
error reports?
For syntax errors I normally note the text then grep in the files to
find the original source  of the error?
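
The "note the text, then grep" step could be scripted roughly as follows. This is only a sketch under the assumption that the error context appears on a single line of one of the included files; the function name and the example paths are hypothetical.

```python
import sys
from pathlib import Path


def find_snippet(snippet: str, paths):
    """Yield (path, line_number, line) for every line containing snippet."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if snippet in line:
                yield str(path), number, line.strip()


# Example call (hypothetical file names):
#   python find_snippet.py "key for the front door" main.xml chapters/ch01.xml
if __name__ == "__main__" and len(sys.argv) > 2:
    for path, number, line in find_snippet(sys.argv[1], sys.argv[2:]):
        print(f"{path}:{number}: {line}")
```

A snippet that is wrapped across two lines in the source will not be found this way, so a short, distinctive phrase works best.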




>
>>
>>>

 
>>>> Unpaired_brackets error
>>>>
>>>> In my XML I'm using "'"  single quote as both apostrophe
>>>> and single quote (rightly or wrongly).
>>>> --disable EN_UNPAIRED_BRACKETS
>>>> as a command line option would (presumably) disable match
>>>> checking for a number of characters?
>>>
>>> You could but LT should handle apostrophes and single quotes without any
>>> problems. If it doesn't, please file an issue on github for me

Will do if I cannot resolve it.


>>>
>>> https://github.com/languagetool-org/languagetool/issues?state=open
>>>
>>> But you can paste the example here, if it's not anything confidential,
>>> of course.
>>
>>
>>
>> 185.) Line 489, column 15, Rule ID: EN_UNPAIRED_BRACKETS
>> Message: Unpaired bracket or similar symbol
>> ... key for the front door. > xlink:href="http://www.randrsecurity.com/">R and R securi...
>>
>> Clearly there isn't an unpaired " character.  Not sure what else it
>> might be reporting?
>> Not very clear though.
>
> Right. This is just because the tag is split with an end-of-line marker.
> You're apparently using -b parameter which breaks at a single
> end-of-line marker, but this is wrong for your files.

?? I don't think I am using -b (I am not on my main machine, I will check).
Does the rule 'reset' at end of line? That sounds wrong for plain text?

>
>>
>>>
>>>> Is it possible to be more selective?
>>>
>>> No. We don't have that option.
>>
>> In which case could a rule be repeated with less content in the set?
>
> Not really, as this is a Java rule.
>
> Anyway, the false positive is here just because of the end-of-line markers.

OK, I don't understand the end of line - I'll test it out on a small
file to find out what is happening.

Regards Dave P





-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: Finding rules

2014-03-23 Thread Marcin Miłkowski
On 2014-03-23 13:13, Dave Pawson wrote:
>
> So if I specify
> java -jar ${langtools}/languagetool-commandline.jar --language EN-GB
> --disable $disRules $*
> there are two grammar files in use?
>   IMHO it would help the user (or at least annoy him/her less) if I was told
> which file / rule is being used.

Well, you get 8 rules or something more. In general, it doesn't make 
much difference if you specify a country variant; this is, I think, the 
only combination where it does matter (we don't have too many special 
country-variant rules). In verbose mode, we already display lots of 
info, but we can add this.

>
> Yes, they are used to generate primary, secondary and tertiary terms
> in the index.
>
> I have asked on the docbook list, I'll provide a stylesheet for docbook
> expanding includes, removing 'extras' such as indexterms.

Right. Remember, however, that integrating corrections will not be 
trivial then. What I mean is that LT displays the position of the 
mistake (also in its XML output) which can be used to highlight the 
error. If you remove any content with a stylesheet, then the initial 
position may be skewed, and highlights will show in random places 
because LT won't see the markup. This is why a stylesheet is not really 
the way to write an AnnotatedText parser for us. We rather need to parse 
docbook with some special Java code, which might be simple anyway.
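
The position problem can be illustrated with a small Python sketch. It is not LanguageTool's AnnotatedText API, just a stand-alone illustration of the idea: while removing markup, remember for each plain-text character where it came from, so that a match position reported on the plain text can be mapped back to the original file. The regular expression and the function name are assumptions made for this example.

```python
import re

# Naive tag pattern, for illustration only; real XML needs a real parser.
TAG = re.compile(r"<[^>]*>")


def strip_with_offsets(markup: str):
    """Return (plain_text, offsets), where offsets[i] is the index in
    `markup` of the character that ended up at plain_text[i]."""
    plain = []
    offsets = []
    pos = 0
    for tag in TAG.finditer(markup):
        # Copy the text between the previous tag and this one.
        for i in range(pos, tag.start()):
            plain.append(markup[i])
            offsets.append(i)
        pos = tag.end()
    # Copy whatever follows the last tag.
    for i in range(pos, len(markup)):
        plain.append(markup[i])
        offsets.append(i)
    return "".join(plain), offsets


source = "<para>I'll <emphasis>created</emphasis> it.</para>"
plain, offsets = strip_with_offsets(source)
start = plain.index("created")       # position as seen in the plain text
original = offsets[start]            # same character in the marked-up source
print(plain)
print(source[original:original + len("created")])
```

A stylesheet that only emits the stripped text throws this mapping away, which is why the highlights would drift.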

>
>>
>>>
>>> 
>>> Unpaired_brackets error
>>>
>>> In my XML I'm using "'"  single quote as both apostrophe
>>> and single quote (rightly or wrongly).
>>> --disable EN_UNPAIRED_BRACKETS
>>> as a command line option would (presumably) disable match
>>> checking for a number of characters?
>>
>> You could but LT should handle apostrophes and single quotes without any
>> problems. If it doesn't, please file an issue on github for me:
>>
>> https://github.com/languagetool-org/languagetool/issues?state=open
>>
>> But you can paste the example here, if it's not anything confidential,
>> of course.
>
>
>
> 185.) Line 489, column 15, Rule ID: EN_UNPAIRED_BRACKETS
> Message: Unpaired bracket or similar symbol
> ... key for the front door.  xlink:href="http://www.randrsecurity.com/">R and R securi...
>
> Clearly there isn't an unpaired " character.  Not sure what else it
> might be reporting?
> Not very clear though.

Right. This is just because the tag is split with an end-of-line marker. 
You're apparently using -b parameter which breaks at a single 
end-of-line marker, but this is wrong for your files.
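
How a tag split across a line break can leave residue is sketched below in Python. This is a made-up illustration, not LT's actual stripping code: the element name, the URL and both helper functions are invented, and the thread does not establish that this is exactly what happened in the report above.

```python
import re

# Naive tag pattern; [^<>] also matches a newline, so a tag may span lines
# when the pattern is applied to the whole text.
TAG = re.compile(r"<[^<>]*>")


def strip_per_line(text: str) -> str:
    """Strip tags one line at a time, as if every line break ended a unit."""
    return "\n".join(TAG.sub("", line) for line in text.split("\n"))


def strip_whole(text: str) -> str:
    """Strip tags across the whole text, so a tag may span a line break."""
    return TAG.sub("", text)


# Invented markup: the opening tag is broken across two lines.
text = (
    'key for the front door. <ref\n'
    'target="http://www.example.org/">R and R</ref>\n'
)

print(repr(strip_per_line(text)))  # the split opening tag is left behind
print(repr(strip_whole(text)))     # only the visible text remains
```

Leftover markup characters like these are the kind of input a bracket-pairing check could then trip over.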

>
>>
>>> Is it possible to be more selective?
>>
>> No. We don't have that option.
>
> In which case could a rule be repeated with less content in the set?

Not really, as this is a Java rule.

Anyway, the false positive is here just because of the end-of-line markers.

Regards,
Marcin



Re: Dictionary?

2014-03-23 Thread Dave Pawson
OK, thanks, and yes, I did mean a 'personal' dictionary for spelling
(e.g. my wife is 'Su', which I would add as a variant of Sue).

regards

On 23 March 2014 12:10, Marcin Miłkowski  wrote:
> On 2014-03-23 13:00, Dave Pawson wrote:
>> On 23 March 2014 11:02, Marcin Miłkowski  wrote:
>>> On 2014-03-23 11:32, Dave Pawson wrote:
>>>> How to add my own 'personal' dictionary please?
>>>> I can't find any documentation at
>>>> http://wiki.languagetool.org/development-overview
>>> Because there is no personal dictionary implemented.
>> Hopefully on the longer term todo list?
>
> Well, you can see our longer TODO here:
>
> http://wiki.languagetool.org/missing-features
>
> But I am not sure if you're talking of the personal dictionary for
> spelling or about personal grammar rules here. These are different things.
>
>>
>>
>>>> Longer term issue.
>>>> I add to en-GB grammar.xml file.
>>>> Install / update the code... and
>>>> lose my additions?
>>>>  Should 'myGrammar.xml' be xincluded perhaps?
>>> Yep, this is long on our TODO list. But if your rule is general enough
>>> to be interesting to others, you might also want to share it with us and
>>> then you won't lose anything with the update. ;)
>> Agreed, but (e.g. I hate smart quotes) some will be 'mine', not
>> fit for general use?
>
> Yeah, we should have a mechanism for joining rules. This is not very
> complex to implement but we focused on other features for the upcoming
> release. Maybe in 2.6.
>
> Regards,
> Marcin
>
>>
>> regards
>>
>>
>>
>>
>
>



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: Finding rules

2014-03-23 Thread Dave Pawson
On 23 March 2014 11:23, Marcin Miłkowski  wrote:
> On 2014-03-23 11:27, Dave Pawson wrote:
>> I'm being shown an 'error'
>> 536.) Line 565, column 1, Rule ID: WHITESPACE_RULE
>
> This is a Java rule, so it's not in XML:
>
> http://community.languagetool.org/rule/show/WHITESPACE_RULE?lang=en

Ah! Special  cases.

>
>> I'm using English.
>> How to find the grammar.xml file in use?
>> It seems there could be a number?
>> .../rules/en
>> .../rules/en/en-GB
>
> en-GB rules are applied (in addition to other English rules) only if you
> select English-UK.

So if I specify
java -jar ${langtools}/languagetool-commandline.jar --language EN-GB
--disable $disRules $*
there are two grammar files in use?
 IMHO it would help the user (or at least annoy him/her less) if I was told
which file / rule is being used.


>
>>
>> ===
>>
>> re XML spell checking?
>> the markup is fooling the parser?
>
> Heh, calling this a parser is a bit too much. It's a dirty regexp.
>
>> olympics olympic
>> is being reported as spelling error?
>> And (guessing)...>olympics is being reported as an error?
>
> Nope. The word "olympics" is at the beginning of line so it's considered
> to be a spelling mistake, at least for me here:
>
> 1.) Line 1, column 1, Rule ID: UPPERCASE_SENTENCE_START
> Message: This sentence does not start with an uppercase letter
> Suggestion: Olympics
> olympics olympic
> 

OK. My fault. Thanks.



>
>> How to strip markup prior to tokenise?
>
> It *is* stripped. You can use -v to see the verbose mode.
>
>>XSLT makes that easy but!
>>
>> Big Ben Big Ben
>>
>> Here Big Ben is used twice. Once for the indexer, once for the primary
>> content of the text. I.e. text stripping needs to be
>> vocabulary aware.
>
> Well, maybe the future docbook parser should ignore index terms as these
> are not correct English words but something like keys?

Yes, they are used to generate primary, secondary and tertiary terms
in the index.

I have asked on the docbook list, I'll provide a stylesheet for docbook
expanding includes, removing 'extras' such as indexterms.


>
>>
>> 
>> Unpaired_brackets error
>>
>> In my XML I'm using "'"  single quote as both apostrophe
>> and single quote (rightly or wrongly).
>> --disable EN_UNPAIRED_BRACKETS
>> as a command line option would (presumably) disable match
>> checking for a number of characters?
>
> You could but LT should handle apostrophes and single quotes without any
> problems. If it doesn't, please file an issue on github for me:
>
> https://github.com/languagetool-org/languagetool/issues?state=open
>
> But you can paste the example here, if it's not anything confidential,
> of course.



185.) Line 489, column 15, Rule ID: EN_UNPAIRED_BRACKETS
Message: Unpaired bracket or similar symbol
... key for the front door.  xlink:href="http://www.randrsecurity.com/">R and R securi...

Clearly there isn't an unpaired " character.  Not sure what else it
might be reporting?
Not very clear though.

>
>>Is it possible to be more selective?
>
> No. We don't have that option.

In which case could a rule be repeated with less content in the set?

regards




-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: Dictionary?

2014-03-23 Thread Marcin Miłkowski
On 2014-03-23 13:00, Dave Pawson wrote:
> On 23 March 2014 11:02, Marcin Miłkowski  wrote:
>> On 2014-03-23 11:32, Dave Pawson wrote:
>>> How to add my own 'personal' dictionary please?
>>> I can't find any documentation at
>>> http://wiki.languagetool.org/development-overview
>> Because there is no personal dictionary implemented.
> Hopefully on the longer term todo list?

Well, you can see our longer TODO here:

http://wiki.languagetool.org/missing-features

But I am not sure if you're talking of the personal dictionary for 
spelling or about personal grammar rules here. These are different things.

>
>
>>> Longer term issue.
>>> I add to en-GB grammar.xml file.
>>> Install / update the code... and
>>> lose my additions?
>>>  Should 'myGrammar.xml' be xincluded perhaps?
>> Yep, this is long on our TODO list. But if your rule is general enough
>> to be interesting to others, you might also want to share it with us and
>> then you won't lose anything with the update. ;)
> Agreed, but (e.g. I hate smart quotes) some will be 'mine', not
> fit for general use?

Yeah, we should have a mechanism for joining rules. This is not very 
complex to implement but we focused on other features for the upcoming 
release. Maybe in 2.6.

Regards,
Marcin

>
> regards
>
>
>
>




Re: Dictionary?

2014-03-23 Thread Dave Pawson
On 23 March 2014 11:02, Marcin Miłkowski  wrote:
> On 2014-03-23 11:32, Dave Pawson wrote:
>> How to add my own 'personal' dictionary please?
>> I can't find any documentation at
>> http://wiki.languagetool.org/development-overview
>
> Because there is no personal dictionary implemented.

Hopefully on the longer term todo list?


>
>>
>> Longer term issue.
>> I add to en-GB grammar.xml file.
>> Install / update the code... and
>> lose my additions?
>> Should 'myGrammar.xml' be xincluded perhaps?
>
> Yep, this is long on our TODO list. But if your rule is general enough
> to be interesting to others, you might also want to share it with us and
> then you won't lose anything with the update. ;)

Agreed, but (e.g. I hate smart quotes) some will be 'mine', not
fit for general use?

regards




-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: Finding rules

2014-03-23 Thread Marcin Miłkowski
On 2014-03-23 11:27, Dave Pawson wrote:
> I'm being shown an 'error'
> 536.) Line 565, column 1, Rule ID: WHITESPACE_RULE

This is a Java rule, so it's not in XML:

http://community.languagetool.org/rule/show/WHITESPACE_RULE?lang=en

> I'm using English.
> How to find the grammar.xml file in use?
> It seems there could be a number?
> .../rules/en
> .../rules/en/en-GB

en-GB rules are applied (in addition to other English rules) only if you 
select English-UK.

>
> ===
>
> re XML spell checking?
> the markup is fooling the parser?

Heh, calling this a parser is a bit too much. It's a dirty regexp.

> olympics olympic
> is being reported as spelling error?
> And (guessing)...>olympics is being reported as an error?

Nope. The word "olympics" is at the beginning of line so it's considered 
to be a spelling mistake, at least for me here:

1.) Line 1, column 1, Rule ID: UPPERCASE_SENTENCE_START
Message: This sentence does not start with an uppercase letter
Suggestion: Olympics
olympics olympic


> How to strip markup prior to tokenise?

It *is* stripped. You can use -v to see the verbose mode.

>XSLT makes that easy but!
>
> Big Ben Big Ben
>
> Here Big Ben is used twice. Once for the indexer, once for the primary
> content of the text. I.e. text stripping needs to be
> vocabulary aware.

Well, maybe the future docbook parser should ignore index terms as these 
are not correct English words but something like keys?

>
> 
> Unpaired_brackets error
>
> In my XML I'm using "'"  single quote as both apostrophe
> and single quote (rightly or wrongly).
> --disable EN_UNPAIRED_BRACKETS
> as a command line option would (presumably) disable match
> checking for a number of characters?

You could but LT should handle apostrophes and single quotes without any 
problems. If it doesn't, please file an issue on github for me:

https://github.com/languagetool-org/languagetool/issues?state=open

But you can paste the example here, if it's not anything confidential, 
of course.

>Is it possible to be more selective?

No. We don't have that option.

Regards,
Marcin

>
> regards
>
>
>
>
>




Re: Dictionary?

2014-03-23 Thread Marcin Miłkowski
On 2014-03-23 11:32, Dave Pawson wrote:
> How to add my own 'personal' dictionary please?
> I can't find any documentation at
> http://wiki.languagetool.org/development-overview

Because there is no personal dictionary implemented.

>
> Longer term issue.
> I add to en-GB grammar.xml file.
> Install / update the code... and
> lose my additions?
> Should 'myGrammar.xml' be xincluded perhaps?

Yep, this is long on our TODO list. But if your rule is general enough 
to be interesting to others, you might also want to share it with us and 
then you won't lose anything with the update. ;)

regards,
marcin

>
> regards
>




Re: New grammar tool.

2014-03-23 Thread Dave Pawson
On 23 March 2014 10:48, Marcin Miłkowski  wrote:
> On 2014-03-23 10:42, Dave Pawson wrote:

>>>> 1. Who is interested in the analysis? A user? A developer only?
>>>
>>> Anybody using the tool to develop non-trivial rules, i.e. rules that
>>> don't just refer to plain words but to part-of-speech tags.
>>
>> OK...  How about adding a 'reason' for the page?
>> "Please examine the analysis and check that you agree with it.
>> Then correct the example (or report a bug?"
>> Something like that?
>
> Not really. The analysis cannot be corrected by hand, as we need a rule
> that would actually generate the analysis we want for any similar
> sentence. For this, we use the disambiguator (disambiguation rules look
> almost exactly like grammar rules but their point is to rewrite the POS
> tags, usually by removing the ones that make no sense in the context).
>
> The analysis shows how one particular sentence was analyzed but if your
> rule uses part of speech tags, then it will match an infinite number of
> similarly structured sentences (or sentence parts). By showing the
> analysis, we make it easier to see the grammatical structure of the
> example sentences and to write up a rule describing the structure. In
> particular, one can see how the correct and incorrect examples differ to
> use the proper tags in the rule so that there are no false positives.
>
> Best,
> Marcin

So is it ... almost debug output for the devs?
But you do say 'to write up a rule describing the structure'.
Which I interpret as 'helping' to write better rules?

For me (as I said, non-grammarian) I find it very hard
to use due to the complexity of the analysis - but I can't think
of a better way, so I'll shut up!

regards




-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: New grammar tool.

2014-03-23 Thread Marcin Miłkowski
On 2014-03-23 10:42, Dave Pawson wrote:
> On 23 March 2014 09:20, Daniel Naber  wrote:
>> On 2014-03-23 08:32, Dave Pawson wrote:
>
>>
>>> Specifically:
>>> Entering a 'wrong' sentence. Hit a key and it skips to end of line?
>>> Odd, not used to it.
>>
>> I either don't understand what you mean or I cannot reproduce it. Which
>> browser are you using?
>
> Linux, Chrome browser.
>
> Enter text, back arrow 10 characters, press any character and the cursor
> skips to end of line? As in C-e in emacs.
> I.e. it seems not to be a 'usual' editable field?
>
>>
>>> 'show analysis' page.
>>
>> Having easier to understand tag names is on our TODO list but we're not
>> there yet.
>
> Understood.
>
>>
>>> 1. Who is interested in the analysis? A user? A developer only?
>>
>> Anybody using the tool to develop non-trivial rules, i.e. rules that
>> don't just refer to plain words but to part-of-speech tags.
>
> OK...  How about adding a 'reason' for the page?
> "Please examine the analysis and check that you agree with it.
> Then correct the example (or report a bug?"
> Something like that?

Not really. The analysis cannot be corrected by hand, as we need a rule 
that would actually generate the analysis we want for any similar 
sentence. For this, we use the disambiguator (disambiguation rules look 
almost exactly like grammar rules but their point is to rewrite the POS 
tags, usually by removing the ones that make no sense in the context).

The analysis shows how one particular sentence was analyzed but if your 
rule uses part of speech tags, then it will match an infinite number of 
similarly structured sentences (or sentence parts). By showing the 
analysis, we make it easier to see the grammatical structure of the 
example sentences and to write up a rule describing the structure. In 
particular, one can see how the correct and incorrect examples differ to 
use the proper tags in the rule so that there are no false positives.
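
As a rough illustration of why the tags matter, here is a Python sketch, not LT's rule engine. The sentences are tagged by hand with Penn Treebank style tags (an assumption about the tag names), and a toy "rule" flags a modal followed directly by a past form. Because it looks only at tags, it would match any similarly structured sentence, and comparing the wrong and correct examples shows which tag makes the difference.

```python
# Hand-tagged toy sentences (not produced by LanguageTool).
# MD = modal, VB = base form, VBP = present form, VBD = past tense,
# VBN = past participle, PRP = pronoun.
WRONG = [("I", "PRP"), ("'ll", "MD"), ("created", "VBN"), ("it", "PRP"), (".", ".")]
RIGHT = [("I", "PRP"), ("'ll", "MD"), ("create", "VB"), ("it", "PRP"), (".", ".")]
ALSO_RIGHT = [("I", "PRP"), ("'ve", "VBP"), ("created", "VBN"), ("it", "PRP"), (".", ".")]


def find_modal_plus_past_form(tagged):
    """Return the indexes of past-form verbs that directly follow a modal."""
    hits = []
    for i in range(len(tagged) - 1):
        if tagged[i][1] == "MD" and tagged[i + 1][1] in {"VBD", "VBN"}:
            hits.append(i + 1)
    return hits


for name, sentence in [("wrong", WRONG), ("right", RIGHT), ("also right", ALSO_RIGHT)]:
    words = [sentence[i][0] for i in find_modal_plus_past_form(sentence)]
    print(name, words)
```

If the tagger assigned "created" the wrong tag in some context, the rule would misfire, and that is the case the disambiguation rules are there to handle.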

Best,
Marcin




Dictionary?

2014-03-23 Thread Dave Pawson
How to add my own 'personal' dictionary please?
I can't find any documentation at
http://wiki.languagetool.org/development-overview

Longer term issue.
I add to en-GB grammar.xml file.
Install / update the code... and
lose my additions?
   Should 'myGrammar.xml' be xincluded perhaps?

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Finding rules

2014-03-23 Thread Dave Pawson
I'm being shown an 'error'
536.) Line 565, column 1, Rule ID: WHITESPACE_RULE

I'm using English.
How to find the grammar.xml file in use?
It seems there could be a number?
.../rules/en
.../rules/en/en-GB

===

re XML spell checking?
the markup is fooling the parser?
olympics olympic
is being reported as spelling error?
And (guessing)...>olympics is being reported as an error?
How to strip markup prior to tokenise?
  XSLT makes that easy but!

Big Ben Big Ben

Here Big Ben is used twice. Once for the indexer, once for the primary
content of the text. I.e. text stripping needs to be
vocabulary aware.


Unpaired_brackets error

In my XML I'm using "'"  single quote as both apostrophe
and single quote (rightly or wrongly).
--disable EN_UNPAIRED_BRACKETS
as a command line option would (presumably) disable match
checking for a number of characters?
  Is it possible to be more selective?

regards





-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



OpenThesaurus news

2014-03-23 Thread Daniel Naber
Hi,

this is slightly off-topic, but as it's language-related I'll post it 
anyway: a new thesaurus website has recently been set up for Catalan, 
based on the German openthesaurus.de project. If your language doesn't 
have a free thesaurus, you may set up such a thesaurus website, too. It 
does require some technical knowledge, but I can help with that.

Here's a list of thesaurus websites based on openthesaurus.de:

Catalan(*) - http://openthesaurus.softcatala.org
German(*) - http://www.openthesaurus.de
Greek(*) - http://www.openthesaurus.gr
Polish - http://synonimy.sourceforge.net
Portuguese - http://openthesaurus.caixamagica.pt
Slovenian - http://www.tezaver.si
Spanish - http://openthes-es.berlios.de

(*) = it runs on the current version, whereas the other ones work on the 
old PHP-based version that's not maintained anymore. Migration from the 
old to the new version is supported.

The code is available at https://github.com/danielnaber/openthesaurus, 
let me know if you want to set up your own thesaurus and need help.

Regards
  Daniel




Re: New grammar tool.

2014-03-23 Thread Dave Pawson
On 23 March 2014 09:20, Daniel Naber  wrote:
> On 2014-03-23 08:32, Dave Pawson wrote:

>
>> Specifically:
>> Entering a 'wrong' sentence. Hit a key and it skips to end of line?
>> Odd, not used to it.
>
> I either don't understand what you mean or I cannot reproduce it. Which
> browser are you using?

Linux, Chrome browser.

Enter text, back arrow 10 characters, press any character and the cursor
skips to end of line? As in C-e in emacs.
I.e. it seems not to be a 'usual' editable field?

>
>> 'show analysis' page.
>
> Having easier to understand tag names is on our TODO list but we're not
> there yet.

Understood.

>
>> 1. Who is interested in the analysis? A user? A developer only?
>
> Anybody using the tool to develop non-trivial rules, i.e. rules that
> don't just refer to plain words but to part-of-speech tags.

OK...  How about adding a 'reason' for the page?
"Please examine the analysis and check that you agree with it.
Then correct the example (or report a bug?"
Something like that?

>
>> What do you want the user to check? That the analysis has
>> been carried out correctly?
>
> I'll try to come up with a help text.

Thanks Daniel.



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: New grammar tool.

2014-03-23 Thread Daniel Naber
On 2014-03-23 08:32, Dave Pawson wrote:

> OK... I'm wondering which is more scary. Grammar terminology
> or XML markup.

The XML contains all that grammar terminology, too.

> Specifically:
> Entering a 'wrong' sentence. Hit a key and it skips to end of line?
> Odd, not used to it.

I either don't understand what you mean or I cannot reproduce it. Which 
browser are you using?

> 'show analysis' page.

Having easier to understand tag names is on our TODO list but we're not 
there yet.

> 1. Who is interested in the analysis? A user? A developer only?

Anybody using the tool to develop non-trivial rules, i.e. rules that 
don't just refer to plain words but to part-of-speech tags.

> What do you want the user to check? That the analysis has
> been carried out correctly?

I'll try to come up with a help text.

Regards
  Daniel




Re: prototype of new rule editor

2014-03-23 Thread Daniel Naber
On 2014-03-23 01:19, Kumara Bhikkhu wrote:

> Minor issue: Is the "A example sentence" in the 2 text boxes
> deliberately wrong?

Yes, although I've now changed it to an example that's hopefully a bit 
clearer.

Regards
  Daniel




Re: New grammar tool.

2014-03-23 Thread Dave Pawson
On 22 March 2014 20:30, Daniel Naber  wrote:
> On 2014-03-22 15:39, Dave Pawson wrote:
>
>> Initial reaction? Scary. I'm not a grammarian.
>> It is intimidating where the XML wasn't (for me).
>> Who is it for?
>
> It's for the 99% of people who have never edited an XML file.

OK... I'm wondering which is more scary. Grammar terminology
or XML markup.

>
>> Any help available? Any less scary
>> version available?
>
> A new version is online. It includes more help text, some usability
> fixes and a quick help for regular expressions. To further improve it, I
> need more detailed feedback.


Specifically:
Entering a 'wrong' sentence. Hit a key and it skips to end of line?
Odd, not used to it.

'show analysis' page.
SCARY!

I need some sort of explanation, in plain language, of these terms.
Otherwise I'll just walk away and laugh.

Chunk
B-NP-singular
E-NP-singular
B-VP
B-PP
B-NP-singular
E-NP-singular
B-PP
B-NP-singular
I-NP-singular
E-NP-singular


E | B | I   ??
N = noun
V = verb
NP = proper noun
PP  = ??past participle (not heard since school days)
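
For what it's worth, these look like phrase-chunk labels in the usual B/I/E style rather than word classes: B, I and E mark the token that begins, is inside, or ends a chunk, and NP, VP and PP are the common abbreviations for noun phrase, verb phrase and prepositional phrase. A small Python sketch of a plain-language decoder, with the wording and the function name made up for illustration and only the tags listed above covered:

```python
POSITION = {"B": "begins", "I": "is inside", "E": "ends"}
PHRASE = {"NP": "noun phrase", "VP": "verb phrase", "PP": "prepositional phrase"}


def explain_chunk_tag(tag: str) -> str:
    """Turn a tag such as 'B-NP-singular' into a plain-language phrase."""
    parts = tag.split("-")
    position, phrase = parts[0], parts[1]
    extra = parts[2:]  # e.g. ['singular'], or empty for tags like 'B-VP'
    text = f"{POSITION[position]} a {PHRASE[phrase]}"
    if extra:
        text += f" ({', '.join(extra)})"
    return text


for tag in ["B-NP-singular", "I-NP-singular", "E-NP-singular", "B-VP", "B-PP"]:
    print(f"{tag}: this token {explain_chunk_tag(tag)}")
```

A help text along these lines on the 'show analysis' page would probably cover most of the question.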

How to address this 'help' page?
1. Who is interested in the analysis? A user? A developer only?
What do you want the user to check? That the analysis has
been carried out correctly?  is a verb, analysis shows it as a noun?
that sort of thing? If so please guide the user to that purpose.


HTH


-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
