On 23 March 2014 11:23, Marcin Miłkowski <list-addr...@wp.pl> wrote:
> On 2014-03-23 11:27, Dave Pawson wrote:
>> I'm being shown an 'error'
>> 536.) Line 565, column 1, Rule ID: WHITESPACE_RULE
>
> This is a Java rule, so it's not in XML:
>
> http://community.languagetool.org/rule/show/WHITESPACE_RULE?lang=en

Ah! Special cases.

>
>> I'm using English.
>> How to find the grammar.xml file in use?
>> It seems there could be a number?
>> .../rules/en
>> .../rules/en/en-GB
>
> en-GB rules are applied (in addition to other English rules) only if you
> select English-UK.

So if I specify
java -jar ${langtools}/languagetool-commandline.jar --language EN-GB \
    --disable $disRules $*
are there two grammar files in use?
IMHO it would help the user (or at least annoy him/her less) if I were told
which file / rule is being used.


>
>>
>> =======================
>>
>> re XML spell checking?
>> the markup is fooling the parser?
>
> Heh, calling this a parser is a bit too much. It's a dirty regexp.
>
>> <indexterm><primary>olympics</primary> </indexterm>olympic
>> is being reported as spelling error?
>> And (guessing)...>olympics is being reported as an error?
>
> Nope. The word "olympics" is at the beginning of line so it's considered
> to be a spelling mistake, at least for me here:
>
> 1.) Line 1, column 1, Rule ID: UPPERCASE_SENTENCE_START
> Message: This sentence does not start with an uppercase letter
> Suggestion: Olympics
> olympics olympic
> ^^^^^^^^

OK. My fault. Thanks.



>
>> How to strip markup prior to tokenise?
>
> It *is* stripped. You can use -v to see the verbose mode.
>
>>    XSLT makes that easy.... but!
>>
>> <indexterm><primary>Big Ben</primary> </indexterm>Big Ben
>>
>> Here Big Ben is used twice. Once for the indexer, once for the primary
>> content of the text. I.e. text stripping needs to be
>> vocabulary aware.
>
> Well, maybe the future docbook parser should ignore index terms as these
> are not correct English words but something like keys?

Yes, they are used to generate primary, secondary and tertiary terms
in the index.

I have asked on the DocBook list. I'll provide a stylesheet for DocBook that
expands includes and removes 'extras' such as indexterms.
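
In the meantime, a rough Python sketch of the same idea (not the XSLT
stylesheet itself): drop every indexterm element but keep the text that
follows it, so that "Big Ben" reaches the checker only once. The function
name strip_indexterms is made up for illustration, and the sketch assumes the
input is a single well-formed XML fragment.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_indexterms(xml_text: str) -> str:
    """Remove <indexterm> elements and return the remaining plain text.

    Hypothetical helper: the text that follows an indexterm (its "tail")
    is kept by attaching it to the previous sibling or to the parent.
    """
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        prev = None  # last sibling that was kept
        for child in list(parent):
            if local_name(child.tag) == "indexterm":
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return "".join(root.itertext())


sample = ("<para><indexterm><primary>Big Ben</primary> </indexterm>"
          "Big Ben is tall.</para>")
print(strip_indexterms(sample))
```

The resulting plain text could then be saved to a file and passed to the
command line tool in place of the XML.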


>
>>
>> ========================
>> Unpaired_brackets error
>>
>> In my XML I'm using "'"  single quote as both apostrophe
>> and single quote (rightly or wrongly).
>> --disable EN_UNPAIRED_BRACKETS
>> as a command line option would (presumably) disable match
>> checking for a number of characters?
>
> You could but LT should handle apostrophes and single quotes without any
> problems. If it doesn't, please file an issue on github for me:
>
> https://github.com/languagetool-org/languagetool/issues?state=open
>
> But you can paste the example here, if it's not anything confidential,
> of course.



185.) Line 489, column 15, Rule ID: EN_UNPAIRED_BRACKETS
Message: Unpaired bracket or similar symbol
... key for the front door. <link
xlink:href="http://www.randrsecurity.com/">R and R securi...

Clearly there isn't an unpaired " character, so I'm not sure what else it
might be reporting. The message doesn't say which symbol is unpaired.
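
For what it's worth, a naive balance count over the link tag finds nothing
unpaired. This is only my own quick sketch in Python; it is not how the
EN_UNPAIRED_BRACKETS rule works, and the function name unbalanced and the
exact tag text are assumptions.

```python
# Bracket pairs checked by this naive sketch (an assumed set, not LT's list).
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def unbalanced(text: str) -> list:
    """Return the symbols whose opening and closing counts differ."""
    problems = []
    for open_ch, close_ch in PAIRS.items():
        if text.count(open_ch) != text.count(close_ch):
            problems.append(open_ch + close_ch)
    # Straight double quotes pair with themselves: an odd count is suspect.
    if text.count('"') % 2 == 1:
        problems.append('"')
    return problems


tag = '<link xlink:href="http://www.randrsecurity.com/">'
print(unbalanced(tag))
```

An empty list means every counted symbol is balanced, so whatever the rule is
objecting to, it does not look like the double quotes or angle brackets in
that tag.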

>
>>    Is it possible to be more selective?
>
> No. We don't have that option.

In which case, could the rule be split into several rules, each covering a
smaller set of characters?

regards




-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
