Re: Finding rules

Dave Pawson Mon, 24 Mar 2014 10:15:13 -0700

On 24 March 2014 16:29, Marcin Miłkowski <list-addr...@wp.pl> wrote:


> Well, there is a (partially broken) emacs plugin:
>
> http://www.emacswiki.org/emacs/langtool.el
>
> I'm not really into emacs lisp, so I wasn't able to make it run
> flawlessly but you might want to use it. Should be easier than grep, as
> this parses LT output directly.

No thanks, I author XML in emacs, never process it.



>>> For both use scenarios, we need to retain proper error locations, so we
>>> should not use XSLT for conversion unless XSLT creates an intermediary
>>> format with positions hard-coded as attributes, and then we would have a
>>> Java parser for the intermediary format. This might have an advantage of
>>> being able to write up an XSLT for just any XML format easily instead of
>>> creating separate Java parsers.
>>
>> I'm -1 on that. XML is white space agnostic (one of its benefits for me)
>> so line numbers have less meaning?
>
> Error locations are not only line numbers but also column numbers. This
> really helps software to underline errors as you type.

I want to see the errors myself, not feed them to intermediate software.
XML is line- and whitespace-agnostic, so are positions really much help?


>
>>
>> Processing 'any' XML (to me) would be advantageous. Here the
>> requirement would be simply to skip over elements/attributes (and
>> comments, PIs perhaps?), then simply switch off the white space rule,
>> since it is not applicable? Ditto the smart quote rule?
>
> Smart quote rule is fine if your output is for printing purposes. It
> just depends on the language.

XML is very rarely used for presentation without being transformed first,
so smart quotes are of little or no use.


>>>> ?? I don't think I am using -b (I am not on my main machine, I will check).
>>>> Does the rule 'reset' at end of line? That sounds wrong for plain text?
>>>
>>> It depends on how your plain text file looks. Some use two end of line
>>> markers for the end of paragraph, some only one. We have these two settings.
>>
>> I think we are out of sync here? I am currently processing the XML file
>> without stripping markup.
>
> You talk about plain text, I reply about plain text. Not about XML. For
> plain text, there are reasons to look at end of line markers.

Agreed. I have not, as yet, produced plain text from docbook XML,
hence all my comments refer to processing XML.
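
For reference, a rough sketch of what "plain text from docbook" could look
like using only standard shell tools. This is an assumption-heavy shortcut,
not how LanguageTool handles XML: the function name xml_to_text is made up,
the tag matching is a naive regex, entities such as &amp; are not decoded,
and it will misbehave on CDATA sections or on a literal '>' inside a
comment or attribute value. A real conversion would use XSLT or a proper
parser.

```shell
#!/bin/bash
# Hypothetical helper (not part of LanguageTool): crude XML-to-text filter.
# Reads XML on stdin and writes the text content as a single line on stdout.
xml_to_text() {
  tr '\n\t\r' '   ' |             # line breaks and tabs become plain spaces
    sed -e 's/<[^>]*>/ /g' |       # replace tags, comments and PIs with a space (naive)
    tr -s ' ' |                    # squeeze runs of spaces into one
    sed -e 's/^ //' -e 's/ $//'    # trim the leading and trailing space
}
```

Usage would be something like "xml_to_text < chapter.xml > chapter.txt",
then running the checker on chapter.txt (both file names are placeholders).
Tags are replaced with a space rather than removed so that adjacent
paragraphs do not run together; the cost is that inline markup in the
middle of a word splits that word.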

>
>
>> Checking: I am not using the -b parameter.
>> My shell script is
>>
>> #!/bin/bash
>> langtools=/apps/langtools
>> disRules="WHITESPACE_RULE"
>> # "$@" passes the file names through intact, even if they contain spaces
>> java -jar "${langtools}/languagetool-commandline.jar" --language EN-GB \
>>   -c utf-8 --disable "$disRules" "$@"
>
> I could not reproduce the error you mention without -b, but again, maybe
> you have two EOLs in your file.

I have lots of \n in the file, none of which are relevant?
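
One quick way to check whether a file contains the doubled end-of-line
markers mentioned above is to count its blank lines. A minimal sketch:
count_blank_lines is a made-up helper name, and treating whitespace-only
lines as blank is my assumption about what would count as a paragraph
break.

```shell
#!/bin/bash
# Hypothetical helper: print the number of blank (empty or whitespace-only)
# lines in the file given as $1. Any blank line implies two consecutive EOLs.
count_blank_lines() {
  grep -c '^[[:space:]]*$' "$1"
}
```

Note that grep exits with a non-zero status when the count is 0, although
it still prints the 0.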


>
>>
>>
>>>
>>> However, for XML input it may be the case that end of line markers
>>> should be completely ignored during text segmentation. Actually, we
>>> almost could ignore these as the text is segmented independently from
>>> the rules. But I frankly don't know whether EOLs have any use in docbook
>>> or not. They don't have any in xhtml...
>>
>> No, whitespace (newlines, tabs, spaces, etc.) is (mainly) ignored in XML.
>
> Unless of course we have xml:space="preserve".


That's the 'mainly' caveat <grin/>

>
>>
>>>
>>> I say we "almost could" because there's code that we additionally run
>>> for end of lines, and we could simply skip it, but only in the next
>>> release it's possible to add the option to the command-line (and other
>>> places) because we're in the feature freeze period now.
>>
>> Understood. If I can help please shout.
>
> After the release, I'll add the option to suppress EOL segmentation
> altogether.


Thanks, that would be a help.

regards



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
