On 24 March 2014 18:26, Marcin Miłkowski <list-addr...@wp.pl> wrote:
> W dniu 2014-03-24 18:14, Dave Pawson pisze:
>>>>> For both use scenarios, we need to retain proper error locations, so we
>>>>> should not use XSLT for conversion unless XSLT creates an intermediary
>>>>> format with positions hard-coded as attributes, and then we would have a
>>>>> Java parser for the intermediary format. This might have the advantage of
>>>>> being able to write an XSLT for just about any XML format easily instead
>>>>> of creating separate Java parsers.
>>>>
>>>> I'm -1 on that. XML is whitespace agnostic (one of its benefits for me),
>>>> so line numbers have less meaning?
>>>
>>> Error locations are not only line numbers but also column numbers. This
>>> really helps software to underline errors as you type.
>>
>> I want to see it, not intermediate software. XML is line / whitespace
>> agnostic, so it is of little help really?
>
> We're out of sync. The location is used instead of grep to highlight error
> positions in the file. Note: you might have the same sequence of words in
> your file eight times but only one might be incorrect. LT would highlight
> only that one, grep all eight, so this saves your time and removes
> confusion. LT cares about the context-dependence of errors, so this is not
> a science-fiction scenario. For example, "that that" may be an error in
> English, but it also might be completely fine. LT tries to suppress false
> positives for this.

Sorry, I agree with your logic. It's just that I've given up on error
location in the original XML. I am prepared to work a little (I do now with
validation against the schema) to get to the source file / line.

> So whatever your format, location *is* important to avoid confusion and
> save the user's time.

Understood. My position is that for XML it isn't worth the effort?

>>> The smart quote rule is fine if your output is for printing purposes. It
>>> just depends on the language.
>>
>> XML is very rarely used for presentation without transformation first, so
>> smart quotes are of little or no use.
>
> I wrote some of my logic slides in XHTML, so...

Which is often the presentational form of an XML source. If I ever used
smart quotes, they would be within the element content, so pre-processing to
strip markup would resolve that issue for me.

>>> After the release, I'll add the option to suppress EOL segmentation
>>> altogether.
>>
>> Thanks, that would be a help.
>
> I think we're getting some clear specs of what should be done for a sane
> generic XML filter:
>
> - discard \n in segmentation, unless in xml:space="preserve";
> - take care of standard entities;
> - maybe disable the whitespace rule by default.

I'd add:

- Expand XIncludes.
- Ignore the standard character entities: amp, lt, gt, apos, quot.
- Cater for elements to be removed in the XML parse? A simple element list,
  possibly namespaced, perhaps specified in an XML document using XPath.

HTH

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
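
A note on the error-location discussion above: a plain SAX ContentHandler
can already report approximate line/column positions for text as it parses,
via org.xml.sax.Locator, with no XSLT intermediary. A minimal sketch, as a
standalone class (the class name is illustrative, not anything in LT, and
the positions are only approximate):

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Locator;
    import org.xml.sax.helpers.DefaultHandler;

    // Illustrative class name; not part of LanguageTool.
    public class PositionAwareTextExtractor extends DefaultHandler {

        private Locator locator;

        @Override
        public void setDocumentLocator(Locator locator) {
            // The parser hands us the locator before parsing starts.
            this.locator = locator;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node; each chunk gets the
            // position the parser has reached (roughly the end of the chunk).
            String text = new String(ch, start, length);
            if (!text.trim().isEmpty()) {
                System.out.printf("line %d, col %d: %s%n",
                        locator.getLineNumber(), locator.getColumnNumber(),
                        text.trim());
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new PositionAwareTextExtractor());
        }
    }

Whether those positions are worth carrying through LT's rules is the open
question in the thread; the sketch only shows that obtaining them does not
require an XSLT-generated intermediary format.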
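
On the filter spec itself, a rough sketch of how the points above could fit
together using the plain JAXP APIs. GenericXmlFilter and the DROP_XPATHS
list are placeholders, not anything that exists in LT; real code would also
need error handling:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class GenericXmlFilter {

        // Placeholder list of elements to drop before checking; per the thread,
        // this could live in a small XML config file. Namespaced documents
        // would need a NamespaceContext or local-name() tests.
        private static final String[] DROP_XPATHS = {"//code", "//programlisting"};

        public static String extractText(File xmlFile) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setXIncludeAware(true);  // expand xi:include elements while parsing
            // The five predefined entities (amp, lt, gt, apos, quot) are resolved
            // by any conforming parser, so they never reach the rules as markup.
            Document doc = dbf.newDocumentBuilder().parse(xmlFile);

            // Drop unwanted elements (simple element list, here given as XPath).
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String expr : DROP_XPATHS) {
                NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    Node n = hits.item(i);
                    n.getParentNode().removeChild(n);
                }
            }

            // Collect text, collapsing whitespace (including \n) unless an
            // ancestor carries xml:space="preserve".
            StringBuilder sb = new StringBuilder();
            collect(doc.getDocumentElement(), false, sb);
            return sb.toString();
        }

        private static void collect(Node node, boolean preserve, StringBuilder sb) {
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Node attr = node.getAttributes().getNamedItem("xml:space");
                if (attr != null) {
                    preserve = "preserve".equals(attr.getNodeValue());
                }
            } else if (node.getNodeType() == Node.TEXT_NODE) {
                String text = node.getNodeValue();
                sb.append(preserve ? text : text.replaceAll("\\s+", " "));
            }
            for (Node child = node.getFirstChild(); child != null;
                 child = child.getNextSibling()) {
                collect(child, preserve, sb);
            }
        }
    }

A DOM pass like this loses the original line/column positions, which is
consistent with the "not worth the effort for XML" stance above; keeping
them would mean staying with a SAX approach like the previous sketch.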