Hi Dave,

On 23/03/14 12:32, davep wrote:
> I'm playing with a grammar checker that isn't as yet XML friendly. 
> One option is to strip all markup and pass through to the grammar 
> checker having expanded any xincludes.

Interesting -- what checker do you use, if I may ask?


> Issues:
> 1. Plain text output. Ideally block -> newline, inlines ->
>    whitespace separation.
> 2. Indexing is a special. Null template for <db:indexterm/>.
> 3. Ditto (remove markup) for toc.
> 
> Can anyone think of any other 'specials' that might need
> processing to obtain a simple text file ready for a spell checker?

I am working on a style/terminology checker myself. These are the
rules I use to prepare the text before the terminology check:

https://www.gitorious.org/style-checker/style-checker/source/999eb9696fed15e75b01eee2febbb28562fc3144:source/xsl-checks/terminology.xslc

As you can see, I try to hide things like literals and keys from the
style checker. I chose the ##@sth## placeholder format because I am
using regular expressions and wanted something distinctive that does
not contain any regular expression metacharacters.
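
To make the idea concrete, here is a rough Python sketch of that kind
of preprocessing. It is not the linked stylesheet, only an
illustration under some assumptions: xincludes are already expanded,
and the element sets BLOCKS, DROP and HIDE are hypothetical choices
that would have to be adjusted to the DocBook subset actually in use.
Blocks end with a newline, inlines are separated by whitespace,
indexterm and toc are dropped, and literal-like inlines are replaced
by ##@name## placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element sets, chosen only for illustration.
BLOCKS = {"para", "title", "listitem"}             # end with a newline
DROP = {"indexterm", "toc"}                        # removed entirely
HIDE = {"literal", "keycap", "filename", "command"}  # become placeholders


def local(tag):
    """Return the element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def squeeze(text):
    """Collapse all whitespace runs (including newlines) to one space."""
    return re.sub(r"\s+", " ", text)


def flatten(elem, out):
    """Append the plain-text rendering of elem to the list out."""
    name = local(elem.tag)
    if name in DROP:
        return
    if name in HIDE:
        # Placeholder in the ##@sth## style; content is not emitted.
        out.append(" ##@%s## " % name)
        return
    if elem.text:
        out.append(squeeze(elem.text))
    for child in elem:
        flatten(child, out)
        if child.tail:
            out.append(squeeze(child.tail))
    # Blocks end a line; everything else gets whitespace separation.
    out.append("\n" if name in BLOCKS else " ")


def to_plain_text(xml_source):
    """Turn a DocBook string into one line of text per block."""
    out = []
    flatten(ET.fromstring(xml_source), out)
    lines = (re.sub(r" +", " ", line).strip()
             for line in "".join(out).split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = """<article xmlns="http://docbook.org/ns/docbook">
      <title>Demo</title>
      <para>Press <keycap>Enter</keycap> to run
        <command>make</command><indexterm><primary>make</primary></indexterm> now.</para>
    </article>"""
    print(to_plain_text(sample))
```

The placeholders keep the sentence structure intact for the checker
while hiding the content that would otherwise trigger false alarms.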

Hth,
Stefan.


- --
SUSE LINUX Products GmbH, Maxfeldstraße 5, D-90409 Nürnberg
Geschäftsführer: Jeff Hawn, Jennifer Guild, Felix Imendörffer
HRB 16746 (Amtsgericht Nürnberg)
