On 2014-03-22 13:31, Dave Pawson wrote:
> On 22 March 2014 11:56, Marcin Miłkowski <list-addr...@wp.pl> wrote:
>
>> And just to come back to your docbook question: I think it should be
>> fairly easy to create a simple parser that would use AnnotatedText to
>> check docbook format. I don't know whether there are any attributes that
>> contain text content in docbook; if not, then writing a parser should be
>> really easy. We could then include it in the next release of LT.
>>
>> Regards,
>> Marcin
>
>
> Thanks Marcin.
> fyi, there seems to be no means to grammar check docbook xml
> and I know many 'book length' texts are written in Docbook.
>
> There is no 'content' information in attributes - and anyway
> Relax NG validation can check that. It is just the XML content
> that needs checking.

Then we could simply create a very simple parser that forwards the textual content of all elements to LT and annotates everything else as non-text. The only catch is that Java XML parsers don't let us see entities, the raw encoding, etc., so character positions might not match in those cases. I'd need to see how this is solved in the Okapi toolkit, where raw XML is prepared for translation in XLIFF.

Are there xml:lang attributes on docbook elements? We could use them to set LT to the proper language. This is a bit more complex but could work.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
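
For illustration only, here is a rough sketch of the idea described above: forward element text to the checker, mark everything else as non-text, and keep raw character positions so that a match can be mapped back into the XML source. It is written in Python rather than against LT's Java API, and it scans the raw string with a regular expression instead of using an XML parser, which is one possible way around the entity/offset problem, not necessarily how Okapi does it. The names `segment` and `plain_text_with_offsets` are hypothetical, and `html.unescape` is only a stand-in for real entity resolution (it knows HTML entity names, not ones declared in a DocBook DTD).

```python
import re
from html import unescape

# Comments, processing instructions, tags, and entity/character references.
# Assumptions: no ">" inside attribute values, no DOCTYPE internal subset,
# no CDATA sections. A real implementation would need to handle those.
MARKUP = re.compile(r"<!--.*?-->|<\?.*?\?>|<[^>]*>|&#?\w+;", re.DOTALL)


def segment(raw):
    """Split raw XML into (kind, raw_piece, interpreted) triples.

    kind is "text" or "markup"; the raw pieces concatenate back to `raw`.
    For entity references, `interpreted` holds the character(s) the
    checker should see; for all other markup it is None.
    """
    segments = []
    pos = 0
    for m in MARKUP.finditer(raw):
        if m.start() > pos:
            segments.append(("text", raw[pos:m.start()], None))
        piece = m.group()
        interpreted = unescape(piece) if piece.startswith("&") else None
        segments.append(("markup", piece, interpreted))
        pos = m.end()
    if pos < len(raw):
        segments.append(("text", raw[pos:], None))
    return segments


def plain_text_with_offsets(segments):
    """Return the text to check plus, per character, its offset in the raw XML."""
    chars, offsets = [], []
    raw_pos = 0
    for kind, piece, interpreted in segments:
        if kind == "text":
            for i, ch in enumerate(piece):
                chars.append(ch)
                offsets.append(raw_pos + i)
        elif interpreted is not None:
            # An entity collapses to fewer characters; point them at its start.
            for ch in interpreted:
                chars.append(ch)
                offsets.append(raw_pos)
        raw_pos += len(piece)
    return "".join(chars), offsets


raw = '<para xml:lang="en">Tom &amp; Jerry is <emphasis>grate</emphasis>.</para>'
plain, offsets = plain_text_with_offsets(segment(raw))
start = offsets[plain.index("grate")]  # position of the word in the raw XML
print(plain)
print(raw[start:start + 5])
```

The text and markup segments correspond to the text/non-text distinction that AnnotatedText makes; a position reported against `plain` can be looked up in `offsets` to find where the error sits in the original file, even after an entity such as `&amp;` has shifted the character counts.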