On 2014-03-22 13:31, Dave Pawson wrote:
> On 22 March 2014 11:56, Marcin Miłkowski <list-addr...@wp.pl> wrote:
>
>> And just to come back to your docbook question: I think it should be
>> fairly easy to create a simple parser that would use AnnotatedText to
>> check docbook format. I don't know whether there are any attributes that
>> contain text content in docbook; if not, then writing a parser should be
>> really easy. We could then include it in the next release of LT.
>>
>> Regards,
>>    Marcin
>
>
> Thanks Marcin.
> fyi, there seems to be no means to grammar check docbook xml
> and I know many 'book length' texts are written in Docbook.
>
> There is no 'content' information in attributes, and anyway
> Relax NG validation can check that. It is just the XML content
> that needs checking.

Then we could simply create a very simplistic parser that forwards all
textual content of all elements to LT and annotates everything else as
non-text. The only trick is that Java XML parsers don't let us see
entities, the raw encoding etc., so we might get mismatches in character
positions in those cases. I'd need to see how this is solved in the Okapi
toolkit, where raw XML is prepared for translation in XLIFF.
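A minimal sketch of that "forward all text, mark everything else as non-text" idea, using only the JDK's SAX parser. The class and method names (DocBookTextExtractor, extract, Segment) are hypothetical. In LanguageTool the segments would presumably go to AnnotatedTextBuilder (its addText/addMarkup methods), but please check that API against your LT version. Note that the markup strings here are rebuilt from parser events rather than copied from the raw input, so attributes, entities and original whitespace inside tags are lost. That is exactly the position-mismatch problem mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: split XML into checkable text and ignorable markup. */
public class DocBookTextExtractor {

    /** One piece of the document: either text to check or markup to skip. */
    public static final class Segment {
        public final boolean isText;
        public final String content;

        Segment(boolean isText, String content) {
            this.isText = isText;
            this.content = content;
        }
    }

    public static List<Segment> extract(String xml) throws Exception {
        List<Segment> segments = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Markup is reconstructed, not copied: attributes are dropped here.
                segments.add(new Segment(false, "<" + qName + ">"));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                segments.add(new Segment(false, "</" + qName + ">"));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length);
                // SAX may deliver one text node in several calls; merge them.
                int last = segments.size() - 1;
                if (last >= 0 && segments.get(last).isText) {
                    segments.set(last, new Segment(true, segments.get(last).content + text));
                } else {
                    segments.add(new Segment(true, text));
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return segments;
    }

    public static void main(String[] args) throws Exception {
        for (Segment s : extract("<para>Hello <emphasis>big</emphasis> world</para>")) {
            System.out.println((s.isText ? "TEXT   " : "MARKUP ") + s.content);
        }
    }
}
```

Concatenating only the text segments gives "Hello big world", which is what the grammar checker would see. Real DocBook files with a DOCTYPE may also make the parser try to fetch the DTD, which would need to be disabled or resolved locally.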

Are there xml:lang attributes on DocBook elements? We could use them to
set LT to use the proper language. This is a bit more complex but could work.
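If the documents do carry xml:lang (an assumption here, not confirmed in this thread), the language could be tracked with a stack during parsing, since xml:lang is inherited by child elements. A rough sketch with the JDK's SAX parser; the names XmlLangTracker and LangText are hypothetical, and mapping each code to an LT language object is left out because that API differs between LT versions.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: group text content by the xml:lang in scope. */
public class XmlLangTracker {

    /** A run of text and the language code in scope ("" if none). */
    public static final class LangText {
        public final String lang;
        public final String text;

        LangText(String lang, String text) {
            this.lang = lang;
            this.text = text;
        }
    }

    public static List<LangText> extract(String xml) throws Exception {
        List<LangText> result = new ArrayList<>();
        // ArrayDeque rejects nulls, so "" stands for "no language declared".
        Deque<String> langStack = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Non-namespace-aware parsing reports the attribute as "xml:lang".
                String own = atts.getValue("xml:lang");
                String inherited = langStack.isEmpty() ? "" : langStack.peek();
                langStack.push(own != null ? own : inherited);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                langStack.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String lang = langStack.isEmpty() ? "" : langStack.peek();
                String text = new String(ch, start, length);
                // Merge adjacent text that shares the same language.
                int last = result.size() - 1;
                if (last >= 0 && Objects.equals(result.get(last).lang, lang)) {
                    result.set(last, new LangText(lang, result.get(last).text + text));
                } else {
                    result.add(new LangText(lang, text));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }
}
```

For `<para xml:lang="en">Hello <foreignphrase xml:lang="de">Guten Tag</foreignphrase> world</para>` this yields three runs: "Hello " (en), "Guten Tag" (de) and " world" (en), each of which could then be checked with the matching LT language.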

Regards,
Marcin



_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
