On 22 March 2014 13:11, Marcin Miłkowski <list-addr...@wp.pl> wrote:

>> There is no 'content' information in attributes - and anyway
>> Relax NG validation can check that. It is just the XML content
>> that needs checking.
>
> Then we could simply create a very simplistic parser that forwards all
> textual content of all elements to LT and annotates everything else as
> non-text. The only trick is that Java XML parsers wouldn't let us see
> entities, raw encoding etc., so we might get character-position
> mismatches in those cases. I'd need to see how this is solved in the
> Okapi toolkit, where raw XML is prepared for translation in XLIFF.

I'd suggest parsing first to expand entities to obtain 'full' content.
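
FWIW, a rough sketch of the SAX pass Marcin describes, assuming LT's
AnnotatedTextBuilder / check(AnnotatedText) API; the class name and the
simplified tag handling are mine, untested:

import java.io.File;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.languagetool.JLanguageTool;
import org.languagetool.language.AmericanEnglish;
import org.languagetool.markup.AnnotatedText;
import org.languagetool.markup.AnnotatedTextBuilder;
import org.languagetool.rules.RuleMatch;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DocbookTextExtractor extends DefaultHandler {

  private final AnnotatedTextBuilder builder = new AnnotatedTextBuilder();

  @Override
  public void startElement(String uri, String localName, String qName,
                           Attributes atts) {
    // The tag is rebuilt in a simplified form (attributes dropped) and
    // marked as non-text; exact source offsets would need the raw tag text.
    builder.addMarkup("<" + qName + ">");
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    builder.addMarkup("</" + qName + ">");
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // Entities are already expanded here, so offsets can drift from the
    // raw file - the mismatch Marcin mentions above.
    builder.addText(new String(ch, start, length));
  }

  public AnnotatedText getAnnotatedText() {
    return builder.build();
  }

  public static void main(String[] args) throws Exception {
    DocbookTextExtractor handler = new DocbookTextExtractor();
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new File(args[0]), handler);

    JLanguageTool lt = new JLanguageTool(new AmericanEnglish());
    List<RuleMatch> matches = lt.check(handler.getAnnotatedText());
    for (RuleMatch match : matches) {
      System.out.println(match.getFromPos() + "-" + match.getToPos()
          + ": " + match.getMessage());
    }
  }
}

Recovering the exact raw markup (attributes, entities, original encoding)
for offset bookkeeping is the hard part, which is presumably what Okapi
solves when preparing raw XML for XLIFF.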

>
> Are there xml:lang attributes on docbook elements? We could use them to
> set LT to use proper language. This is a bit more complex but could work.

Yes. 'Most' block and inline elements can take an xml:lang attribute... I nearly
suggested this wrt the post on mixed languages earlier today.
Issues: xml:lang values would need to match ISO 639 (tbc).
 What of languages which don't? Simply discard the value?
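
A rough sketch of how the xml:lang scoping could be tracked in the same
SAX pass (handler name is mine, untested; the mapping from the tag value
to an LT Language, and the discard-on-unknown policy, are left open):

import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class LangScopeHandler extends DefaultHandler {

  private final Deque<String> langStack = new ArrayDeque<>();

  public LangScopeHandler(String defaultLang) {
    langStack.push(defaultLang);  // e.g. from the root element or config
  }

  @Override
  public void startElement(String uri, String localName, String qName,
                           Attributes atts) {
    String lang = atts.getValue("xml:lang");
    // Inherit the enclosing language when the element has no xml:lang.
    langStack.push(lang != null ? lang : langStack.peek());
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    langStack.pop();
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    String text = new String(ch, start, length);
    // Here we'd hand (langStack.peek(), text) to LT; if the value isn't a
    // language LT supports, discard it and keep the current language.
    System.out.println("[" + langStack.peek() + "] " + text.trim());
  }

  public static void main(String[] args) throws Exception {
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new File(args[0]), new LangScopeHandler("en"));
  }
}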

How might a language change be marked up (as metadata) in the plain text?
A solution to this might also meet the needs of a plain-text language switch.
  Perhaps http://www.w3.org/International/articles/ruby/

HTH




-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
