On 2014-04-11 22:16, Daniel Naber wrote:
> Hi,
>
> the following languages have been switched to use an SRX-based sentence
> tokenizer so we use the same approach for all languages and not a
> mixture of different methods:
>
> Asturian, Italian, Lithuanian, Malayalam, Swedish, Tagalog
>
> I don't speak these languages so I cannot properly test the change. If
> you speak one of the languages and find problems, speak up or try to fix
> them yourself (see
> http://wiki.languagetool.org/customizing-sentence-segmentation-in-srx-rules). 
> You can see how text gets split into sentences by using the -v option on the 
> command line.
>
> Finally, when I wanted to remove RegexSentenceTokenizer I noticed that
> the SRXSentenceTokenizer uses segment.srx as a hard-coded path to its
> rules. That means nobody will be able to implement their own language
> without touching that file. Was there a reason for not moving the SRX
> rules to each language module? That should be doable when we make the
> SrxDocument in SRXSentenceTokenizer non-static?

I have already explained this in the past, when you asked the same 
question. This is how the SRX standard is designed: an SRX file is 
supposed to host multiple languages, not just a single one. Usually, 
a segmentation rule file contains multiple rule sets, and the SRX 
tokenizer supports inheritance between them. Splitting the rule sets 
into separate files goes against the design principles of the standard.
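
To make the inheritance point concrete, here is a minimal Python sketch of how SRX 2.0 map rules pick rule sets. It is not LanguageTool's SRXSentenceTokenizer. Each languagemap whose languagepattern matches the language code contributes its languagerule. With cascade="yes" in the header the search continues, so a language also picks up shared rule sets declared for broader patterns. The rule set names "Polish" and "Default", their regexes, and treating languagepattern as a regex that must match the whole language code are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

NS = {"srx": "http://www.lisa.org/srx20"}

# Hypothetical SRX 2.0 file: one file, two rule sets, shared "Default" rules.
SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="yes"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Polish">
        <rule break="no"><beforebreak>\btzn\.</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
      <languagerule languagerulename="Default">
        <rule break="yes"><beforebreak>[.?!]</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern="pl.*" languagerulename="Polish"/>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>"""


def rules_for(srx_text, lang):
    """Return (rule set name, break flag) pairs applied to `lang`, in order."""
    root = ET.fromstring(srx_text)
    # Assumption of this sketch: a missing cascade attribute means "no".
    cascade = root.find("srx:header", NS).get("cascade", "no") == "yes"
    rulesets = {
        lr.get("languagerulename"): lr.findall("srx:rule", NS)
        for lr in root.iterfind("srx:body/srx:languagerules/srx:languagerule", NS)
    }
    collected = []
    for lm in root.iterfind("srx:body/srx:maprules/srx:languagemap", NS):
        # Sketch assumption: the pattern must match the whole language code.
        if re.fullmatch(lm.get("languagepattern"), lang):
            name = lm.get("languagerulename")
            collected.extend((name, rule.get("break")) for rule in rulesets[name])
            if not cascade:
                break  # without cascading, the first matching map wins
    return collected


print(rules_for(SRX, "pl"))  # Polish rules first, then the shared Default rules
print(rules_for(SRX, "en"))  # only the shared Default rules
```

In this sketch, moving the "Polish" rule set into a per-language file of its own would cut it off from "Default" unless the shared rules were copied into every such file. That duplication is the maintenance cost described above.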

By splitting the file, we would break *all* languages, as I made them 
inherit several sets of common rules. Splitting would also make 
maintenance a nightmare.

The SRX file can be edited easily, and we will happily accept all patches, 
including those for languages without complete support in LT. Where's the problem?

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
