Revision: 8572
          http://languagetool.svn.sourceforge.net/languagetool/?rev=8572&view=rev
Author:   milek_pl
Date:     2012-12-17 17:28:00 +0000 (Mon, 17 Dec 2012)
Log Message:
-----------
[pl] fix errors in sentence segmentation to make two rules active again

Modified Paths:
--------------
    trunk/JLanguageTool/src/main/resources/org/languagetool/resource/segment.srx
    trunk/JLanguageTool/src/main/resources/org/languagetool/rules/pl/grammar.xml

Modified: trunk/JLanguageTool/src/main/resources/org/languagetool/resource/segment.srx
===================================================================
--- trunk/JLanguageTool/src/main/resources/org/languagetool/resource/segment.srx 2012-12-17 17:25:24 UTC (rev 8571)
+++ trunk/JLanguageTool/src/main/resources/org/languagetool/resource/segment.srx 2012-12-17 17:28:00 UTC (rev 8572)
@@ -4,10 +4,11 @@
 <formathandle type="start" include="no"></formathandle>
 <formathandle type="end" include="yes"></formathandle>
 <formathandle type="isolated" include="no"></formathandle>
-<okpsrx:options oneSegmentIncludesAll="no" trimLeadingWhitespaces="no" trimTrailingWhitespaces="no" useJavaRegex="no"></okpsrx:options>
+<okpsrx:options oneSegmentIncludesAll="no" trimLeadingWhitespaces="no" trimTrailingWhitespaces="no" useJavaRegex="yes"></okpsrx:options>
 <okpsrx:sample language="pl_two" useMappedRules="no">Ingevolge paragraaf 4.7.4. van het SGR is ... Jestem zły, bo... (wpisz własną odpowiedź) drażni mnie krowa. Kierownictwo toleruje pracę z konieczności ("Skąd wezmę innego pracownika?") i jest zadowolone. Józek D. (45 l.) T.Love
+Ad. 1, 5 ha. ziemi na Mazurach
 Planowany wzrost przychodów najludniejszych gmin w br. będzie o 2 pkt. proc. wyższy od prognozowanej na br. stopy średniorocznej inflacji (15 proc.) natomiast wzrost wydatków przewyższa o 12 pkt. proc. stopę inflacji (co za dziwota! - przyp. red.) i tak dalej. Nie jesteś Rosjaninem? - spytał przedstawiciel okręgu. 11.X.2001 (b. dobra gra). Temperatura wody w systemie wynosi 30°C.W skład obiegu..
@@ -25,7 +26,6 @@
 </header>
 <body>
 <languagerules>
-
 <languagerule languagerulename="Greek">
 <!--κ.λπ. - και λοιπά-->
 <rule break="no">
@@ -55,10 +55,10 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Polish">
+<!--includes v"ad." misused as abbreviation-->
 <rule break="no">
-<beforebreak>\badw\.\s</beforebreak>
+<beforebreak>\b[Aa]dw?\.\s</beforebreak>
 <afterbreak></afterbreak>
 </rule>
 <rule break="no">
@@ -925,6 +925,11 @@
 <beforebreak>\b[Nn]a\sos\.\s</beforebreak>
 <afterbreak></afterbreak>
 </rule>
+<!--20 ha. ziemi na Mazurach-->
+<rule break="no">
+<beforebreak>\bha\.\s</beforebreak>
+<afterbreak>[\p{Ll}]</afterbreak>
+</rule>
 <!--min. 30 zł lub cena min. od 30 zł-->
 <rule break="no">
 <beforebreak>\bmin\.\s</beforebreak>
@@ -1041,7 +1046,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="English">
 <rule break="no">
 <beforebreak>\b[nN]o\.\s</beforebreak>
@@ -1217,7 +1221,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Romanian">
 <rule break="no">
 <beforebreak>\b\d+\.\s</beforebreak>
@@ -1312,7 +1315,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Dutch">
 <rule break="no">
 <beforebreak>\b(Afr|Am|Ar|Br|Cie|Comp|Dhr|Dr|Em|Fa|Kon)\.\s</beforebreak>
@@ -1464,7 +1466,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Slovak">
 <rule break="no">
 <beforebreak>\b(Bc|Mgr|RNDr|PharmDr|PhDr|JUDr|PaedDr|ThDr|Ing|MUDr|MDDr|MVDr|Dr|ThLic|PhD|ArtD|ThDr|Dr|DrSc|CSs|prof)\.\s</beforebreak>
@@ -3139,7 +3140,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Icelandic">
 <!-- Numbers -->
 <rule break="no">
@@ -3965,7 +3965,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Russian">
 <rule break="no">
 <beforebreak>\b\d+\.\s</beforebreak>
@@ -4065,7 +4064,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Default">
 <rule break="yes">
 <beforebreak>\u2029</beforebreak>
@@ -4082,21 +4080,18 @@
 <afterbreak></afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="ByLineBreak">
 <rule break="yes">
 <beforebreak>\r?\n</beforebreak>
 <afterbreak></afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="ByTwoLineBreaks">
 <rule break="yes">
 <beforebreak>\r?\n\s*\r?\n[\t]*</beforebreak>
 <afterbreak></afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Slovenian">
 <rule break="no">
 <beforebreak>\b[dD]r\.\s</beforebreak>
@@ -4251,7 +4246,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Catalan">
 <!-- Abbreviations that cannot finish sentences-->
 <rule break="no">
@@ -4320,7 +4314,6 @@
 <afterbreak>[¡¿«»"'\p{Ps}]*\p{Lu}\p{L}*</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Spanish">
 <!-- Abbreviations that cannot finish sentences-->
 <rule break="no">
@@ -4389,7 +4382,6 @@
 <afterbreak>[¡¿«»"'\p{Ps}]*\p{Lu}\p{L}*</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="German">
 <!-- Split e.g.: He won't. Really. -->
 <rule break="yes">
@@ -4538,7 +4530,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Danish">
 <!-- Split e.g.: He won't. Really. -->
 <rule break="yes">
@@ -4692,7 +4683,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Esperanto">
 <!-- Esperanto abbreviations (see http://eo.lernu.net/lernado/gramatiko/demandoj/mallongigoj.php) -->
 <rule break="no">
@@ -4721,7 +4711,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Ukrainian">
 <rule break="no">
 <beforebreak>\b\d+\.\s</beforebreak>
@@ -4861,7 +4850,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Belarusian">
 <rule break="no">
 <beforebreak>\b\d+\.\s</beforebreak>
@@ -4949,7 +4937,6 @@
 <afterbreak>\p{Lu}\p{Ll}</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Galician">
 <!-- s. XIX; s.IX; sec. XX; séc. XX -->
 <rule break="no">
@@ -5065,7 +5052,6 @@
 <afterbreak>['"«¡¿\p{Ps}\p{Pi}]?\p{Lu}\p{Ll}*</afterbreak>
 </rule>
 </languagerule>
-
 <languagerule languagerulename="Japanese">
 <rule break="no">
 <beforebreak>[:]+[\p{Pe}\p{Pf}\p{Po}"-[\u002C\u003A\u003B\u055D\u060C\u061B\u0703\u0704\u0705\u0706\u0707\u0708\u0709\u07F8\u1363\u1364\u1365\u1366\u1802\u1804\u1808\u204F\u205D\u3001\uA60D\uFE10\uFE11\uFE13\uFE14\uFE50\uFE51\uFE54\uFE55\uFF0C\uFF1A\uFF1B\uFF64]]*</beforebreak>

Modified: trunk/JLanguageTool/src/main/resources/org/languagetool/rules/pl/grammar.xml
===================================================================
--- trunk/JLanguageTool/src/main/resources/org/languagetool/rules/pl/grammar.xml 2012-12-17 17:25:24 UTC (rev 8571)
+++ trunk/JLanguageTool/src/main/resources/org/languagetool/rules/pl/grammar.xml 2012-12-17 17:28:00 UTC (rev 8572)
@@ -8810,8 +8810,7 @@
  <short>Błąd ortograficzny</short>
  <example type="correct">Ruda Śląska, Katow., Chorzów</example>
  <example correction="Katow." type="incorrect">Ruda Śląska, <marker>Ka-ce</marker>, Chorzów</example>
- </rule>
- <!-- commented out because of the warning in PatternRuleTest if CHECK_WITH_SENTENCE_SPLITTING = true
+ </rule>
  <rule id="AD_KROPKA" name="„ad.” (ad)">
  <pattern>
  <marker>
@@ -8825,7 +8824,7 @@
  <example type="correct">Ad 1.</example>
  <example correction="Ad" type="incorrect"><marker>Ad.</marker> 1.</example>
  </rule>
- -->
+
  <rulegroup id="LICZBY_SLOWNIE_SKROT" name="Skrótowy zapis liczb: 10-tka (dziesiątka) itd.">
  <rule>
  <pattern>
@@ -9808,8 +9807,7 @@
  <example type="correct">On mieszka pod nr. 2.</example>
  <example type="correct">Oto samochód nr 2.</example>
  <example correction="nr 2" type="incorrect">Oto samochód <marker>nr. 2</marker>.</example>
- </rule>
- <!-- commented out because of the warning in PatternRuleTest if CHECK_WITH_SENTENCE_SPLITTING = true
+ </rule>
  <rule>
  <pattern>
  <token regexp="yes">\d+</token>
@@ -9824,8 +9822,7 @@
  <example type="correct">Ha, ha, ha!</example>
  <example correction="gr" type="incorrect">Ten palmtop waży 100 <marker>gr.</marker>, jest więc bardzo lekki.</example>
  <example type="correct">Apollo (gr. Apollon, zwany też Phoibos "Jaśniejący")</example>
- </rule>
- -->
+ </rule>
  <rule>
  <pattern>
  <token regexp="yes">dl|ml|dag|cm|dm|zł|kg|mln|mld|min|npl|pkt|pg|tg|cos|cosec|sec|sin|rkm|wg</token>

_______________________________________________
Languagetool-commits mailing list
Languagetool-commits@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-commits
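
The two Polish no-break patterns changed in segment.srx can be sanity-checked with a small Python sketch. This is not LanguageTool's SRX engine, only an approximation under stated assumptions: the helper name no_break_positions is hypothetical, and because Python's re module has no \p{Ll}, the lowercase afterbreak of the "ha." rule is approximated with str.islower().

```python
import re

# beforebreak patterns copied from the segment.srx diff.
OLD_ADW_RULE = re.compile(r"\badw\.\s")      # rev 8571: only lowercase "adw."
AD_RULE = re.compile(r"\b[Aa]dw?\.\s")       # rev 8572: also "Ad.", "ad.", "Adw."
HA_RULE = re.compile(r"\bha\.\s")            # rev 8572: new rule for "ha."


def no_break_positions(text, before, after_is_lowercase=False):
    # Hypothetical helper: returns the offsets at which a break="no" rule
    # with the given beforebreak pattern would suppress a sentence break.
    # after_is_lowercase=True stands in for the [\p{Ll}] afterbreak of the
    # "ha." rule; this is an approximation, not the SRX matching algorithm.
    positions = []
    for match in before.finditer(text):
        end = match.end()
        if after_is_lowercase and not text[end:end + 1].islower():
            continue
        positions.append(end)
    return positions


# Sample sentence added to the pl_two sample text in this commit.
sample = "Ad. 1, 5 ha. ziemi na Mazurach"
print(no_break_positions(sample, AD_RULE))
print(no_break_positions(sample, HA_RULE, after_is_lowercase=True))
```

The old pattern does not match the capitalised "Ad. " at the start of the sample sentence, while the revised one does. The "ha." rule only suppresses a break when a lowercase letter follows, so "20 ha. Mazury" is left to the other rules.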