jenkins-bot has submitted this change and it was merged.

Change subject: TextExtracts do not crop after initials
......................................................................


TextExtracts do not crop after initials

Disables sentence termination at a full stop preceded by a capital
letter, which is likely to be an initial.
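The patched ASCII terminator uses the PCRE Unicode property `\p{Lu}` (uppercase letter), so a full stop followed by whitespace ends a sentence only when the character before it is not an uppercase letter. The minimal PHP sketch below contrasts the old and new pattern on the test string. It assumes the `$endchars` entries are joined into one lazy regex. The helper `firstSentencesSketch()` and its `$asciiStop` parameter are hypothetical and are not the extension's API; the real body of `getFirstSentences()` is not part of this diff.

```php
<?php
// Hypothetical sketch: approximates how a getFirstSentences()-style helper
// might combine the $endchars terminators into a single regex. The name,
// the $asciiStop parameter and the regex assembly are assumptions.
function firstSentencesSketch( string $text, int $count, string $asciiStop ): string {
	$endchars = [
		$asciiStop, '\!\s', '\?\s', // regular ASCII
		'。', // full-width ideographic full stop
		'．', '！', '？', // double-width roman forms
		'｡', // half-width ideographic full stop
	];
	// Lazily consume text up to the next terminator, $count times.
	// The "u" modifier is required for \p{Lu} and the multibyte terminators.
	$regexp = '/^(?:.*?(?:' . implode( '|', $endchars ) . ')){' . $count . '}/su';
	if ( preg_match( $regexp, $text, $matches ) ) {
		return rtrim( $matches[0] );
	}
	// Fewer terminators than requested: return the text unchanged.
	return $text;
}

$text = 'P.J. Harvey is a singer. She is awesome!';
echo firstSentencesSketch( $text, 1, '\.\s' ), "\n";          // old pattern
echo firstSentencesSketch( $text, 1, '[^\p{Lu}]\.\s' ), "\n"; // patched pattern
```

With the old pattern the sketch stops at "P.J." because "J. " already counts as a sentence end; with the patched pattern it returns "P.J. Harvey is a singer.", matching the re-enabled test. The same rule means a sentence that really ends in a capital letter, such as "He moved to the USA.", is no longer split at that full stop.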

Bug: T115795
Change-Id: Ibf38e87823155c704ffb106642944cbd05e3f632
---
M includes/ExtractFormatter.php
M tests/ExtractFormatterTest.php
2 files changed, 3 insertions(+), 3 deletions(-)

Approvals:
  MaxSem: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/ExtractFormatter.php b/includes/ExtractFormatter.php
index a6581f3..644dcaa 100644
--- a/includes/ExtractFormatter.php
+++ b/includes/ExtractFormatter.php
@@ -80,7 +80,7 @@
       public static function getFirstSentences( $text, $requestedSentenceCount ) {
                // Based on code from OpenSearchXml by Brion Vibber
                $endchars = array(
-                       '\.\s', '\!\s', '\?\s', // regular ASCII
+                       '[^\p{Lu}]\.\s', '\!\s', '\?\s', // regular ASCII
                        '。', // full-width ideographic full-stop
                        '．', '！', '？', // double-width roman forms
                        '｡', // half-width ideographic full stop
diff --git a/tests/ExtractFormatterTest.php b/tests/ExtractFormatterTest.php
index 227f95c..de39909 100644
--- a/tests/ExtractFormatterTest.php
+++ b/tests/ExtractFormatterTest.php
@@ -109,12 +109,12 @@
                                1,
                                'Foo was born in 1977.',
                        ),
-                       /* @fixme
+                       // Bug T115795 - Test no cropping after initials
                        array(
                                'P.J. Harvey is a singer. She is awesome!',
                                1,
                                'P.J. Harvey is a singer.',
-                       ),*/
+                       ),
                );
        }
 

-- 
To view, visit https://gerrit.wikimedia.org/r/255959

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibf38e87823155c704ffb106642944cbd05e3f632
Gerrit-PatchSet: 3
Gerrit-Project: mediawiki/extensions/TextExtracts
Gerrit-Branch: master
Gerrit-Owner: Sumit <[email protected]>
Gerrit-Reviewer: Jdlrobson <[email protected]>
Gerrit-Reviewer: MaxSem <[email protected]>
Gerrit-Reviewer: Sumit <[email protected]>
Gerrit-Reviewer: Waldir <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits