jenkins-bot has submitted this change and it was merged.
Change subject: Trim text on the way into elasticsearch
......................................................................
Trim text on the way into elasticsearch
Most article text seems to come up with a handful of trailing spaces.
Trim it to save a tiny bit of space and time.
Change-Id: If42b751257b9727869f5b9d7b18a5608e3ca421a
---
M includes/CirrusSearchUpdater.php
1 file changed, 1 insertion(+), 0 deletions(-)
Approvals:
Chad: Looks good to me, approved
jenkins-bot: Verified
diff --git a/includes/CirrusSearchUpdater.php b/includes/CirrusSearchUpdater.php
index ef4d18d..3737a12 100644
--- a/includes/CirrusSearchUpdater.php
+++ b/includes/CirrusSearchUpdater.php
@@ -253,6 +253,7 @@
 $parserOutput = $page->getParserOutput( new ParserOptions(), $page->getRevision()->getId() );
 $text = Sanitizer::stripAllTags( SearchEngine::create( 'CirrusSearch' )
 	->getTextFromContent( $title, $page->getContent(), $parserOutput ) );
+$text = trim( $text ); // No need to store the trailing spaces in Elasticsearch....
 $doc->add( 'text', $text );
 $doc->add( 'text_bytes', strlen( $text ) );
 $doc->add( 'text_words', str_word_count( $text ) ); // It would be better if we could let ES calculate it
--
To view, visit https://gerrit.wikimedia.org/r/95071
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: If42b751257b9727869f5b9d7b18a5608e3ca421a
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>
Gerrit-Reviewer: Chad <[email protected]>
Gerrit-Reviewer: jenkins-bot
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits