jenkins-bot has submitted this change and it was merged.
Change subject: Fix relevancy_api test with/without boostlinks
......................................................................
Fix relevancy_api test with/without boostlinks
The computed norms were the same for both articles, so with boostLinks
disabled the scores for these docs were exactly the same. Adding 5 words to
Relevancylinktest Larger Extraword decreases its all.plain norm to 0.109375
(instead of 0.125).
Bug: T133756
Change-Id: If340a097da1c6ed3bef6e024e8b3d147b56c8a7b
---
M tests/browser/features/relevancy_api.feature
1 file changed, 18 insertions(+), 1 deletion(-)
Approvals:
EBernhardson: Looks good to me, approved
jenkins-bot: Verified
diff --git a/tests/browser/features/relevancy_api.feature b/tests/browser/features/relevancy_api.feature
index 4830e13..69dc2bc 100644
--- a/tests/browser/features/relevancy_api.feature
+++ b/tests/browser/features/relevancy_api.feature
@@ -64,7 +64,7 @@
Scenario: Incoming links count in page weight
Given a page named Relevancylinktest Smaller exists
- And a page named Relevancylinktest Larger Extraword exists
+ And a page named Relevancylinktest Larger Extraword exists with contents Relevancylinktest needs 5 extra words
And a page named Relevancylinktest Larger/Link A exists with contents [[Relevancylinktest Larger Extraword]]
And a page named Relevancylinktest Larger/Link B exists with contents [[Relevancylinktest Larger Extraword]]
And a page named Relevancylinktest Larger/Link C exists with contents [[Relevancylinktest Larger Extraword]]
@@ -74,6 +74,23 @@
Then Relevancylinktest Smaller is the first api search result
And Relevancylinktest Larger Extraword is the second api search result
# This test can fail spuriously for the same reasons that "Redirects count as incoming links" can fail
+ # With the allfield Relevancylinktest Smaller will get 21 freq for the term Relevancylinktest and a
+ # length norm of 0.125 for the all.plain (title is copied to the text field if no text is set)
+ # Relevancylinktest Larger Extraword will get 21 freq for the same term (content being set we re-add
+ # "Relevancylinktest" in the content to match the 21 freq of Relevancylinktest Smaller)
+ # We add extra words to decrease the length norm to 0.109375.
+ # freq 21 is explained by the copy_to features which will copy title words 20 times to the all.plain
+ # add one occurrence for the term in the text field and you'll get 21.
+ # for norms: Relevancylinktest Smaller will have a term length of 40 + 2 -> 42 which will be computed as
+ # 1/sqrt(42) => 0.154 and then encoded as 0.125 (precision reduction)
+ # Relevancylinktest Larger Extraword will be 60 + 5 => 65 computed as 0.124 but encoded as 0.109
+ # Small java test case to understand:
+ # int termCount = 65;
+ # TFIDFSimilarity sim = new ClassicSimilarity();
+ # FieldInvertState fiv = new FieldInvertState("test", 0, termCount, 0, 0, 1f);
+ # System.out.println("computed: " + sim.lengthNorm(fiv));
+ # System.out.println("encoded: " + sim.decodeNormValue(sim.computeNorm(fiv)));
+
Scenario: Results are sorted based on how close the match is
When I api search with disabled incoming link weighting for Relevancyclosetest FoƓ
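
The "precision reduction" described in the comments added by this change can
be reproduced without a Lucene dependency. The sketch below is not part of the
change. It assumes that ClassicSimilarity stores norms in a single byte using
the same scheme as Lucene's SmallFloat.floatToByte315 (one low exponent bit
plus two mantissa bits kept, rounding toward zero), and the class and method
names (NormSketch, encodedNorm, and so on) are made up for illustration.

```java
public class NormSketch {
    // Assumed re-implementation of Lucene's SmallFloat.floatToByte315:
    // keeps only the top bits of the float and truncates the rest.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Too small to represent: clamp to zero or the smallest value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // Too large: clamp to the largest value.
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed inverse of the encoding above (SmallFloat.byte315ToFloat).
    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Length norm before encoding: 1/sqrt(number of terms), boost of 1.
    public static float lengthNorm(int termCount) {
        return (float) (1.0 / Math.sqrt(termCount));
    }

    // Length norm after a round trip through the one-byte encoding.
    public static float encodedNorm(int termCount) {
        return byte315ToFloat(floatToByte315(lengthNorm(termCount)));
    }

    public static void main(String[] args) {
        // 42 terms: Relevancylinktest Smaller; 65 terms: Larger Extraword.
        for (int termCount : new int[] {42, 65}) {
            System.out.println(termCount + " terms: computed "
                + lengthNorm(termCount) + ", encoded " + encodedNorm(termCount));
        }
    }
}
```

Under these assumptions, 42 terms encode to 0.125 and 65 terms to 0.109375,
matching the values quoted in the commit message. Only four values are
representable between 0.0625 and 0.125, so a few extra words are enough to
push a norm into the next lower bucket.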
--
To view, visit https://gerrit.wikimedia.org/r/286426
Gerrit-MessageType: merged
Gerrit-Change-Id: If340a097da1c6ed3bef6e024e8b3d147b56c8a7b
Gerrit-PatchSet: 3
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: es2.x
Gerrit-Owner: DCausse
Gerrit-Reviewer: DCausse
Gerrit-Reviewer: EBernhardson
Gerrit-Reviewer: Gehel
Gerrit-Reviewer: Manybubbles
Gerrit-Reviewer: Smalyshev
Gerrit-Reviewer: jenkins-bot