The math behind score-simple is fairly easy to understand, while the math behind score-logtfidf is sometimes surprising. Testing against a trivially small corpus adds to the surprise, because the tfidf calculation quantizes some values to make things faster.
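One way to see the two scales side by side is to run the same query under each scoring option and print the raw scores. This is only a sketch under stated assumptions: fn:doc() stands in for the real search scope, and the two-term query is a cut-down stand-in for the boosting query quoted further down in this thread. No particular output is implied.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumptions: fn:doc() replaces the real search scope, and
   this two-term query is a reduced stand-in for the quoted boosting query. :)
declare variable $query := cts:or-query((
  cts:element-value-query(fn:QName("ns", "publishercode"), "ON",
    ("exact", "lang=en"), 16),
  cts:element-value-query(fn:QName("ns", "publishercode"), "ISO",
    ("exact", "lang=en"), 13)
));

(: Run the identical query once per scoring option and list each
   document's URI next to the score it received. :)
for $option in ("score-simple", "score-logtfidf")
for $doc in cts:search(fn:doc(), $query, $option)
return fn:concat($option, " ", xdmp:node-uri($doc), " ", cts:score($doc))
```

Comparing the two lists on a realistic corpus, rather than on three four-word documents, should give a better picture of how much the weights matter under each option.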
The word lengths shouldn't affect tfidf.

You can't mix simple with tfidf. Both algorithms produce numeric scores, but the numbers aren't related to each other in any way, so combining them would be nonsensical. It's like adding 2 feet to 1 meter and getting 3.

-jh-

On Jun 10, 2015, at 5:01 PM, Andreas Hubmer <[email protected]> wrote:

> Hi,
>
> I have found out that when I use "score-simple" in the cts:search, the result
> is as expected. It also seems that the word lengths of the publishercodes
> influence the scoring when using "score-logtfidf". Is that true?
> My demo documents are small and consist of just 4 words.
>
> In the end, I don't want to use "score-simple" for the whole search because
> it will also contain word-queries. Ideally I would like to be able to use
> logtfidf scoring for the word-queries and simple scoring for the
> element-value-queries. Is that possible?
>
> Regards,
> Andreas
>
> 2015-06-09 14:28 GMT+02:00 Andreas Hubmer <[email protected]>:
> Hi,
>
> Here is the statement.
>
> cts:search(/document:document,
>   cts:boost-query(
>     cts:or-query(
>       cts:element-value-query(fn:QName("ns","collection"),
>         "publisher", ("exact","lang=en"), 1)),
>     cts:or-query((
>       cts:element-value-query(fn:QName("ns","publishercode"), "ON",
>         ("exact","lang=en"), 16),
>       cts:element-value-query(fn:QName("ns","publishercode"), "ISO",
>         ("exact","lang=en"), 13),
>       cts:element-value-query(fn:QName("ns","publishercode"), "BEUTH",
>         ("exact","lang=en"), 10),
>       cts:element-value-query(fn:QName("ns","statuscode"), "N",
>         ("exact","lang=en"), 8),
>       cts:element-value-query(fn:QName("ns","statuscode"), "VN",
>         ("exact","lang=en"), 7),
>       cts:element-value-query(fn:QName("ns","statuscode"), "N-E",
>         ("exact","lang=en"), 6),
>       cts:element-value-query(fn:QName("ns","statuscode"), "TR",
>         ("exact","lang=en"), 4),
>       cts:element-value-query(fn:QName("ns","collection"), "corporate",
>         ("exact","lang=en"), -16),
>       cts:element-range-query(fn:QName("ns","docdatetime"), ">=",
>         xs:date("1980-01-01"),
>         ("score-function=linear","slope-factor=0.0625"), 0.25)), ())
>   ),
>   (cts:score-order("descending"), "faceted"))[1 to 10]
>
> For my question the element-value-queries on the statuscode, the collection
> and the docdatetime should not matter.
>
> In a unit test database I have three documents that are equivalent except for
> a numeric id and the publishercode (ON, ISO and BEUTH). Due to the boosting I
> would expect the document with the publishercode ON to be the first result,
> the document with ISO the second result and the document with BEUTH the last
> result.
> But the order is ISO, ON, BEUTH.
> When I rename ON to ONT in both the query and the test document, the order of
> the query result is as expected: ONT, ISO, BEUTH.
> Thus it seems that the word ON is somehow ranked lower, overriding the
> weights.
>
> Cheers,
> Andreas
>
>
> 2015-06-09 13:02 GMT+02:00 Geert Josten <[email protected]>:
> Hi Andreas,
>
> Can you share the entire cts:search statement?
>
> Cheers,
> Geert
>
> From: Andreas Hubmer <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Tuesday, June 9, 2015 at 12:58 PM
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: [MarkLogic Dev General] Boosting with simple scoring
>
> Hi,
>
> I am using cts:search with the option "score-logtfidf" and a cts:boost-query
> for my search. I am boosting on some meta-elements that are enumerations
> where the meaning does not matter. But one value is "on" and due to
> logtfidf scoring (I guess) the boost on that value does not matter much.
>
> logtfidf is perfect for my word queries, but not for the
> element-value-queries that I use in the boosting query. Is it possible to use
> "score-simple" for the boosting query only and "score-logtfidf" for the
> "real" query?
>
> Best regards,
> Andreas
>
>
> --
> Andreas Hubmer
> IT Consultant
>
> EBCONT enterprise technologies GmbH
> Millennium Tower
> Handelskai 94-96
> A-1200 Vienna
>
> OUR TEAM IS YOUR SUCCESS
>
> UID-Nr. ATU68135644
> HG St.Pölten - FN 399978 d
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
