Good catch. Case also appears to play a role. The following does not match:

"samsung" contains text "samsung bioepis co., ltd." using fuzzy using stop words ("co", "ltd") using thesaurus at "thesaurus.xml"

even when the thesaurus contains the synonym "Samsung Bioepis Co., Ltd.". I also tried it the other way around (thesaurus in lower case, query in mixed case), and it fails to match as well.

Ron

On January 5, 2016 at 10:29:35 AM, Christian Grün ([email protected]) wrote:

Phew… My guess is that no one has seriously looked at the interplay between stop words and the thesaurus so far ;) Maybe (lower/upper) case plays a role, too?

On Tue, Jan 5, 2016 at 4:26 PM, Ron Katriel <[email protected]> wrote:
> Hi Christian,
>
> One follow-up question. I thought stop words work in concert with the
> thesaurus, but I came across a case where they do not seem to. The
> following query returns false:
>
> "Samsung" contains text "Samsung Bioepis Co., Ltd." using fuzzy using
> stop words ("co", "ltd") using thesaurus at "thesaurus.xml"
>
> even though the thesaurus contains the following:
>
> <entry>
>   <term>Samsung Bioepis</term>
>   <synonym>
>     <term>Samsung</term>
>     <relationship>BT</relationship>
>   </synonym>
> </entry>
>
> When I add the following synonym to the entry
>
> <synonym>
>   <term>Samsung Bioepis Co., Ltd.</term>
>   <relationship>USE</relationship>
> </synonym>
>
> the query matches. Am I missing something?
>
> Thanks,
> Ron
>
> On January 3, 2016 at 8:33:14 PM, Ron Katriel ([email protected]) wrote:
>
> Thanks, Christian. I will look into the solution you suggested. I will
> need to cache the stop words to avoid repeatedly opening the file for
> reading.
>
> Ron
>
> On January 3, 2016 at 8:14:51 PM, Christian Grün ([email protected])
> wrote:
>
>> The behavior I am looking for is getting back false whenever the text
>> following 'contains text' is reduced to an empty string. Is there a
>> simple way of checking that?
>
> Hm, sounds easy, but I don’t have an easy answer to that. We should
> probably extend our ft:tokenize function to also take a stopword
> option.
>
> What you can always do is write some additional code:
>
> declare function local:sw($terms, $sw) {
>   let $sw := file:read-text-lines($sw)
>   return $terms contains text { $sw } all words
> };
> if(local:sw('query terms', 'sw.txt')) then
>   ...
>
>> On January 3, 2016 at 7:41:47 PM, Christian Grün
>> ([email protected])
>> wrote:
>>
>> Hi Ron,
>>
>>> "Superior Laboratories" contains text { "Medical Affairs" } using stop
>>> words ("medical", "affairs")
>>
>> I’m pretty sure that "true" is the right answer here. I must admit
>> that, due to the variety of options provided by the XQFT spec, it’s
>> often not too obvious what’s going on.
>>
>>> is there a way - without removing the stop words from the file - to
>>> override this behavior in XQuery so the above match will fail?
>>
>> Maybe an additional check could be used after the first 'contains
>> text' expression. In what particular cases would you like to get
>> 'false' as the result?
>>
>> Christian
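A note on the local:sw suggestion in the thread: as written, $terms contains text { $sw } all words is true when every stop word occurs somewhere in $terms, which is not the same as $terms being reduced to nothing by stop-word removal (with a long stop list it will rarely be true). The sketch below tests the other direction: it splits the query string with BaseX's ft:tokenize, the function mentioned in the thread, and checks that every resulting token is in the stop list. It assumes a BaseX version that provides ft:tokenize; the function name local:only-stop-words and the inline stop list are hypothetical.

```xquery
(: Hypothetical helper: returns true if every token of $terms is a stop
   word, i.e. stop-word removal would leave nothing to match. :)
declare function local:only-stop-words(
  $terms      as xs:string,
  $stop-words as xs:string*
) as xs:boolean {
  (: ft:tokenize applies BaseX's full-text tokenizer; lower-case() is
     applied to both sides so the comparison does not depend on case :)
  let $tokens := ft:tokenize($terms) ! lower-case(.)
  let $stops  := $stop-words ! lower-case(.)
  (: "every" is also true for an empty token sequence, e.g. for "" :)
  return every $t in $tokens satisfies $t = $stops
};

(: Inline stop list for illustration only; in practice it could come
   from file:read-text-lines('sw.txt') :)
let $stop-words := ("co", "ltd")
return (
  local:only-stop-words("Co., Ltd.", $stop-words),
  local:only-stop-words("Samsung Bioepis Co., Ltd.", $stop-words)
)
```

With the stop words co and ltd, the first call should return true and the second false, since samsung is not a stop word; the result of the helper can then be used to return false before the actual contains text expression is evaluated. To address the caching concern, the list could be bound once in a global variable, for example declare variable $STOP-WORDS := file:read-text-lines('sw.txt'); (assuming one stop word per line, and with $STOP-WORDS as a hypothetical name), so that the file should be read once per query rather than once per function call.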

