Good catch. Case also appears to play a role. The following does not match:

    "samsung" contains text "samsung bioepis co., ltd." using fuzzy using stop
    words ( "co", "ltd") using thesaurus at "thesaurus.xml"

even when the thesaurus contains the synonym "Samsung Bioepis Co., Ltd."

I tried the other way around (thesaurus in lower case, query in mixed case) and 
it also fails to match.

Ron

On January 5, 2016 at 10:29:35 AM, Christian Grün ([email protected]) 
wrote:

Phew… My guess is that no one has seriously looked at the interplay  
between stop words and the thesaurus so far ;) Maybe (lower/upper)  
case plays a role, too?  


On Tue, Jan 5, 2016 at 4:26 PM, Ron Katriel <[email protected]> wrote:  
> Hi Christian,  
>  
> One follow up question. I thought stop words work in concert with the  
> thesaurus but I came across a case where they do not seem to. The following  
> query returns false  
>  
> "Samsung" contains text "Samsung Bioepis Co., Ltd." using fuzzy using  
> stop words ( "co", "ltd") using thesaurus at "thesaurus.xml"  
>  
> even though the thesaurus contains the following  
>  
> <entry>  
> <term>Samsung Bioepis</term>  
> <synonym>  
> <term>Samsung</term>  
> <relationship>BT</relationship>  
> </synonym>  
> </entry>  
>  
> When I add the following synonym to the entry  
>  
> <synonym>  
> <term>Samsung Bioepis Co., Ltd.</term>  
> <relationship>USE</relationship>  
> </synonym>  
>  
> the query matches. Am I missing something?  
>  
> Thanks,  
> Ron  
>  
> On January 3, 2016 at 8:33:14 PM, Ron Katriel ([email protected]) wrote:  
>  
> Thanks, Christian. I will look into the solution you suggested. Will need to  
> cache the stop words to avoid repeatedly opening the file for reading.  
>  
> Ron  
>  
> On January 3, 2016 at 8:14:51 PM, Christian Grün ([email protected])  
> wrote:  
>  
>> The behavior I am looking for is getting back false whenever the text  
>> following 'contains text' is reduced to an empty string. Is there a simple  
>> way of checking that?  
>  
> Hm, sounds easy, but I don’t have an easy answer to that. We should  
> probably extend our ft:tokenize function to also take a stopword  
> option.  
>  
> What you can always do is write some additional code:  
>  
> declare function local:sw($terms, $sw) {  
>   let $sw := file:read-text-lines($sw)  
>   return $terms contains text { $sw } all words  
> };  
> if(local:sw('query terms', 'sw.txt')) then  
>   ...  
>  
>  
>  
>> On January 3, 2016 at 7:41:47 PM, Christian Grün  
>> ([email protected])  
>> wrote:  
>>  
>> Hi Ron,  
>>  
>>> "Superior Laboratories" contains text { "Medical Affairs" } using stop  
>>> words ( "medical", "affairs" )  
>>  
>> I’m pretty sure that "true" is the right answer here. I must admit  
>> that, due to the variety of options provided by the XQFT spec, it’s  
>> often not too obvious what’s going on.  
>>  
>>> is there a way - without removing the stop words  
>>> from the file - to override this behavior in XQuery so the above match  
>>> will  
>>> fail?  
>>  
>> Maybe an additional check could be used after the first 'contains  
>> text' expression. In what particular cases would you like to get  
>> 'false' as result?  
>>  
>> Christian  
