[ 
https://issues.apache.org/jira/browse/SOLR-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604228#comment-16604228
 ] 

Danilo Tomasoni commented on SOLR-12731:
----------------------------------------

I'm sorry, you are right.

I'll post my question to the user list. I'll also check the admin/analysis 
tab.

> SynonimGraphFilter expands wrong synonims
> -----------------------------------------
>
>                 Key: SOLR-12731
>                 URL: https://issues.apache.org/jira/browse/SOLR-12731
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 7.3.1
>         Environment: Ubuntu 16.04.5 LTS, java version 1.8.0_181
>            Reporter: Danilo Tomasoni
>            Priority: Major
>              Labels: synonyms
>
> Hello all, I have an issue with SynonymGraphFilter expanding the wrong 
> synonyms for a phrase query at query time.
> I have a dictionary with the following lines:
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase 
> II
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid 
> 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens 
> glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo 
> sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> and two documents
> {code:java}
> {"body": "8. The method of claim 6 wherein said method inhibits at least one 
> 5′-nucleotidase chosen from cytosolic 5′-nucleotidase II (cN-II), cytosolic 
> 5′-nucleotidase IA (cN-IA), cytosolic 5′-nucleotidase IB (cN-IB), cytosolic 
> 5′-nucleotidase IMA (cN-IIIA), cytosolic 5′-nucleotidase NIB (cN-IIIB), 
> ecto-5′-nucleotidase (eN, CD73), cytosolic 5′(3′)-deoxynucleotidase (cdN) and 
> mitochondrial 5′(3′)-deoxynucleotidase (mdN)."}
> {"body": "Trichomonosis caused by the flagellate protozoan Trichomonas 
> vaginalis represents the most prevalent nonviral sexually transmitted disease 
> worldwide (WHO-DRHR 2012). In women, the symptoms are cyclic and often worsen 
> around the menstruation period. In men, trichomonosis is largely asymptomatic 
> and these men are considered to be carriers of T. vaginalis (Petrin et al. 
> 1998). This infection has been associated with birth outcomes (Klebanoff et 
> al. 2001), infertility (Grodstein et al. 1993), cervical and prostate cancer 
> (Viikki et al. 2000, Sutcliffe et al. 2012) and pelvic inflammatory disease 
> (Cherpes et al. 2006). Importantly, T. vaginalis is a co-factor in human 
> immunodeficiency virus transmission and acquisition (Sorvillo et al. 2001, 
> Van Der Pol et al. 2008). Therefore, it is important to study the 
> host-parasite relationship to understand T. vaginalis infection and 
> pathogenesis. Colonisation of the mucosa by T. vaginalis is a complex 
> multi-step process that involves distinct mechanisms (Alderete et al. 2004). 
> The parasite interacts with mucin (Lehker & Sweeney 1999), adheres to vaginal 
> epithelial cells (VECs) in a process mediated by adhesion proteins (AP120, 
> AP65, AP51, AP33 and AP23) and undergoes dramatic morphological changes from 
> a pyriform to an amoeboid form (Engbring & Alderete 1998, Kucknoor et al. 
> 2005, Moreno-Brito et al. 2005). After adhesion to VECs, the synthesis and 
> gene expression of adhesins are increased (Kucknoor et al. 2005). These 
> mechanisms must be tightly regulated and iron plays a pivotal role in this 
> regulation. Iron is an essential element for all living organisms, from the 
> most primitive to the most complex, as a component of haeme, iron-sulphur 
> clusters and a variety of proteins. Iron is known to contribute to biological 
> functions such as DNA and RNA synthesis, oxygen transport and metabolic 
> reactions. T. vaginalis has developed multiple iron uptake systems such as 
> receptors for hololactoferrin, haemoglobin (HB), haemin (HM) and haeme 
> binding as well as adhesins to erythrocytes and epithelial cells 
> (Moreno-Brito et al. 2005, Ardalan et al. 2009). Iron plays a crucial role in 
> the pathogenesis of trichomonosis by increasing cytoadherence and modulating 
> resistance to complement lyses, ligation to the extracellular matrix and the 
> expression of proteases (Figueroa-Angulo et al. 2012). In agreement with this 
> role, the symptoms of trichomonosis worsen after menstruation. In addition, 
> iron also influences nucleotide hydrolysis in T. vaginalis (Tasca et al. 
> 2005, de Jesus et al. 2006). The extracellular concentrations of ATP and 
> adenosine can markedly increase under several conditions such as inflammation 
> and hypoxia as well as in the presence of pathogens (Robson et al. 2006, 
> Sansom 2012). In the extracellular medium, these nucleotides can act as 
> immunomodulators by triggering immunological effects. Extracellular ATP acts 
> as a proinflammatory immune-mediator by triggering multiple immunological 
> effects on cell types such as neutrophils, macrophages, dendritic cells and 
> lymphocytes (Bours et al. 2006). In this sense, ATP and adenosine 
> concentrations in the extracellular compartment are controlled by 
> ectoenzymes, including those of the nucleoside triphosphate 
> diphosphohydrolase (NTPDase) (EC: 3.1.4.1) family, which hydrolyze tri and 
> diphosphates and ecto-5’-nucleotidase (EC: 3.1.3.5), which hydrolyses 
> monophosphates (Zimmermann 2001). Considering that de novo nucleotide 
> synthesis is absent in T. vaginalis (Heyworth et al. 1982, 1984), this enzyme 
> cascade is important as a source of the precursor adenosine for purine 
> synthesis in the parasite (Munagala & Wang 2003). Extracellular nucleotide 
> metabolism has been characterised in several parasite species such as 
> Toxoplasma gondii, Schistosoma mansoni, Leishmania spp, Trypanosoma cruzi, 
> Acanthamoeba, Entamoeba histolytica, Giardia lamblia and fungi, Saccharomyces 
> cerevisiae, Cryptococcus neoformans, Candida parapsilosis and Candida 
> albicans (Sansom 2012). In T. vaginalis , NTPDase and ecto-5’-nucleotidase 
> activities have been characterised and they are involved in host-parasite 
> interactions by controlling ATP and adenosine levels (Matos et al. 2001, d, 
> de Jesus et al. 2002, Tasca et al. 2003). Considering that (i) iron plays a 
> crucial role in the pathogenesis of trichomonosis, (ii) ATP exerts a 
> proinflammatory effect in inflammation, (iii) adenosine is important to T. 
> vaginalis growth and acts as an antiinflammatory factor (Frasson et al. 2012) 
> and (iv) ectonucleotidases modulate the nucleotide levels at infection sites 
> (such as those observed in trichomonosis), the aim of this study was to 
> investigate the effect of iron on the extracellular nucleotide hydrolysis and 
> gene expression of T . vaginalis."}
> {code}
> The "body" field has the type "text_en", configured this way:
> {code:java}
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>             />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" 
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>         />
>         <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" 
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
> {code}
> The two dictionary lines are in the file "synonyms.txt".
> If, in a Solr instance configured this way and holding those documents, I 
> run the following query
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II" OR body:"EC 3.1.3.5") 
> {code}
> both documents are returned.
> Surprisingly, if I run the query
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II") 
> {code}
> the second one is not returned.
> If I set debugQuery=true I see that the second line is expanded
> {code:java}
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid 
> 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens 
> glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo 
> sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> instead of the first
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase 
> II
> {code}
> The parsed query (given by debugQuery) is
> {code:java}
> "parsedquery":"SpanNearQuery(spanNear([spanOr([body:a8k9n1, 
> spanNear([body:glucosidase,, body:beta,, body:acid, body:3], 0, true), 
> spanNear([body:cytosolic,, body:isoform, body:cra_b], 0, true), 
> spanNear([body:cdna, body:flj78196,, body:highli, body:similar, body:to, 
> body:homo, body:sapien, body:glucosidase,, body:beta,, body:acid, body:3], 0, 
> true), body:cytosol, spanNear([body:gba3,, body:mrna], 0, true), 
> spanNear([body:cdna,, body:flj93688,, body:homo, body:sapien, 
> body:glucosidase,, body:beta,, body:acid, body:3], 0, true), body:cytosol]), 
> body:5, body:nucleotidas, body:ii], 0, true))
> {code}
> If I remove the second line, no synonym is expanded
> {code:java}
>     "parsedquery":"PhraseQuery(body_unnamed:\"cytosol 5 nucleotidas ii\")",
> {code}
> I think this is related to the word "cytosolic", which appears as a synonym 
> in the second line. If I remove "cytosolic" as a synonym from the second 
> line, then again no synonym is expanded.
> Can you tell me why this happens? I thought that the first line should be 
> expanded, since it contains a multi-word synonym that matches the phrase 
> query exactly.
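
One possible explanation, offered as an assumption and not a confirmed diagnosis: the parsed query above contains tokens such as `glucosidase,` with the comma still attached, which suggests the synonym entries are split on whitespace only. The query text, in contrast, ends up as `5`, `nucleotidas`, `ii`, so StandardTokenizer has split `5'-nucleotidase` into two tokens. If so, the multi-word entry on the first line can never line up with the query tokens, while the single word "Cytosolic" on the second line does. The Python sketch below only imitates that mismatch. The helper names are hypothetical, the regex is a crude stand-in for StandardTokenizer, and lowercasing stands in for ignoreCase="true".

```python
import re

# The two synonyms.txt lines and the query, copied from the report.
LINE_1 = r"P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase II"
LINE_2 = (
    r"A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid 3,"
    r"Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens "
    r"glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo "
    r"sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA"
)
QUERY = "Cytosolic 5'-nucleotidase II"


def split_entries(line):
    """Split a synonyms line on commas that are not backslash-escaped."""
    return [e.replace("\\,", ",").strip() for e in re.split(r"(?<!\\),", line)]


def whitespace_tokens(text):
    """Assumed tokenization of synonym entries: whitespace split, lowercased."""
    return [t.lower() for t in text.split()]


def standard_like_tokens(text):
    """Crude approximation of StandardTokenizer: runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def matching_entries(query, line):
    """Return entries whose tokens occur as a contiguous run in the query."""
    q = standard_like_tokens(query)
    hits = []
    for entry in split_entries(line):
        e = whitespace_tokens(entry)
        if any(q[i:i + len(e)] == e for i in range(len(q) - len(e) + 1)):
            hits.append(entry)
    return hits


print(standard_like_tokens(QUERY))        # ['cytosolic', '5', 'nucleotidase', 'ii']
print(matching_entries(QUERY, LINE_1))    # []
print(matching_entries(QUERY, LINE_2))    # ['Cytosolic', 'cytosolic', 'cytosolic']
```

The admin/analysis tab should show whether the real analyzer behaves this way. If it does, the `tokenizerFactory` attribute of SynonymGraphFilterFactory, which controls how the synonyms file itself is tokenized, may be worth testing.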



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
