[ https://issues.apache.org/jira/browse/SOLR-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604228#comment-16604228 ]
Danilo Tomasoni commented on SOLR-12731:
----------------------------------------

I'm sorry, you are right. I'll post my question on the user list. I'll also check the admin/analysis tab.

> SynonimGraphFilter expands wrong synonims
> -----------------------------------------
>
> Key: SOLR-12731
> URL: https://issues.apache.org/jira/browse/SOLR-12731
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 7.3.1
> Environment: Ubuntu 16.04.5 LTS, java version 1.8.0_181
> Reporter: Danilo Tomasoni
> Priority: Major
> Labels: synonyms
>
> Hello to all, I have an issue related to SynonymGraphFilter expanding the wrong synonyms for a phrase term at query time.
> I have a dictionary with the following lines
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase II
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> and two documents
> {code:java}
> {"body": "8. The method of claim 6 wherein said method inhibits at least one 5′-nucleotidase chosen from cytosolic 5′-nucleotidase II (cN-II), cytosolic 5′-nucleotidase IA (cN-IA), cytosolic 5′-nucleotidase IB (cN-IB), cytosolic 5′-nucleotidase IMA (cN-IIIA), cytosolic 5′-nucleotidase NIB (cN-IIIB), ecto-5′-nucleotidase (eN, CD73), cytosolic 5′(3′)-deoxynucleotidase (cdN) and mitochondrial 5′(3′)-deoxynucleotidase (mdN)."}
> {"body": "Trichomonosis caused by the flagellate protozoan Trichomonas vaginalis represents the most prevalent nonviral sexually transmitted disease worldwide (WHO-DRHR 2012). In women, the symptoms are cyclic and often worsen around the menstruation period.
In men, trichomonosis is largely asymptomatic and these men are considered to be carriers of T. vaginalis (Petrin et al. 1998). This infection has been associated with birth outcomes (Klebanoff et al. 2001), infertility (Grodstein et al. 1993), cervical and prostate cancer (Viikki et al. 2000, Sutcliffe et al. 2012) and pelvic inflammatory disease (Cherpes et al. 2006). Importantly, T. vaginalis is a co-factor in human immunodeficiency virus transmission and acquisition (Sorvillo et al. 2001, Van Der Pol et al. 2008). Therefore, it is important to study the host-parasite relationship to understand T. vaginalis infection and pathogenesis. Colonisation of the mucosa by T. vaginalis is a complex multi-step process that involves distinct mechanisms (Alderete et al. 2004). The parasite interacts with mucin (Lehker & Sweeney 1999), adheres to vaginal epithelial cells (VECs) in a process mediated by adhesion proteins (AP120, AP65, AP51, AP33 and AP23) and undergoes dramatic morphological changes from a pyriform to an amoeboid form (Engbring & Alderete 1998, Kucknoor et al. 2005, Moreno-Brito et al. 2005). After adhesion to VECs, the synthesis and gene expression of adhesins are increased (Kucknoor et al. 2005). These mechanisms must be tightly regulated and iron plays a pivotal role in this regulation. Iron is an essential element for all living organisms, from the most primitive to the most complex, as a component of haeme, iron-sulphur clusters and a variety of proteins. Iron is known to contribute to biological functions such as DNA and RNA synthesis, oxygen transport and metabolic reactions. T. vaginalis has developed multiple iron uptake systems such as receptors for hololactoferrin, haemoglobin (HB), haemin (HM) and haeme binding as well as adhesins to erythrocytes and epithelial cells (Moreno-Brito et al. 2005, Ardalan et al. 2009).
Iron plays a crucial role in the pathogenesis of trichomonosis by increasing cytoadherence and modulating resistance to complement lyses, ligation to the extracellular matrix and the expression of proteases (Figueroa-Angulo et al. 2012). In agreement with this role, the symptoms of trichomonosis worsen after menstruation. In addition, iron also influences nucleotide hydrolysis in T. vaginalis (Tasca et al. 2005, de Jesus et al. 2006). The extracellular concentrations of ATP and adenosine can markedly increase under several conditions such as inflammation and hypoxia as well as in the presence of pathogens (Robson et al. 2006, Sansom 2012). In the extracellular medium, these nucleotides can act as immunomodulators by triggering immunological effects. Extracellular ATP acts as a proinflammatory immune-mediator by triggering multiple immunological effects on cell types such as neutrophils, macrophages, dendritic cells and lymphocytes (Bours et al. 2006). In this sense, ATP and adenosine concentrations in the extracellular compartment are controlled by ectoenzymes, including those of the nucleoside triphosphate diphosphohydrolase (NTPDase) (EC: 3.1.4.1) family, which hydrolyze tri and diphosphates and ecto-5’-nucleotidase (EC: 3.1.3.5), which hydrolyses monophosphates (Zimmermann 2001). Considering that de novo nucleotide synthesis is absent in T. vaginalis (Heyworth et al. 1982, 1984), this enzyme cascade is important as a source of the precursor adenosine for purine synthesis in the parasite (Munagala & Wang 2003). Extracellular nucleotide metabolism has been characterised in several parasite species such as Toxoplasma gondii, Schistosoma mansoni, Leishmania spp, Trypanosoma cruzi, Acanthamoeba, Entamoeba histolytica, Giardia lamblia and fungi, Saccharomyces cerevisiae, Cryptococcus neoformans, Candida parapsilosis and Candida albicans (Sansom 2012). In T.
vaginalis , NTPDase and ecto-5’-nucleotidase activities have been characterised and they are involved in host-parasite interactions by controlling ATP and adenosine levels (Matos et al. 2001, d, de Jesus et al. 2002, Tasca et al. 2003). Considering that (i) iron plays a crucial role in the pathogenesis of trichomonosis, (ii) ATP exerts a proinflammatory effect in inflammation, (iii) adenosine is important to T. vaginalis growth and acts as an antiinflammatory factor (Frasson et al. 2012) and (iv) ectonucleotidases modulate the nucleotide levels at infection sites (such as those observed in trichomonosis), the aim of this study was to investigate the effect of iron on the extracellular nucleotide hydrolysis and gene expression of T . vaginalis."}
> {code}
> The body field has the type "text_en", configured in this way
> {code:java}
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> The two dictionary lines are in the file "synonyms.txt".
> If, in a Solr instance configured this way with those documents, I run the following query
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II" OR body:"EC 3.1.3.5")
> {code}
> both documents are returned.
> Surprisingly, if I run the query
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II")
> {code}
> the second one is not returned.
> If I set debugQuery=true I see that the second dictionary line is expanded
> {code:java}
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> instead of the first
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase II
> {code}
> The parsed query (given by debugQuery) is
> {code:java}
> "parsedquery":"SpanNearQuery(spanNear([spanOr([body:a8k9n1, spanNear([body:glucosidase,, body:beta,, body:acid, body:3], 0, true), spanNear([body:cytosolic,, body:isoform, body:cra_b], 0, true), spanNear([body:cdna, body:flj78196,, body:highli, body:similar, body:to, body:homo, body:sapien, body:glucosidase,, body:beta,, body:acid, body:3], 0, true), body:cytosol, spanNear([body:gba3,, body:mrna], 0, true), spanNear([body:cdna,, body:flj93688,, body:homo, body:sapien, body:glucosidase,, body:beta,, body:acid, body:3], 0, true), body:cytosol]), body:5, body:nucleotidas, body:ii], 0, true))
> {code}
> If I remove the second line, no synonym is expanded
> {code:java}
> "parsedquery":"PhraseQuery(body_unnamed:\"cytosol 5 nucleotidas ii\")",
> {code}
> I think this is related to the word "cytosolic", which appears as a synonym in the second line. If I remove "cytosolic" as a synonym from the second line, then again no synonym is expanded.
> Can you tell me why this happens?
> I thought that the first line should be expanded, since it contains a multi-word synonym that exactly matches the phrase query.
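
One possible explanation, not confirmed anywhere in this thread: as far as I know, SynonymGraphFilterFactory parses synonyms.txt with a whitespace tokenizer unless a `tokenizerFactory` (or `analyzer`) attribute is set, while the query text goes through StandardTokenizer. Under that assumption, "5'-nucleotidase" stays a single token in the synonym rule but becomes "5" and "nucleotidase" in the query, so only the single-word entry "Cytosolic" on the second line can match. The Python sketch below mimics this with two crude stand-in tokenizers; all function names are hypothetical helpers, not Lucene or Solr code, and the second dictionary line is abbreviated.

```python
import re

def split_synonyms(line):
    # Split on commas that are not escaped with a backslash, then unescape "\,".
    return [part.replace("\\,", ",").strip() for part in re.split(r"(?<!\\),", line)]

def whitespace_tokens(text):
    # Stand-in for whitespace tokenization with ignoreCase="true".
    return text.lower().split()

def standard_like_tokens(text):
    # Very rough stand-in for StandardTokenizer: break on anything that is not
    # a letter, digit or underscore. The real tokenizer follows Unicode word
    # boundary rules and would, for example, keep "3.1.3.5" together.
    return re.findall(r"[a-z0-9_]+", text.lower())

def contains(haystack, needle):
    # True if `needle` occurs as a contiguous run inside `haystack`.
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))

def matching_lines(query, lines, synonym_tokenizer):
    # Indexes of dictionary lines with at least one entry found in the query.
    query_tokens = standard_like_tokens(query)
    hits = []
    for index, line in enumerate(lines):
        entries = [synonym_tokenizer(entry) for entry in split_synonyms(line)]
        if any(contains(query_tokens, entry) for entry in entries):
            hits.append(index)
    return hits

LINES = [
    "P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase II",
    # Abbreviated version of the A8K9N1 line from the report.
    r"A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Cytosolic\, isoform CRA_b,GBA3\, mRNA",
]
QUERY = "Cytosolic 5'-nucleotidase II"

# Synonyms tokenized on whitespace: only the second line (index 1) matches.
print(matching_lines(QUERY, LINES, whitespace_tokens))
# Synonyms tokenized like the query: the first line (index 0) matches as well.
print(matching_lines(QUERY, LINES, standard_like_tokens))
```

If this is indeed the cause, adding `tokenizerFactory="solr.StandardTokenizerFactory"` to the synonym filter would be the first thing to try, and the admin/analysis tab should show whether the multi-word entry then matches.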