[
https://issues.apache.org/jira/browse/SOLR-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danilo Tomasoni updated SOLR-12731:
-----------------------------------
Environment: Ubuntu 16.04.5 LTS, java version 1.8.0_181 (was: Ubuntu
16.04.5 LTS)
> SynonymGraphFilter expands wrong synonyms
> -----------------------------------------
>
> Key: SOLR-12731
> URL: https://issues.apache.org/jira/browse/SOLR-12731
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 7.3.1
> Environment: Ubuntu 16.04.5 LTS, java version 1.8.0_181
> Reporter: Danilo Tomasoni
> Priority: Major
> Labels: synonyms
>
> Hello all. I have an issue with SynonymGraphFilter expanding the wrong
> synonyms for a phrase term at query time.
> I have a dictionary with the following lines
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase
> II
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid
> 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens
> glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo
> sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> and two documents
> {code:java}
> {"body": "8. The method of claim 6 wherein said method inhibits at least one
> 5′-nucleotidase chosen from cytosolic 5′-nucleotidase II (cN-II), cytosolic
> 5′-nucleotidase IA (cN-IA), cytosolic 5′-nucleotidase IB (cN-IB), cytosolic
> 5′-nucleotidase IMA (cN-IIIA), cytosolic 5′-nucleotidase NIB (cN-IIIB),
> ecto-5′-nucleotidase (eN, CD73), cytosolic 5′(3′)-deoxynucleotidase (cdN) and
> mitochondrial 5′(3′)-deoxynucleotidase (mdN)."}
> {"body": "Trichomonosis caused by the flagellate protozoan Trichomonas
> vaginalis represents the most prevalent nonviral sexually transmitted disease
> worldwide (WHO-DRHR 2012). In women, the symptoms are cyclic and often worsen
> around the menstruation period. In men, trichomonosis is largely asymptomatic
> and these men are considered to be carriers of T. vaginalis (Petrin et al.
> 1998). This infection has been associated with birth outcomes (Klebanoff et
> al. 2001), infertility (Grodstein et al. 1993), cervical and prostate cancer
> (Viikki et al. 2000, Sutcliffe et al. 2012) and pelvic inflammatory disease
> (Cherpes et al. 2006). Importantly, T. vaginalis is a co-factor in human
> immunodeficiency virus transmission and acquisition (Sorvillo et al. 2001,
> Van Der Pol et al. 2008). Therefore, it is important to study the
> host-parasite relationship to understand T. vaginalis infection and
> pathogenesis. Colonisation of the mucosa by T. vaginalis is a complex
> multi-step process that involves distinct mechanisms (Alderete et al. 2004).
> The parasite interacts with mucin (Lehker & Sweeney 1999), adheres to vaginal
> epithelial cells (VECs) in a process mediated by adhesion proteins (AP120,
> AP65, AP51, AP33 and AP23) and undergoes dramatic morphological changes from
> a pyriform to an amoeboid form (Engbring & Alderete 1998, Kucknoor et al.
> 2005, Moreno-Brito et al. 2005). After adhesion to VECs, the synthesis and
> gene expression of adhesins are increased (Kucknoor et al. 2005). These
> mechanisms must be tightly regulated and iron plays a pivotal role in this
> regulation. Iron is an essential element for all living organisms, from the
> most primitive to the most complex, as a component of haeme, iron-sulphur
> clusters and a variety of proteins. Iron is known to contribute to biological
> functions such as DNA and RNA synthesis, oxygen transport and metabolic
> reactions. T. vaginalis has developed multiple iron uptake systems such as
> receptors for hololactoferrin, haemoglobin (HB), haemin (HM) and haeme
> binding as well as adhesins to erythrocytes and epithelial cells
> (Moreno-Brito et al. 2005, Ardalan et al. 2009). Iron plays a crucial role in
> the pathogenesis of trichomonosis by increasing cytoadherence and modulating
> resistance to complement lyses, ligation to the extracellular matrix and the
> expression of proteases (Figueroa-Angulo et al. 2012). In agreement with this
> role, the symptoms of trichomonosis worsen after menstruation. In addition,
> iron also influences nucleotide hydrolysis in T. vaginalis (Tasca et al.
> 2005, de Jesus et al. 2006). The extracellular concentrations of ATP and
> adenosine can markedly increase under several conditions such as inflammation
> and hypoxia as well as in the presence of pathogens (Robson et al. 2006,
> Sansom 2012). In the extracellular medium, these nucleotides can act as
> immunomodulators by triggering immunological effects. Extracellular ATP acts
> as a proinflammatory immune-mediator by triggering multiple immunological
> effects on cell types such as neutrophils, macrophages, dendritic cells and
> lymphocytes (Bours et al. 2006). In this sense, ATP and adenosine
> concentrations in the extracellular compartment are controlled by
> ectoenzymes, including those of the nucleoside triphosphate
> diphosphohydrolase (NTPDase) (EC: 3.1.4.1) family, which hydrolyze tri and
> diphosphates and ecto-5’-nucleotidase (EC: 3.1.3.5), which hydrolyses
> monophosphates (Zimmermann 2001). Considering that de novo nucleotide
> synthesis is absent in T. vaginalis (Heyworth et al. 1982, 1984), this enzyme
> cascade is important as a source of the precursor adenosine for purine
> synthesis in the parasite (Munagala & Wang 2003). Extracellular nucleotide
> metabolism has been characterised in several parasite species such as
> Toxoplasma gondii, Schistosoma mansoni, Leishmania spp, Trypanosoma cruzi,
> Acanthamoeba, Entamoeba histolytica, Giardia lamblia and fungi, Saccharomyces
> cerevisiae, Cryptococcus neoformans, Candida parapsilosis and Candida
> albicans (Sansom 2012). In T. vaginalis , NTPDase and ecto-5’-nucleotidase
> activities have been characterised and they are involved in host-parasite
> interactions by controlling ATP and adenosine levels (Matos et al. 2001, d,
> de Jesus et al. 2002, Tasca et al. 2003). Considering that (i) iron plays a
> crucial role in the pathogenesis of trichomonosis, (ii) ATP exerts a
> proinflammatory effect in inflammation, (iii) adenosine is important to T.
> vaginalis growth and acts as an antiinflammatory factor (Frasson et al. 2012)
> and (iv) ectonucleotidases modulate the nucleotide levels at infection sites
> (such as those observed in trichomonosis), the aim of this study was to
> investigate the effect of iron on the extracellular nucleotide hydrolysis and
> gene expression of T . vaginalis."}
> {code}
> The "body" field has the type "text_en", configured this way
> {code:java}
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> The two dictionary lines are in the file "synonyms.txt".
> If I run the following query on a Solr instance configured this way and
> containing those documents
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II" OR body:"EC 3.1.3.5")
> {code}
> both documents are returned.
> Surprisingly, if I run the query
> {code:java}
> (body:"Cytosolic 5'-nucleotidase II")
> {code}
> the second one is not returned.
> If I set debugQuery=true, I see that the second line is expanded
> {code:java}
> A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid
> 3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens
> glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo
> sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA
> {code}
> instead of the first
> {code:java}
> P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase
> II
> {code}
> The parsed query (reported by debugQuery) is
> {code:java}
> "parsedquery":"SpanNearQuery(spanNear([spanOr([body:a8k9n1,
> spanNear([body:glucosidase,, body:beta,, body:acid, body:3], 0, true),
> spanNear([body:cytosolic,, body:isoform, body:cra_b], 0, true),
> spanNear([body:cdna, body:flj78196,, body:highli, body:similar, body:to,
> body:homo, body:sapien, body:glucosidase,, body:beta,, body:acid, body:3], 0,
> true), body:cytosol, spanNear([body:gba3,, body:mrna], 0, true),
> spanNear([body:cdna,, body:flj93688,, body:homo, body:sapien,
> body:glucosidase,, body:beta,, body:acid, body:3], 0, true), body:cytosol]),
> body:5, body:nucleotidas, body:ii], 0, true))
> {code}
> If I remove the second line, no synonym is expanded
> {code:java}
> "parsedquery":"PhraseQuery(body_unnamed:\"cytosol 5 nucleotidas ii\")",
> {code}
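To compare the parsed queries for the two cases side by side, the requests can be built programmatically. The following is a minimal Python sketch, not part of the original report. The base URL and the collection name "mycollection" are hypothetical placeholders, and it assumes the standard /select handler with the q, debugQuery and wt parameters. It only builds the request URLs, so it runs without a Solr instance.

```python
from urllib.parse import urlencode

# Hypothetical placeholders; adjust to the actual Solr installation.
BASE_URL = "http://localhost:8983/solr"
COLLECTION = "mycollection"

# The two queries from the report.
PHRASE_OR_EC = "(body:\"Cytosolic 5'-nucleotidase II\" OR body:\"EC 3.1.3.5\")"
PHRASE_ONLY = "(body:\"Cytosolic 5'-nucleotidase II\")"


def debug_query_url(base_url, collection, q):
    """Build a /select URL that asks Solr to include debug output."""
    params = {"q": q, "debugQuery": "true", "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    for q in (PHRASE_OR_EC, PHRASE_ONLY):
        print(debug_query_url(BASE_URL, COLLECTION, q))
```

Fetching each URL and reading the "parsedquery" entry of the "debug" section of the JSON response should give the strings quoted in this report.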
> I think this is related to the word "cytosolic", which appears as a synonym
> on the second line. If I remove "cytosolic" as a synonym from the second line,
> then again no synonym is expanded.
> Can you tell me why this happens? I thought that the first line should be
> expanded, since it contains a multi-word synonym that exactly matches the
> phrase query.
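To make the entries of the two dictionary lines visible, here is a minimal Python sketch, not part of the original report. It splits a synonyms.txt line on commas that are not backslash-escaped. It is a simplification for illustration, not Solr's actual parser, and it assumes that the mail-wrapped dictionary lines rejoin with single spaces.

```python
def split_synonym_line(line):
    """Split one synonyms.txt line on commas that are not backslash-escaped."""
    entries, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: keep the next character literally.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == ",":
            # Unescaped comma: end of the current entry.
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    entries.append("".join(current).strip())
    return entries


def single_word_entries(line):
    """Return the entries of a line that consist of exactly one word."""
    return [e for e in split_synonym_line(line) if len(e.split()) == 1]


# Dictionary lines from the report, with the wrapped pieces rejoined by
# single spaces (an assumption about the original file).
LINE_1 = "P49902,Cytosolic purine 5'-nucleotidase,EC 3.1.3.5,Cytosolic 5'-nucleotidase II"
LINE_2 = (
    r"A8K9N1,Glucosidase\, beta\, acid 3,Cytosolic,Glucosidase\, beta\, acid "
    r"3,Cytosolic\, isoform CRA_b,cDNA FLJ78196\, highly similar to Homo sapiens "
    r"glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA,cDNA\, FLJ93688\, Homo "
    r"sapiens glucosidase\, beta\, acid 3,cytosolic,GBA3\, mRNA"
)

if __name__ == "__main__":
    for name, line in (("line 1", LINE_1), ("line 2", LINE_2)):
        print(name, single_word_entries(line))
```

Under these assumptions, the second line contains "Cytosolic" as a complete single-word entry, while every entry on the first line that starts with "Cytosolic" is multi-word. This is consistent with the observation that the expansion disappears when "cytosolic" is removed from the second line. It does not by itself explain why the longer match on the first line is not chosen.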