Thanks, Webster. I created https://issues.apache.org/jira/browse/SOLR-11955 to work on this.
--
Steve
www.lucidworks.com

> On Feb 6, 2018, at 2:47 PM, Webster Homer <webster.ho...@sial.com> wrote:
>
> I noticed that in some of the current example schemas that are shipped with
> Solr, there is a fieldtype, text_en_splitting, that feeds the output
> of SynonymGraphFilterFactory into WordDelimiterGraphFilterFactory. So if
> this isn't supported, the example should probably be updated or removed.
>
> On Mon, Feb 5, 2018 at 10:27 AM, Steve Rowe <sar...@gmail.com> wrote:
>
>> Hi Александр,
>>
>>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>> There should be no problem with using them together.
>>
>> I believe Shawn is wrong.
>>
>> From <http://lucene.apache.org/core/7_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html>:
>>
>>> NOTE: this cannot consume an incoming graph; results will be undefined.
>>
>> Unfortunately, the ref guide entry for Synonym Graph Filter
>> <https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#synonym-graph-filter>
>> doesn’t include a warning about this, but it should, like the warning on
>> Word Delimiter Graph Filter
>> <https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#word-delimiter-graph-filter>:
>>
>>> Note: although this filter produces correct token graphs, it cannot
>>> consume an input token graph correctly.
>>
>> (I’ve just committed a change to the ref guide source to add this also on
>> the Synonym Graph Filter and Managed Synonym Graph Filter entries, to be
>> included in the ref guide for Solr 7.3.)
>>
>> In short, the combination of the two filters is not supported, because
>> WDGF produces a token graph, which SGF cannot correctly interpret.
>>
>> Other filters also have this issue, see e.g.
>> <https://issues.apache.org/jira/browse/LUCENE-3475> for ShingleFilter; this
>> issue has gotten some attention recently, and hopefully it will inspire
>> fixes elsewhere.
>>
>> Patches welcome!
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>
>>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>> On 2/5/2018 3:55 AM, Александр Шестак wrote:
>>>>
>>>> Hi, I'm confused about the usage of SynonymGraphFilterFactory
>>>> and WordDelimiterGraphFilterFactory. Can they be used together?
>>>>
>>>
>>> There should be no problem with using them together. But it is always
>>> possible that the behavior will surprise you, while working 100% as
>>> designed.
>>>
>>>> I have a Solr field type configured in the following way:
>>>>
>>>> <fieldtype name="fulltext_en" class="solr.TextField"
>>>>     autoGeneratePhraseQueries="true">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.WordDelimiterGraphFilterFactory"
>>>>         generateWordParts="1" generateNumberParts="1"
>>>>         splitOnNumerics="1"
>>>>         catenateWords="1" catenateNumbers="1" catenateAll="0"
>>>>         preserveOriginal="1" protected="protwords_en.txt"/>
>>>>     <filter class="solr.FlattenGraphFilterFactory"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.WordDelimiterGraphFilterFactory"
>>>>         generateWordParts="1" generateNumberParts="1"
>>>>         splitOnNumerics="1"
>>>>         catenateWords="0" catenateNumbers="0" catenateAll="0"
>>>>         preserveOriginal="1" protected="protwords_en.txt"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.SynonymGraphFilterFactory"
>>>>         synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
>>>>   </analyzer>
>>>> </fieldtype>
>>>>
>>>> So at query time it uses SynonymGraphFilterFactory after
>>>> WordDelimiterGraphFilterFactory.
>>>> Synonyms are configured in the following way:
>>>> b=>b,boron
>>>> 2=>ii,2
>>>>
>>>> The query in the Solr analysis tool looks like this. It shows that the
>>>> terms after SGF have positions 3 and 4. Is that correct? I thought they
>>>> should have positions 1 and 2.
>>>>
>>>
>>> What matters is the *relative* positions. The exact position number
>>> doesn't matter much. Something new that the Graph implementations use
>>> is the position length. That feature is necessary for multi-term
>>> synonyms to function correctly in phrase queries.
>>>
>>> In your analysis screenshot, WDGF creates three tokens. The two tokens
>>> created by splitting the input are at positions 1 and 2, which I think
>>> is 100% as expected. It also sets the positionLength of the first term
>>> to 2, probably because it has split that term into 2 additional terms.
>>>
>>> Then the SGF takes those last two terms and expands them. Each of the
>>> synonyms is at the same position as the original term, and the relative
>>> positions of the two synonym pairs have not changed -- the second one is
>>> still one higher than the first. I think the reason that SGF moves the
>>> positions two higher is because the positionLength on the "b2" term is
>>> 2, previously set by WDGF. Someone with more knowledge about the Graph
>>> implementations may have to speak up as to whether this behavior is
>>> correct.
>>>
>>> Because the relative positions of the split terms don't change when SGF
>>> runs, I think this is probably working as designed.
>>>
>>> Thanks,
>>> Shawn
>>
>>
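To make the position and positionLength discussion above a little more concrete, here is a toy model in Python. It is only an illustration under stated assumptions: `Token`, `paths`, and `relative_positions` are hypothetical helpers, not Lucene or Solr APIs, and the code does not reproduce what SynonymGraphFilter actually does internally. It treats each token as an arc in a graph running from node `position` to node `position + positionLength`, uses 1-based positions as the Solr analysis screen displays them, and takes the WDGF output for "b2" ("b2" with positionLength 2, plus "b" and "2" at positions 1 and 2) from Shawn's description. The "reported" synonym positions (3 and 4) are our reading of the original poster's description, since the screenshot itself is not part of the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical simplified token: text, start position, position length."""
    term: str
    position: int             # 1-based, as shown by the Solr analysis screen
    position_length: int = 1

    @property
    def arc(self) -> Tuple[int, int]:
        # A token spans the graph from node `position` to node
        # `position + position_length`.
        return (self.position, self.position + self.position_length)


def paths(tokens: List[Token]) -> List[List[str]]:
    """Enumerate every term sequence from the first node to the last node."""
    out_arcs: Dict[int, List[Token]] = {}
    for tok in tokens:
        out_arcs.setdefault(tok.position, []).append(tok)
    start = min(t.position for t in tokens)
    end = max(t.arc[1] for t in tokens)
    result: List[List[str]] = []

    def walk(node: int, acc: List[str]) -> None:
        if node == end:
            result.append(acc)
            return
        for tok in out_arcs.get(node, []):
            walk(tok.arc[1], acc + [tok.term])

    walk(start, [])
    return result


def relative_positions(tokens: List[Token]) -> List[Tuple[str, int]]:
    """Shift positions so the earliest token sits at 0 (input order is kept)."""
    base = min(t.position for t in tokens)
    return [(t.term, t.position - base) for t in tokens]


# Assumed WDGF output for "b2" with preserveOriginal="1", per Shawn's description.
wdgf = [Token("b2", 1, 2), Token("b", 1), Token("2", 2)]
print(paths(wdgf))  # [['b2'], ['b', '2']]

# Synonym expansions as reported (positions 3 and 4) vs. the expected 1 and 2.
reported = [Token("b", 3), Token("boron", 3), Token("ii", 4), Token("2", 4)]
expected = [Token("b", 1), Token("boron", 1), Token("ii", 2), Token("2", 2)]
print(relative_positions(reported) == relative_positions(expected))  # True
```

The first check shows why WDGF output is a graph rather than a flat stream: there are two distinct paths ("b2" and "b 2") between the same start and end nodes, which is exactly the kind of input Steve notes SynonymGraphFilter cannot consume. The second check only restates Shawn's point that the two synonym pairs keep the same offsets relative to each other; it says nothing about whether the absolute shift to 3 and 4 is correct, and it does not make the WDGF-then-SGF chain a supported combination.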