Hi,

Note: this is just a prospective idea I'd like to discuss. Even if it's a good idea, it's definitely 5.0 material.

Those who have used Solr and are familiar with the Solr schema have already seen the ability to use different analyzers for indexing and querying. It's mostly useful when you use analyzers which return several tokens for a given input token: the QueryParser usually can't build the correct query with these analyzers.

To take an example from my current work on HSEARCH-917 (soon to come \o/), I have the following case. From "i-pod", the analyzer builds "ipod", "i pod" and "i-pod". "ipod" and "i-pod" aren't the issue here, but the fact that "i pod" spans two tokens makes the QueryParser build an incorrect query (even if I use the Lucene 4.4 version, which is a little bit smarter about these cases and at least makes the "i-pod"/"ipod" case work correctly).

The fact is that if the analyzer used at indexing time has correctly indexed all the tokens, I don't need to expand the terms at query time: it should be sufficient to use a simple analyzer which lowercases the string and removes the accents.

Solr introduced this feature a long time ago (it was already there in the good old times of 1.3) and I'm wondering if we shouldn't introduce it in Hibernate Search too.

As for the implementation, I was thinking about adding a queryAnalyzer attribute to the @Field annotation. I was also wondering if we shouldn't add the ability to define an analyzer for wildcard queries (Lucene recently introduced an AnalyzingQueryParser to do something like that). And maybe, in this case, it would be a good idea to centralize the configuration with types, as it's done in Solr? Usually, the three analyzer definitions would come together.

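To make the i-pod case a bit more concrete, here is a toy sketch in plain Java. It does not use Lucene or Hibernate Search at all. The class AnalyzerSketch and its methods are made up for illustration, and the splitting rules are only a crude stand-in for what WordDelimiterFilter does with preserveOriginal, generateWordParts and catenateWords enabled. It also assumes a whitespace-only split on the query side, which is simpler than a real tokenizer. The point is only that once the index side has expanded the token, a query side that just folds and lowercases is enough to match.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: imitates the idea, not the real Lucene filters.
class AnalyzerSketch {

    // Remove accents and lowercase (rough equivalent of ASCII folding + lowercasing).
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Index side: split on whitespace, then expand every token into
    // the original form, its word parts, and the parts glued together.
    static Set<String> indexAnalyze(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String folded = fold(token);
            terms.add(folded);                       // "i-pod"
            StringBuilder joined = new StringBuilder();
            for (String part : folded.split("[^\\p{L}\\p{N}]+")) {
                if (!part.isEmpty()) {
                    terms.add(part);                 // "i", "pod"
                    joined.append(part);
                }
            }
            if (joined.length() > 0) {
                terms.add(joined.toString());        // "ipod"
            }
        }
        return terms;
    }

    // Query side: split on whitespace and fold only, no expansion.
    static List<String> queryAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(fold(token));
            }
        }
        return terms;
    }

    // AND semantics: every query term must be among the indexed terms.
    static boolean matches(String indexedText, String query) {
        return indexAnalyze(indexedText).containsAll(queryAnalyze(query));
    }
}
```

With this toy model, a document containing "My new I-Pod" is found by the queries "ipod", "i-pod" and "i pod" alike, while "ipad" does not match.
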
As for my particular needs, most of my full text fields would be analyzed like this:

indexing:

    @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
                @org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
                @org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
                @org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "1"),
                @org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
                @org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
                @org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
                @org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
                @org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
            }),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        }
    ),

querying:

    @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        }
    ),

wildcard:

    @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        }
    ),

I could contribute time to work on this if we can agree on the way to pursue this idea.

Thanks for your feedback.

-- 
Guillaume