Regarding GitHub, you can follow the guide at http://www.elasticsearch.org/contributing-to-elasticsearch/
but if you feel more comfortable, you can also just post a diff/patch somewhere (preferably against HEAD) with your changes/additions. That would be enough, at least for me, to have a first look.

Jörg

On Wed, Nov 19, 2014 at 1:30 PM, <[email protected]> wrote:

> Hi Jörg,
>
> thank you for your quick response!
>
> I'm glad to hear that you agree wildcard analysis could be improved further (prefix support is already great!). I had already started looking for other solutions, like writing a plugin that uses a custom query parser. But unless I'm misinterpreting your answer, improving the getPossiblyAnalyzedWildcardQuery method does not sound completely absurd to you, i.e. it is not the wrong place or the wrong approach. (You could also have told me that I need to write a plugin, or somehow register a query parser subclass, or given some other reason why this method is written the way it is.)
>
> So for the moment I will stick with "my" improved getPossiblyAnalyzedWildcardQuery method and do further testing with more data and larger indices to see how it performs. (As I mentioned initially, I need to "generate" even more wildcards, including leading ones, to produce the desired matches.)
>
> As soon as I'm convinced of the "improvement", I'll clean up the code and try to create a fork so you can have a look at it. (PS: I need to familiarize myself a bit more with git first, since I'm still one of the old-school SVN guys ;-), but I think I will manage to fork and commit somehow.)
>
> I would like to help further improve such a great product as Elasticsearch.
>
> Cheers, Marco
>
> On Wednesday, November 19, 2014 09:56:43 UTC+1, [email protected] wrote:
>
>> Hi,
>>
>> I have text/email addresses indexed with the standard analyzer, e.g.
>>
>> "marco.kamm@brain.net", which results in two tokens in the index:
>>
>> [marco.kamm] and [brain.net]
>>
>> I want to search using a query_string query with wildcards like:
>>
>> {
>>   "fields" : ["contact_email"],
>>   "query" : {
>>     "query_string" : {
>>       "query" : "(contact_email:(marco.*@brain.net))",
>>       "default_operator" : "and",
>>       "analyze_wildcard" : true
>>     }
>>   }
>> }
>>
>> From my past experience with Lucene, I know that wildcard queries are somewhat problematic because they are not analyzed by default. (To work around this behaviour, I wrote a custom parser that preprocesses the query string according to the specific field analyzer before passing it to the Lucene query parser.)
>>
>> When I first noticed the analyze_wildcard option, I thought: great! I no longer need my "custom magic parser" ;-), Elasticsearch provides built-in support for my problem...
>>
>> When testing the analyze_wildcard behaviour with "pure" prefix queries like "marco.kamm@brain.*", it worked like a charm, i.e. it did the same thing I tried to achieve with my custom "pre-parser". The query was "transformed" into something like "contact_email:marco.kamm OR contact_email:brain*", which perfectly matches what's in the index...
>>
>> Unfortunately, testing with "real" wildcard queries like the "marco.*@brain.net" example above gives me a query that won't find anything in my situation, because it gets turned into "contact_email:marco*brain.net", and there is not a single token in my index that matches it (although it does get analyzed). To find any results, the query would rather have to be turned into something like "contact_email:marco* AND contact_email:brain.net", or "contact_email:marco* AND contact_email:*brain.net" (if the user searches for "marco.*.net")...
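To illustrate why "contact_email:marco*brain.net" finds nothing, here is a small plain-Java sketch (no Lucene dependency) that emulates term-level wildcard matching against the two tokens reported above. The class name, the `toRegex` helper and the one-document term list are illustrative assumptions; Lucene itself compiles a WildcardQuery into an automaton rather than a `java.util.regex` pattern, but the key property is the same: a wildcard pattern has to match one whole indexed term and can never span two.

```java
import java.util.List;
import java.util.regex.Pattern;

public class WildcardTermCheck {

    // Terms indexed for the example address, as reported in the message
    // above (assumption: a single document, one field).
    static final List<String> INDEXED_TERMS = List.of("marco.kamm", "brain.net");

    // Translate a wildcard pattern ('*' = any run of characters, '?' = exactly
    // one character) into a java.util.regex.Pattern, quoting the literal text
    // so that '.' is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Like a term-level wildcard query: the pattern must match one whole
    // indexed term; it can never span two terms.
    static boolean matchesAnyTerm(String wildcard) {
        Pattern p = toRegex(wildcard);
        return INDEXED_TERMS.stream().anyMatch(term -> p.matcher(term).matches());
    }

    public static void main(String[] args) {
        // Single pattern that analyze_wildcard produced for "marco.*@brain.net"
        System.out.println("marco*brain.net -> " + matchesAnyTerm("marco*brain.net"));
        // The two clauses of the rewrite suggested in the message
        System.out.println("marco* -> " + matchesAnyTerm("marco*"));
        System.out.println("brain.net -> " + matchesAnyTerm("brain.net"));
    }
}
```

Running `main` prints `false` for the single analyzed pattern and `true` for each of the two separate clauses, which is the gap the message describes.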
>>
>> Looking at the source code of org.apache.lucene.queryparser.classic.MapperQueryParser.java (I actually started diving into the source code while chasing down the "rather small" issue already mentioned, the hardcoded boolean clause OR operator: https://github.com/elasticsearch/elasticsearch/issues/2183), I realized that there are two different methods for analyzing pure wildcard and prefix queries (getPossiblyAnalyzedPrefixQuery and getPossiblyAnalyzedWildcardQuery; I first expected both cases to be handled by the same code). That's why I'm getting perfect results for prefix queries and non-working ones for pure wildcard queries...
>>
>> I started to experiment with the getPossiblyAnalyzedWildcardQuery method by rewriting it to work more like getPossiblyAnalyzedPrefixQuery: instead of generating a single WildcardQuery object from the analyzed string, it builds a boolean query containing several WildcardQuery objects (splitting on * and ?)...
>>
>> My first tests showed that this works quite well!
>>
>> Now my questions:
>>
>> What do you think about this approach?
>>
>> Do you see any serious drawbacks besides performance? I know that using even more wildcards will drastically reduce search performance, but isn't it better to finally serve some results after a fairly long time than to find nothing at all?
>>
>> (I also know that Lucene is not built or optimized for wildcard queries, and that some cases could be solved with different analyzers (ngram, reverse), multiple fields, etc. But users are used to wildcards, and there are use cases where such wildcard queries make sense, or where it's not practical to use keyword analyzers, which don't suffer from this problem, e.g. for longer text.)
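The splitting idea described above can be sketched at the string level as follows. This is a hypothetical illustration, not Marco's patch and not MapperQueryParser code: `analyze` is a toy stand-in for the field analyzer (lowercase, split on '@' and whitespace, trim dots at token edges), only '*' is treated as a split point, and the clauses are joined with a caller-supplied operator instead of a hardcoded OR. Each piece between wildcards is analyzed on its own, and each '*' is re-attached to the neighbouring token unless a hard delimiter such as '@' separates them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitWildcardSketch {

    // Toy stand-in for the field analyzer (assumption, NOT Lucene's
    // StandardAnalyzer): lowercase, split on '@' and whitespace, and trim
    // dots from token edges, so "marco.kamm@brain.net" -> [marco.kamm, brain.net].
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[@\\s]+")) {
            String token = raw.replaceAll("^\\.+|\\.+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Characters the toy analyzer always splits on; a '*' is never glued
    // to a token across one of these.
    static boolean isHardDelimiter(char c) {
        return c == '@' || Character.isWhitespace(c);
    }

    // Split the pattern on '*', analyze each piece separately, and re-attach
    // each '*' to the adjacent token (trailing for the token before it,
    // leading for the token after it). Every token becomes its own clause.
    static List<String> expand(String pattern) {
        List<String> terms = new ArrayList<>();
        // limit -1 keeps empty pieces, so a leading or trailing '*' still
        // marks a piece boundary.
        String[] pieces = pattern.split("\\*", -1);
        for (int p = 0; p < pieces.length; p++) {
            String piece = pieces[p];
            List<String> tokens = analyze(piece);
            if (tokens.isEmpty()) {
                continue; // a non-empty token list implies a non-empty piece below
            }
            boolean starBefore = p > 0 && !isHardDelimiter(piece.charAt(0));
            boolean starAfter = p < pieces.length - 1
                    && !isHardDelimiter(piece.charAt(piece.length() - 1));
            for (int t = 0; t < tokens.size(); t++) {
                String term = tokens.get(t);
                if (t == 0 && starBefore) {
                    term = "*" + term;
                }
                if (t == tokens.size() - 1 && starAfter) {
                    term = term + "*";
                }
                terms.add(term);
            }
        }
        return terms;
    }

    // Render the clauses in query-string syntax, joined with the caller's
    // operator (hypothetical parameter standing in for default_operator).
    static String toQueryString(String field, String pattern, String operator) {
        List<String> clauses = new ArrayList<>();
        for (String term : expand(pattern)) {
            clauses.add(field + ":" + term);
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("contact_email", "marco.*@brain.net", "AND"));
        System.out.println(toQueryString("contact_email", "marco.kamm@brain.*", "AND"));
    }
}
```

For "marco.*@brain.net" with operator AND this yields "contact_email:marco* AND contact_email:brain.net", the query asked for in the message; the prefix case "marco.kamm@brain.*" yields "contact_email:marco.kamm AND contact_email:brain*". Because every '*' ends up as a leading or trailing wildcard on its own clause, the result matches a superset of what the original pattern means (e.g. "mar*kamm" becomes "mar*" AND "*kamm"), and leading wildcards are expensive, which is the performance trade-off raised above.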
>>
>> Do you plan to further enhance the getPossiblyAnalyzedWildcardQuery method (although the docs state that this method makes a best effort)?
>>
>> (By the way, do you also plan to fix the OR operator issue? It could be rather simple: just use the specified parameter.)
>>
>> If my approach is legit, and given that I don't like having to modify the Elasticsearch "core" code and rebuild/adapt it with every new release, how or where else could I implement such an extension? Do I have to write a custom query parser (maybe extending MapperQueryParser) and build my own plugin/REST endpoint...?
>>
>> (I recently found out that there's also a Lucene class called AnalyzingQueryParser; maybe I should have used it instead of writing my own magic parser. Is it, or could it be, used somehow in Elasticsearch?)
>>
>> Is there a way to, or should I, file a feature request for an even better best-effort analysis of wildcard queries? PS: I know wildcard handling can be a pain in the a**, and maybe it can only be solved on a best-effort basis. But I'm somewhat forced to deal with this because I have to (and want to!) port my old Lucene stuff to Elasticsearch. (Apart from this issue, I think Elasticsearch is a great product and I like working with it; this problem lies in the nature of inverted indices, wildcards and analyzers.)
>>
>> Sorry for the long and maybe confusing mail, but I need your expert thoughts/advice on this wildcard issue.
>>
>> Thank you,
>> regards, Marco
>>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/556edd4a-5ced-4953-9f4d-ff53fb2bcca6%40googlegroups.com
>
> For more options, visit https://groups.google.com/d/optout.
