[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907339#comment-13907339 ]
Robert Muir commented on LUCENE-5205: ------------------------------------- {quote} Code duplication. The biggest offenders are in test (I think, let me know if you disagree): {quote} I guess i wasn't even thinking about test code, i just see code like this method in AnalyzingQueryBase (as an example): {code} protected BytesRef analyzeMultitermTermParseEx(String field, String part) throws ParseException { {code} I know this probably exists elsewhere, i swear there is something in QPBase doing this for range queries. :) As far as test code: {quote} 1) TestSpanQPBasedonQPTestBase...I can try to refactor this to extend QPTestBase, but that will require some reworking of QPTestBase as, and I didn't want to touch that (hence the duplication). It would also help to add a getQuery() to SpanMultitermQueryWrapper to test for equality...again, I didn't want to touch anything outside of the parser at the cost of duplication. {quote} I worked on this last night. Basically i split up QPTestBase into two parts: 1. test harness, test analyzers, helper assertions, etc (QPTestCase) 2. actual concrete test methods for historical lucene behavior. I fixed TestSpanQPBasedonQPTestBase to extend #1, I'm pretty happy with it? It removed all the duplicate test analyzers and assert methods and so on. Let me know if you have any concerns with what i did there. {quote} It would also help to add a getQuery() to SpanMultitermQueryWrapper to test for equality...again, I didn't want to touch anything outside of the parser at the cost of duplication. {quote} Hmm, i guess i'm of the opposite mentality. I can see your point about not touching existing code, as it makes this contribution a 'standalone' change, but personally i'd rather us fix stuff like this! That being said, i did in fact recently add such a method while working on an unrelated issue (LUCENE-5415): {code} /** Returns the wrapped query */ public Query getWrappedQuery() { return query; } {code} Does this work? Want to upload a patch that uses this? {quote} 3) TestComplexPhraseQuery. Should be straightforward to extend the original, but will need to make checkMatches public so that I can override it. I'll also have to move the tests with slightly different syntax into a different test, but that's easy and would help declutter. {quote} Ok, this sounds good. {quote} There's other code duplication with AnalyzingQueryParser...should we break that functionality out into a helper class? {quote} Sounds like a good idea, i havent looked at the two, but seems worthwhile. {quote} Y, I don't like the reinit at all. The reason that's there was so that I could extend QueryParserBase, but I'm not sure that that decision buys much anymore. As I remember, it buys date parsing in range queries (which I'm now not sure I actually want) and addBoolean; there may be more, but I'm not sure there is. It would clean up a fair bit of code if I implement CommonQueryParserConfiguration instead of extending QueryParserBase. I'd still have to leave in some things that don't make sense for the SpanQueryParser, though: lowerCaseExpandedTerms, enablePositionIncrements. Another option would be to abandon CQPC, but I wanted this parser to at least implement that interface. Let me know what makes sense. {quote} I think the current base class is fine. I guess my question is, who is calling reinit :) {quote} As for the public base classes, y, those can go private for now. I made them public in case anyone wanted to extend them, but, as you point out, then I really ought to add javadocs and treat them as if they were public (which they are!). {quote} sounds good, maybe you want to upload a patch making the appropriate things pkg-private? We can always open things back up later, if we need. > [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to > classic QueryParser > ----------------------------------------------------------------------------------------------- > > Key: LUCENE-5205 > URL: https://issues.apache.org/jira/browse/LUCENE-5205 > Project: Lucene - Core > Issue Type: Improvement > Components: core/queryparser > Reporter: Tim Allison > Labels: patch > Fix For: 4.7 > > Attachments: LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, > LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, > SpanQueryParser_v1.patch.gz, patch.txt > > > This parser extends QueryParserBase and includes functionality from: > * Classic QueryParser: most of its syntax > * SurroundQueryParser: recursive parsing for "near" and "not" clauses. > * ComplexPhraseQueryParser: can handle "near" queries that include multiterms > (wildcard, fuzzy, regex, prefix), > * AnalyzingQueryParser: has an option to analyze multiterms. > At a high level, there's a first pass BooleanQuery/field parser and then a > span query parser handles all terminal nodes and phrases. > Same as classic syntax: > * term: test > * fuzzy: roam~0.8, roam~2 > * wildcard: te?t, test*, t*st > * regex: /\[mb\]oat/ > * phrase: "jakarta apache" > * phrase with slop: "jakarta apache"~3 > * default "or" clause: jakarta apache > * grouping "or" clause: (jakarta apache) > * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta > * multiple fields: title:lucene author:hatcher > > Main additions in SpanQueryParser syntax vs. classic syntax: > * Can require "in order" for phrases with slop with the \~> operator: > "jakarta apache"\~>3 > * Can specify "not near": "fever bieber"!\~3,10 :: > find "fever" but not if "bieber" appears within 3 words before or 10 > words after it. > * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta > apache\]~3 lucene\]\~>4 :: > find "jakarta" within 3 words of "apache", and that hit has to be within > four words before "lucene" > * Can also use \[\] for single level phrasal queries instead of " as in: > \[jakarta apache\] > * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 > :: find "apache" and then either "lucene" or "solr" within three words. > * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2 > * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ > /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two > words of "ap*che" and that hit has to be within ten words of something like > "solr" or that "lucene" regex. > * Can require at least x number of hits at boolean level: "apache AND (lucene > solr tika)~2 > * Can use negative only query: -jakarta :: Find all docs that don't contain > "jakarta" > * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of > potential performance issues!). > Trivial additions: > * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, > prefix =2) > * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance > <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein) > This parser can be very useful for concordance tasks (see also LUCENE-5317 > and LUCENE-5318) and for analytical search. > Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. > Most of the documentation is in the javadoc for SpanQueryParser. > Any and all feedback is welcome. Thank you. -- This message was sent by Atlassian JIRA (v6.1.5#6160) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org