On 23 June 2014 at 08:54:57, Anže Starič ([email protected]) wrote:

On Mon, Jun 23, 2014 at 2:12 AM, Antonia Horincar 
<[email protected]> wrote: 
> How about the case when the admin switches from one search backend to 
> another, shouldn’t the appropriate index be populated with all existing 
> resources in BH? This is mainly why I was thinking I need to implement my own 
> admin commands. 

This can also be done using the bhsearch admin command. When the admin decides 
to switch search backends, they need to modify the search_backend setting 
in trac.ini and run trac-admin bhsearch upgrade. This looks like a 
reasonable workflow to me. We could extend the 
environment_needs_upgrade method in BloodhoundSearchApi to monitor for 
a backend change and request an environment upgrade when it changes, but I 
do not think that this is a priority. 
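For reference, the workflow described above would look roughly like this; the [bhsearch] section name and the backend value are assumptions for illustration, only the search_backend setting and the upgrade command come from the discussion:

```ini
; trac.ini -- switch the active search backend
; (section name and backend value are hypothetical examples)
[bhsearch]
search_backend = SolrBackend

; then re-populate the new backend's index:
;   trac-admin /path/to/env bhsearch upgrade
```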
I didn’t know we could achieve this using bhsearch upgrade, which is why I 
thought I needed to implement another admin command. But that worked perfectly. 



> I am currently working on displaying the retrieved Solr results in the 
> interface. The results are already shown, but I am still working on 
> applying highlighting and faceting. 
> 
> Also, I have a question regarding the meta keyword parsers. How are the 
> DocTypeMetaKeywordParser and the other keyword parsers from 
> bhsearch.query_parser used? 

MetaKeywordParsers are just match-and-replace rules for words 
beginning with a $. They are used in MetaKeywordPlugin, which could be 
summarized as: find all words that begin with a $ using a regexp 
match and pass each word to the MetaKeywordParsers; if any of them knows the 
keyword, it returns some text, which is used to replace the keyword 
string in the original query. 
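As a rough illustration (not the actual Bloodhound code; the keyword table and function name here are hypothetical), that match-and-replace step could be sketched as:

```python
import re

# Hypothetical keyword table; in Bloodhound each entry corresponds to a
# MetaKeywordParser (e.g. DocTypeMetaKeywordParser would handle $ticket).
META_KEYWORDS = {
    "$ticket": "type:ticket",
    "$wiki": "type:wiki",
}

def expand_meta_keywords(query):
    """Find words beginning with $ and replace the ones we know."""
    def replace(match):
        word = match.group(0)
        # Unknown keywords are left untouched.
        return META_KEYWORDS.get(word, word)
    return re.sub(r"\$\w+", replace, query)

print(expand_meta_keywords("$ticket crash report"))  # type:ticket crash report
```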

> I understood in general what the DefaultQueryParser does, however I’m not 
> sure I get how parser plugins are used in Whoosh. I would like to understand 
> the query_parser module better because I used the DefaultQueryParser for 
> parsing the query. I’m not sure if this is a good idea because basically it 
> uses Whoosh for parsing the query, but it was easier for the moment. Should I 
> try to implement my own query parser for Solr? 

If I understand correctly, Solr expects the query as a string, which 
it then parses internally. If it is not too hard to reconstruct the 
query from whoosh, I would use the existing query parser, so you can 
reuse the existing security processing and meta keyword parsing. 
It’s actually easier to reconstruct the query from Whoosh (by accessing 
attributes of the query objects Whoosh creates), because otherwise I would have 
to implement a parser that correctly parses a raw query, which in my opinion is 
much more difficult to achieve. 



If you want to know more about how whoosh parses queries, here is a 
short description. Whoosh parses queries with a bunch of match and 
filter plugins. Match plugins try to match a word against a predefined 
regular expression and emit a node class upon a match. Filters then 
modify the generated list of nodes to group nodes based on operator 
priority, manage terms without defined fields, etc. 
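A toy version of that two-phase pipeline (hypothetical node shapes and function names, not Whoosh's actual classes) might look like:

```python
import re

# Phase 1: "match plugins" turn each word into a typed node.
def match_nodes(query):
    nodes = []
    for word in query.split():
        m = re.match(r"(\w+):(\w+)$", word)
        if m:
            nodes.append(("fielded", m.group(1), m.group(2)))
        else:
            nodes.append(("word", word))
    return nodes

# Phase 2: a "filter" rewrites the node list, e.g. assigning a default
# field to terms that do not specify one.
def default_field_filter(nodes, default="content"):
    return [("fielded", default, n[1]) if n[0] == "word" else n
            for n in nodes]

nodes = default_field_filter(match_nodes("type:ticket crash"))
# [('fielded', 'type', 'ticket'), ('fielded', 'content', 'crash')]
```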

MetaKeywordPlugin is both a matcher and a filter. It matches all words 
starting with a $ and passes them to the MetaKeywordParsers. If a 
MetaKeywordParser understands a keyword, it expands it into a new 
string ($ticket -> type:ticket), which is again parsed by whoosh. 
The parsed representation of the expanded meta keyword is stored inside a 
MetaKeywordNode. In the filter phase, MetaKeywordPlugin "flattens" the 
meta keywords (replaces meta keyword nodes with the parsed 
representation of the expanded text). 
Thanks for the description, this really helped me understand more about the 
Whoosh query parser.




Anze 

[1] 
https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/api.py#L402
 


I am currently working on adding a “More like this” feature. At first, I was 
thinking of automatically displaying similar results whenever a query is made, 
but due to some limitations of Sunburnt this would mean making two separate 
requests to Solr (one for the query results, and one for the results similar 
to the initially retrieved results). Would it be better to have a “More like 
this” button next to the query results, so that a new request is made to Solr 
only when a user chooses to see similar results? I began implementing the 
ITemplateStreamFilter interface to add a button to the search results page, 
but haven’t finished it yet. 
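If the button approach is chosen, the follow-up request could target Solr's MoreLikeThis handler directly. A minimal sketch of building such a request follows; the core name, the /mlt handler path, and the field names are assumptions about the Solr setup, not something taken from Sunburnt or the existing code:

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr/bloodhound"  # hypothetical core name

def mlt_url(doc_id, fields, count=5):
    """Build a MoreLikeThis request for documents similar to doc_id."""
    params = {
        "q": "id:%s" % doc_id,
        "mlt.fl": ",".join(fields),  # fields used to find similar docs
        "mlt.mintf": 1,              # min term frequency in the source doc
        "mlt.mindf": 1,              # min document frequency in the index
        "rows": count,
    }
    return "%s/mlt?%s" % (SOLR_URL, urlencode(params))

print(mlt_url("ticket-42", ["summary", "content"]))
```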

Also, you might have noticed I called the paginate(rows=20000) method on the 
query chain (in the bhsolr.solr_backend query() method). By default, Solr 
fetches only 10 documents per query, and there is no way to fetch all query 
results [1]. I was thinking of a reasonable solution, but I would like your 
opinion on this matter: 

One solution would be to specify a maximum number of results to retrieve, 
keeping it as close as possible to the number of documents stored in the 
index. I could keep track of how many documents have been added to the index 
and update the ‘rows’ parameter every time that number changes. 
Another solution would be to make multiple smaller queries (with the ‘rows’ 
parameter set to the maximum number of results per page) until all results 
have been fetched.
How should I proceed in implementing this?
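For what it's worth, the second option could be sketched as a backend-agnostic loop; here fetch_page is a hypothetical stand-in for whatever issues the actual Solr query (e.g. a call that sets the start and rows parameters):

```python
def fetch_all(fetch_page, page_size=100):
    """Fetch results page by page until a short page signals the end."""
    results = []
    start = 0
    while True:
        page = fetch_page(start=start, rows=page_size)
        results.extend(page)
        if len(page) < page_size:  # last page reached
            break
        start += page_size
    return results

# Usage with a stand-in for the Solr call:
docs = list(range(250))
fake_page = lambda start, rows: docs[start:start + rows]
print(len(fetch_all(fake_page)))  # 250
```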

Thanks,
Antonia 
[1] https://issues.apache.org/jira/browse/SOLR-534
