[django-haystack] model inheritance problem (haystack + whoosh)

Sebastian Quiles Wed, 22 Mar 2017 12:17:08 -0700

Hi,

    First of all, I want to congratulate you on the excellent job done in 
this project. A couple of months ago we needed to prototype a document 
repository with search capabilities very quickly, and we discovered Whoosh 
and Haystack as a very simple way to make it work. Haystack also looked like 
a good way to be able to switch to another engine easily.


    During development we could use Haystack for the indexing process, but 
when we tried to use it for searching we ran into some problems and decided 
to search directly with Whoosh, skipping the Haystack layer. Now that we 
have some more time to spend, we are trying to search with Haystack, but we 
are still having problems.


Our django model has some inheritance:

 
class Indexable(PolymorphicModel, TimeStampedModel):
    name = models.CharField(max_length=500, null=True) 
    text = models.TextField(null=True, blank=True, default="") 
    library = models.ForeignKey(Library, null=True) [...] 

class File(Indexable, unicode_name):
    FILE_TYPE = Choices('Image', 'WordDocument', 'PlainText', 'DataSet')

class Row(Indexable, unicode_name):
    def __str__(self):
        [...]


We have several indexes (libraries), and we store Files or Rows in them.

We have only one Haystack index class:

class FileIndex(indexes.SearchIndex, indexes.Indexable): 
    text = FullLanguageCharField(document=True, model_attr='text') 
    [...] 
    def get_model(self): 
        return Indexable
    [...] 

The search is performed with this code in Whoosh:

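        from whoosh.index import open_dir
        from whoosh.qparser import MultifieldParser

        ix = open_dir(settings.HAYSTACK_CONNECTIONS[library]['PATH'],
                      readonly=True)
        searcher = ix.searcher()
        pagenum = int(request.query_params.get('pagenum', 1))
        pagelen = 10
        # despite its name, `parser` holds the parsed query object
        parser = MultifieldParser(['text'], ix.schema).parse('hello')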

        results = searcher.search_page(parser, pagenum, pagelen=pagelen)



The same query performed in Haystack:

c = (SearchQuerySet().using(library).filter(text = 'hello').count())


I've also added the following line to settings.py:

HAYSTACK_LIMIT_TO_REGISTERED_MODELS = False


The first thing I tried was to count the results and check that both 
methods return the same number, but the Haystack method returns one less 
(and it always seems to be exactly one less).

If I debug, I end up at the following line in whoosh_backend:

            try:
                raw_page = searcher.search_page(
                    parsed_query,
                    page_num,
                    **search_kwargs
                )
            except ValueError:


After it executes, raw_page has the following values:
offset    int: 0    
pagecount    int: 14    <<< correct size
pagelen    int: 1    
pagenum    int: 1    

but in the whoosh_backend the following line is executed

            results = self._process_results(
                raw_page,
                highlight=highlight,
                query_string=query_string,
                spelling_query=spelling_query,
                result_class=result_class
            )


and then in the _process_results method:


    def _process_results(self, raw_page, highlight=False, query_string='',
                         spelling_query=None, result_class=None):
        from haystack import connections
        results = []
        hits = len(raw_page)
        if result_class is None:
            result_class = SearchResult
        facets = {}
        spelling_suggestion = None
        unified_index = connections[self.connection_alias].get_unified_index()
        # <<< here indexed_models is set to [Indexable]
        indexed_models = unified_index.get_indexed_models()
        for doc_offset, raw_result in enumerate(raw_page):
            score = raw_page.score(doc_offset) or 0
            app_label, model_name = raw_result[DJANGO_CT].split('.')
            additional_fields = {}
            # <<< here model is set to the File class
            model = haystack_get_model(app_label, model_name)
            # <<< this branch is not taken, because model is File, which
            # is not included in [Indexable]
            if model and model in indexed_models:
                [...]
            else:
                hits -= 1



When the for loop runs, the "if model in indexed_models" check fails and 
hits is decreased by 1, but the loop then finishes (it only runs once, 
maybe because pagelen = 1), and the final result of the function is:

results = []
hits = 13
facets = []
spelling_suggestion = []   (the whole process is very fast, BUT the line 
where spelling_suggestion is calculated takes several seconds)
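
To make the mismatch easier to see outside Django, here is a minimal sketch 
of the same accounting in plain Python. The classes are empty stand-ins for 
the real models, and the starting value of 14 is taken from the numbers 
reported above; nothing here calls Haystack or Whoosh.

```python
# Stand-ins for the Django models (assumption: only the inheritance matters).
class Indexable:
    pass


class File(Indexable):
    pass


indexed_models = [Indexable]  # what get_indexed_models() returned above
page = [File]                 # the single hit on a page of length 1
hits = 14                     # total reported for the query above

results = []
for model in page:
    # Same test as in _process_results: list membership compares classes
    # for equality, so a subclass never matches its registered parent.
    if model and model in indexed_models:
        results.append(model)
    else:
        hits -= 1

exact_match = File in indexed_models
subclass_match = issubclass(File, tuple(indexed_models))
```

With a page of length 1, exactly one hit is inspected and rejected, which 
would explain why the count is always off by one rather than by the full 
number of File documents.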


I've tried to change this line:

            if model and model in indexed_models:


to this one:
            if model and issubclass(model, tuple(indexed_models)):


but it then fails in the following line:
                    index = unified_index.get_index(model)


saying there is no registered index for File.
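
One possible direction, sketched in plain Python rather than against 
Haystack itself: if the index lookup is keyed by the exact model class (an 
assumption based on the error above), the subclass would first have to be 
mapped back to its registered ancestor. The `registry` dict and the 
`resolve_indexed_model` helper below are hypothetical names used only for 
illustration; they are not part of Haystack.

```python
# Empty stand-ins for the real models; only the class hierarchy matters.
class Indexable:
    pass


class File(Indexable):
    pass


class Row(Indexable):
    pass


class Unrelated:
    pass


# Hypothetical registry keyed by exact model class, mimicking a lookup
# that only knows about Indexable.
registry = {Indexable: "FileIndex"}


def resolve_indexed_model(model, registry):
    """Return the nearest class in model's MRO that has a registered
    index, or None if no ancestor is registered."""
    for klass in model.__mro__:
        if klass in registry:
            return klass
    return None


file_base = resolve_indexed_model(File, registry)
row_index = registry[resolve_indexed_model(Row, registry)]
no_match = resolve_indexed_model(Unrelated, registry)
```

Walking `__mro__` returns the closest registered ancestor, so File and Row 
both resolve to Indexable, while an unrelated class resolves to nothing. 
Whether the resolved class can then be passed safely to the rest of 
_process_results is not something this sketch checks.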


Can anyone help? Thanks!






--
Sebastian Quiles
ARGENTINA
