On 7/11/06, BlueJay <[EMAIL PROTECTED]> wrote:
> David Balmain wrote:
>
> David
>
> Thanks for your continued help and assistance.
>
> I don't have code at this stage because I started writing it one way and
> realised that the way I was writing it through counts in Ruby would not
> work because of pagination.
>
> A little more background is in order. The user will be presented with a
> pull down menu with 5 selections in a main category. Doing 6 queries
> (one main query and 5 count queries) in this instance is not a problem.
> The problem arises when they select one of these categories.
>
> They will then be presented with up to 5 other category structures. One
> would be new or old, another would be type (up to 5 nodes), another
> would be, for example, book type (such as fiction, non-fiction,
> autobiography) etc. (up to 20 categories), another could have up to 40
> categories. The user is free to select any of these category nodes
> because they may be interested in old books and fiction. I will
> therefore have to populate all of the nodes with the number of documents
> in each node. This could leave me spawning 60-odd queries to count
> the number of documents in each node. Subsequent selections of nodes
> would refine the result set down further.
>
> What I really would like to do is 2 or 3 queries. One which does the
> normal search over the document set (collection) and the second to
> populate each node in the classification structure with the number of
> documents that match each node.
>
> It is pretty easy in 2 queries to tell if there are any documents in
> each node but doing a count over all the nodes is more tricky. I was
> originally going to have another table which had a row for each node
> with the name of the node (and structure) in one field and the
> document_id's in another field. For example, [Fishing, "doc1 doc2 doc3
> doc4"], [Fishing/Fiction, "doc2 doc3"], [Fishing/Non Fiction, "doc1"]
> etc. I would then get a result set that provided all the categories that
> had hits against a given query. However, it does not provide the number
> of documents against each node. So I could not populate the pull down
> categories with Fishing (2), Fiction (1), Non Fiction (1) etc.
>
> Therefore, what I really need is a function that will return the number
> of documents in each node of a given classification structure. An
> addition to the Num_Docs capability already available perhaps.
>
> I could easily produce a results set that would be like this....
>
> Fishing doc1
> Fishing doc2
> Fishing/Fiction doc3
> Fishing/Fiction doc1
> Fishing/Non Fiction doc4
> etc...
>
> Num_Docs would provide 5 in this instance but what I really want is:
> Fishing 2
> Fishing/Fiction 2
> Fishing/Non Fiction 1
> etc...
>
> All that, and done in 1 or 2 queries over and above the original
> search.... Simple eh!
>
> I hope that I have not confused you too much, but this is something that
> I desperately need or my project is kaput!
>
> I found this:
> http://www.mail-archive.com/[email protected]/msg00343.html and
>
> http://www.ruby-forum.com/topic/56232#40931
>
> Do you think that this is the way to go?

I think I finally understand what you want now and I do think this is
the way to go. What you will need to do is build BitVectors for each
of your categories and sub-categories using the examples in those
threads. Or you could just use a QueryFilter.

    filter = QueryFilter.new(PrefixQuery.new(:category, "fishing"))
    fishing_bits = filter.bits(index_reader)
    filter = QueryFilter.new(PrefixQuery.new(:category, "fishing/fiction"))
    fishing_fiction_bits = filter.bits(index_reader)
    filter = QueryFilter.new(PrefixQuery.new(:category, "fishing/nonfiction"))
    fishing_nonfiction_bits = filter.bits(index_reader)

This assumes that everything in fishing/fiction is also in fishing/.
In your example, it doesn't seem to be the case, so you should use a
TermQuery instead of a PrefixQuery.
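
To make the difference concrete, here is a small sketch in plain Ruby
rather than Ferret itself. The DOCS hash and the two helper methods are
made up for illustration; they only mimic what a prefix match and an
exact term match would select on the :category field.

```ruby
# Hypothetical sample data: document id => value stored in :category.
DOCS = {
  "doc1" => "fishing",
  "doc2" => "fishing",
  "doc3" => "fishing/fiction",
  "doc4" => "fishing/nonfiction"
}

# Prefix semantics: "fishing" also matches "fishing/fiction" etc.
def prefix_matches(docs, prefix)
  docs.select { |_id, category| category.start_with?(prefix) }.keys
end

# Exact-term semantics: only documents stored under exactly this value.
def term_matches(docs, term)
  docs.select { |_id, category| category == term }.keys
end

puts prefix_matches(DOCS, "fishing").size  # 4, sub-categories included
puts term_matches(DOCS, "fishing").size    # 2, top-level documents only
```

If each document is stored only under its most specific category, the
exact match gives per-node counts and the prefix match gives the total
for a whole branch.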

Now you just need to run your search the same way. Something like this:

    query = query_parser.parse(query_str)
    query_bits = QueryFilter.new(query).bits(index_reader)

And now you can get your counts like this:

    fishing_count = (fishing_bits & query_bits).count
    fishing_fiction_count = (fishing_fiction_bits & query_bits).count
    fishing_nonfiction_count = (fishing_nonfiction_bits & query_bits).count

Sadly, this code only works in theory since I haven't released the code
that &s bit vectors yet and I used the new-style PrefixQuery
declarations so they won't work either. But if this solution seems
like it will work for you and you can wait a week, you'll be set.
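
In the meantime, the counting idea can be tried out without Ferret.
This is only a sketch under assumptions: plain Ruby Integers stand in
for bit vectors, and the document numbers, CATEGORY_BITS, QUERY_BITS
and the helper methods are all hypothetical, not part of Ferret's API.

```ruby
# Build a bit set (as an Integer) with one bit set per document number.
def bits_for(doc_numbers)
  doc_numbers.reduce(0) { |bits, n| bits | (1 << n) }
end

# Count the set bits in a non-negative Integer bit set.
def bit_count(bits)
  bits.to_s(2).count("1")
end

# Hypothetical category bit sets, built once (exact-match style).
CATEGORY_BITS = {
  "fishing"            => bits_for([0, 1]),
  "fishing/fiction"    => bits_for([2, 3]),
  "fishing/nonfiction" => bits_for([4])
}

# Hypothetical search result: documents 0, 1 and 2 match the query.
QUERY_BITS = bits_for([0, 1, 2])

# AND each category with the query bits and count what is left.
def category_counts(category_bits, query_bits)
  category_bits.transform_values { |bits| bit_count(bits & query_bits) }
end

category_counts(CATEGORY_BITS, QUERY_BITS).each do |category, count|
  puts "#{category} (#{count})"
end
# fishing (2)
# fishing/fiction (1)
# fishing/nonfiction (0)
```

The category bit sets only need to be built once per index, so each
search costs one query plus one cheap AND and count per node, instead
of one extra query per node.
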
Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk