I guess I never saw this request. Here is my answer.

Carrot would give me things like this 
Genre - Horror
   Scary Movie - (40)
      - Luke Perry (10)

Which is not what I am going for.

Basically, think someone clicks on the "Horror" tab and they see all of the
articles for every movie/actor within that genre. I page the results of
course, and just use a hitcollector to combine all hits.  My total number of
articles being returned for a genre is around 3000-4000 out of a maximum of
about 42K worth of articles right now.

The problem, is that I have specific queries for getting the news for an
actor, and a specific query for getting results for a movie and the genre
level needs to aggregate each of those queries.  So, an actor may have a
maximum of 20 boolean queries, a movie may have 5 boolean queries + (40
actors * 20 boolean queries) so a genre would have something like ((50* (5 *
20)) boolean queries for a genre
Somewhere along the lines though that is throwing a out of memory error. I
am using the queryparser, but the queries themselves are fairly simple.
An example for actor would be:
(+title:luke +title:perry) (+content:"Luke Perry")
Some other items in there to make sure the article is about luke perry (just
using him as an example)

So, anyone else have any ideas why I may be running out of memory on these?

Or, any other ideas on how to combine everything together quickly, besides
doing it offline first and then using online.  My only issue with this one
is that, as soon as I change the query logic, the process to build the index
must change, and then the website has to change sometime after that. It
could be days to see it in production, versus hours if I have to do is
change the query code in the web application.


Scott






Find Me wrote:
> 
> On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote:
>>
>>
>> Thanks for advanced on any insight on this one.
>>
>> I have a fairly large query to run, and it takes roughly 20-40 seconds to
>> complete the way that i have it.
>> here is the best example I can give.
>>
>> I have a set of roughly 25K documents indexed
>>
>> I have queries that get documents matching a particular actor.
>>
>> Then, I have a movie query that takes all of the documents found for each
>> actor query and combines them all together to say, here are all documents
>> that are relevant for this movie.
>>
>> Then, and here is the time hog, I have a genre query that says, take all
>> movies and get their results and combine them together into this genre
>> result set.
> 
> 
> Is there any possibility to use Carrot clustering for genre? Could you
> please give examples for the final complex query as well as individual
> simple queries?  You can also state the aim of the query. Are you trying
> to
> get clustered list of movies (based on genre) for a particular actor?
> 
> --Rajesh Munavalli
> 
> The problem is, at indexing time, I do not have a way to say if a document
>> is a particular genre, or a particular actor, or movie etc.  If I try and
>> say for the genre query, get all documents and then filter for the
>> queries
>> for movies and actors, I get heap space memory issues.
>>
>> The query for collecting a specific actor is around 200-300 milliseconds,
>> and the movie one, that actually queries each actor, takes roughly
>> 500-700
>> milliseconds. Yet, for a genre, where you may have 50-100 movies, it
>> takes
>> 500 milliseconds*# of movies
>>
>> Any ideas on how I could run these queries differently? For a given actor
>> query, there is about 5-7 boolean query clauses. Just to give some
>> insight.
>>
>> I currently just create 1 HitSetCollector (I rolled my own
>> bitsetcollector)
>> and just run searches with it.  I just get crapped on when it does that
>> genre search. I wish there was an easier way to aggregate all of those
>> documents together from all of those searches.  After it is done, I cache
>> the results, but the initial hit is bad.
>>
>> Any help would be much appreciated.
>> Sdeck
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Speed-of-grouped-queries-tf2910499.html#a8132099
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Speed-of-grouped-queries-tf2910499.html#a8267691
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to