Hey, I'm new to Lucene... I was wondering whether we can use Lucene/Solr for word frequency counting (e.g., in a subset of full-text papers).
Thanks for any info you may provide.

Shuai

On Aug 11, 2010, at 10:16 AM, Julien Nioche wrote:

> BTW I don't remember anyone on the Nutch list suggesting that you use Carrot
> for this (see: http://search-lucene.com/?q=luan+carrot) or classify at
> query time.
>
> What I suggested in http://search-lucene.com/m/JWZTj1q4lB92 was
> classifying during parsing or indexing and generating a field for Lucene
> or SOLR. As Otis pointed out, you can of course use SOLR for faceting. Since
> you will be using Nutch anyway, you might as well avoid an external DB just
> for storing the results of the classification and just keep the labels, e.g.
> in the parse metadata.
>
> Julien
> --
> DigitalPebble Ltd
>
> Open Source Solutions for Text Engineering
> http://www.digitalpebble.com
>
> On 9 August 2010 00:16, Luan Cestari <luan.cest...@gmail.com> wrote:
>
>> Lucene developers,
>>
>> We’ve been working on an undergraduate college project to change
>> Apache Nutch (which uses Lucene to index its web pages) to include a
>> category filter, and we are having problems with the query part. We want
>> to develop an application with good performance, so we thought this
>> would be the best place to ask this kind of question. The idea is that the
>> user can restrict a search to pages stored under a single category, so the
>> number of results found should be the number of pages actually classified
>> in that category.
>>
>> The problem is how to add the category information to the Lucene indexes,
>> and how to filter the search on it. We looked through the Nutch
>> mailing list (Nabble) and asked for help there, but people there think
>> that we should use a plug-in like Carrot, which takes about 100 pages and
>> classifies them at query time. We are not very confident that it’s the
>> best solution. We thought of two other ideas:
>>
>> #1 Classify the pages, store that information in a DB, and at query
>> time use that DB to filter the results.
>> #2 Use different index servers, one for each category and one to search
>> without filtering by category.
>>
>> We have seen on this project http://search-lucene.com/ that there are
>> pre-defined categories. We think that this should be classified at indexing
>> time, as we wanted.
>>
>> Do you have any other idea about how to do that?
>>
>> Sincerely,
>>
>> Daniel Costa Gimenes & Luan Cestari
>> Undergraduate students of University Center of FEI
>> Brazil
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Using-categories-with-Lucene-tp1049232p1049232.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
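
For what it's worth, the index-time approach Julien describes (classify while indexing, keep the label as a field, then filter or facet on that field at query time) can be sketched in a few lines of plain Python. This is only a toy illustration and not Lucene, Solr, or Nutch code: the names TinyIndex, classify, and CATEGORY_KEYWORDS are made up for the example, and the keyword rule merely stands in for whatever classifier you would actually use.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules standing in for a real classifier (assumption).
CATEGORY_KEYWORDS = {
    "sports": {"football", "match", "goal"},
    "tech": {"lucene", "index", "search"},
}


def classify(text):
    """Return the category whose keywords overlap the text most, else 'other'."""
    words = set(text.lower().split())
    best = max(CATEGORY_KEYWORDS, key=lambda c: len(words & CATEGORY_KEYWORDS[c]))
    return best if words & CATEGORY_KEYWORDS[best] else "other"


class TinyIndex:
    """Toy inverted index that stores a category label per document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.category = {}                # doc id -> label assigned at index time

    def add(self, doc_id, text):
        # Classification happens once, while the document is being indexed.
        self.category[doc_id] = classify(text)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term, category=None):
        # Query time only filters on the stored label; nothing is re-classified.
        hits = self.postings.get(term.lower(), set())
        if category is not None:
            hits = {d for d in hits if self.category[d] == category}
        return sorted(hits)

    def facet_counts(self, term):
        # Number of matching documents per category, similar in spirit to faceting.
        hits = self.postings.get(term.lower(), set())
        return Counter(self.category[d] for d in hits)


index = TinyIndex()
index.add(1, "Lucene index search tips")
index.add(2, "football match report and search for the goal")
index.add(3, "search the index with Lucene")

tech_hits = index.search("search", category="tech")
counts = index.facet_counts("search")
```

Because the label is stored alongside each document, the result count for a category-restricted search is exactly the number of matching pages in that category, with no external DB lookup and no separate index per category.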