Hello!
I am currently choosing technology for web crawler and search engine that will index between 1 and 10 million of documents (with storing documents). For some parts of the project I'll most likely choose existing software, for some I'll have to right new code, but at the end it should be pure java solution.
I am considering Lucene as solutions for text indexing and searching and I have few questions about Lucene for which I was not able to find answers in FAQs, Articles etc.
Is Lucene suitable for ~10 million documents?
Is it possible to have boosts factor per document ? The thing is that I need to have something like sort order of documents in relevance, but relevance cannot been calculated only from that document, because there are some external factors as well (e.g. Google PageRank algorithm). I think that I can calculate all this factors in one factor that can been stored in index, but can I use it to boost relevance of some documents ?
I guess it is possible, but would it require for some parts of Lucene to be rewritten to enable this ? Or should I just fetch documents from Lucene and then sort them outside?
Best Regards, Sekula
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
