Answers below. > We have a database where in one table it contains an id and a name. > (like 1,Human 2,Man 3,Dutch) > The second table contains an id, a relationId and a reference to the > previous table. > So if a relation with id 1 is human is a Man and is Dutch the table > will > contain > 1, 1, 1 > 2, 1, 2 > 3, 1, 3 > You probably get the picture. > > We want to be able to eg query all humans that are Dutch. (we don't > care > if it is a man or woman). Using sql can be quite horrific for these > kind > of queries. > > I thought that maybe Lucene could do the indexing for us instead of > the sql database. > > It is not uncommon that the table with names contain about 3 million > entries and the relationship table can be a multiple of 3 million. > > The question is : > > - Is Lucene capable of handling huge amounts of data ?
"Huge" is not very specific, and capacity also depends on hardware and several other factors. 3 million documents (and yours would be small), is easy for Lucene. > - The result always must be EXACT. So no fuzzy stuff. If it has the > keywords in the index, show it, else never show it (so a query for > Human, Man, Dutch, should not return any people from Belgium). (I > assume > this should be possible) Sounds like you just need to limit queries to Boolean operators: AND OR and NOT. You can do that by creating queries manually, or changing the QueryParser to allow only such queries. > - Besides the index information we want to be able to store some > extra > data (like a description), that can be used to create an object we > can > use in our system. No problem. See Lucene's Field class. > - Is Lucene the way to go for this use scenario ? I am not sure. RDMBS are the tool to use for queries that use Boolean logic. You still could use Lucene to index your data by converting each row in your database to a Lucene Document. Otis --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
