Hi, I am currently evaluating whether Zend_Search_Lucene is the right choice for implementing the following:
I have about 20,000 data sets with about 50 different fields. 20 of these fields are used for sorting and selecting the data sets. The fields are not fixed and can be changed at any time by the admins. This will probably only happen once or twice a month, but it will happen. Another problem is that not all of these 50 fields are used for all data sets. As a matter of fact, none of the 20,000 data sets will ever need all 50 fields.

Originally I was thinking of implementing this with a couple of MySQL tables that hold all these fields, so sorting and selecting shouldn't be much of a problem. But whenever new fields are added, the tables need to be altered. Also, most rows will have more empty columns than filled ones, because none of the data sets use all 50 fields.

Now I am thinking about implementing the selecting and sorting with Zend_Search_Lucene. My MySQL table would only have a small number of columns (primary key, some foreign keys, date columns and a text column which holds the serialized data of all other fields). This data would be indexed by Zend_Search_Lucene, and the index would be used for searching and sorting the data.

Has anybody built an index with this many fields (20, not 50, because only 20 are used for searching and sorting), where the fields can change from time to time and may grow to 40 or even 50 over time? Did you run into any problems?

My queries could be quite complex, using up to 10 fields combined by AND to get the results. Does this work without any problems? Or should I limit the number of fields that can be chosen for a single query?

I am worried a lot about performance. As I understand it, Zend_Search_Lucene has no real support for pagination. So when I only want the first 20 results for a query with 5 fields, the whole index will be searched and all results will be returned. Will performance really be an issue?

What do you think in general about my ideas?
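To make the Lucene approach more concrete, here is a rough, untested sketch of what I have in mind. The helper names (filterSearchableFields, buildDocument, findPage) and the field names are made up for illustration; the Zend_Search_Lucene calls are the ones I understand from the manual. It assumes Zend Framework 1 is on the include path and that every searchable field is matched as an exact value.

```php
<?php
// Hypothetical helper: keep only the searchable fields that are actually
// filled, so sparse data sets do not produce empty index fields.
function filterSearchableFields(array $record, array $searchable)
{
    $fields = array();
    foreach ($searchable as $name) {
        if (isset($record[$name]) && $record[$name] !== '') {
            $fields[$name] = (string) $record[$name];
        }
    }
    return $fields;
}

// Hypothetical helper: build one Lucene document per data set.
function buildDocument($id, array $fields)
{
    $doc = new Zend_Search_Lucene_Document();
    // Keyword fields are stored and indexed as a single, untokenized term.
    $doc->addField(Zend_Search_Lucene_Field::keyword('record_id', (string) $id));
    foreach ($fields as $name => $value) {
        $doc->addField(Zend_Search_Lucene_Field::keyword($name, $value));
    }
    return $doc;
}

// Hypothetical helper: AND-combine exact field values, sort by one field,
// and cut out a single page of results.
function findPage($index, array $criteria, $sortField, $page, $perPage = 20)
{
    $query = new Zend_Search_Lucene_Search_Query_MultiTerm();
    foreach ($criteria as $name => $value) {
        // Second argument true marks the term as required (AND).
        $query->addTerm(new Zend_Search_Lucene_Index_Term($value, $name), true);
    }
    // find() still collects all hits; the paging happens afterwards in PHP.
    $hits = $index->find($query, $sortField, SORT_STRING, SORT_ASC);
    return array_slice($hits, ($page - 1) * $perPage, $perPage);
}
```

Indexing would then be a loop over the MySQL rows: unserialize the text column, pass the result through filterSearchableFields() and buildDocument(), and hand each document to $index->addDocument(), followed by $index->commit(). A new field added by the admins would only mean one more entry in the list of searchable field names plus a re-index, with no table changes.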
Thanks for your comments. Best regards, Ralf
