Thanks for the sharing and suggestion. Yes Chris, the index is to be partitioned by date time, and old index will not be access so frequent.
I also did consider indexing in parallel to different index as well Erick. But I can only put all index in ONE machine and there is only ONE machine to process the job (both searching and indexing). I haven’t try out the addIndexes to combine indexes but I did tried out MultiSearcher for 3 millions of documents in 3 separate indexes and it does not have much different compare to a search in 1 index. Could you mind to share your experience in addIndexes, how is the performance for that ? In my situation, the indexes may be use for searching at the same time. Another worry for indexing is optimization process, I have tried it out and it take quite some time to optimize my index, e.g indexing for 5000 is about 10seconds (MergeFactor = 1000, and recreate IndexWriter every 5000 documents), but to optimize, it will take around ONE minute. So do you guys have any suggestion on optimization process as well ? when should I run the optimization process ? and is it a lot of different with searching in a optimized index compare to a un-optimize ? ------ eChuang, Chew -----Original Message----- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 19, 2007 2:19 AM To: java-user@lucene.apache.org Subject: Re: FW: Lucene indexing vs RDBMS insertion. Definitely very aggressive. Currently my experience is that, together with database access, DBSight can do 3 million in 2 hours, with Pentium D 3.4Hz. Seems you definitely need some good hardware, and a fast hard drive for this. I feel the hard drive is actually the bottleneck for large indexes. Partitioning your data and pruning the old data should be also considered. -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m inutes On 6/18/07, Chew Yee Chuang <[EMAIL PROTECTED]> wrote: > Thanks for your suggestion Erick. I'm planning to test the indexing soon. > For your information, currently the system is inserting into RDBMS which is > around 1000 records per seconds. Thus, if lucene in place, I would expect it > will index that much of documents per seconds as well (Our target is 3.6 > millions of document to be indexed in 1 hour). Beside of that, I'm planning > to queue the record so lucene will have enough time to index it. Anyway, > thanks for your suggestion and will come back to you once I tested the > solution. > > Thanks, > eChuang, Chew > > -----Original Message----- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Friday, June 15, 2007 11:11 PM > To: java-user@lucene.apache.org > Subject: Re: FW: Lucene indexing vs RDBMS insertion. > > From my perspective, this is an irrelevant question. The real question > is "is Lucene indexing fast enough for my application?". Which nobody > can answer for you, you have to experiment. > > If you're building an index that's only updated every 6 months, > Lucene is certainly "fast enough". If you're recreating the > index every 6 seconds, it's a different question. > > So, I recommend that you create a test application that does > nothing except read your source, do whatever parsing you > need to do and does NOT index it at all. Record the time it > takes. > > Then try the same thing WITH indexing and record the difference. > > Then, to get a sense of the dimension of the problem, try > substituting inserting into the RDBMS instead of the Lucene > index. > > Once you have numbers, you can make better decisions > And people can give you better advice, especially if you > include more detail of your design. > > Best > Erick > > On 6/15/07, Chew Yee Chuang <[EMAIL PROTECTED]> wrote: > > > > Hi, I'm a new user to Lucene, and heard that it is a powerful tool for > > full > > text search and I'm planning to use it in my project for data storage > > purpose. Before the implementation, I could like to know whether there is > > performance issue on Lucene indexing process. I have no doubt on the > > retrieving and searching feature in Lucene but the indexing process. I > > have > > tested my current system to insert 1000 records in RDBMS storage it took > > about 1 seconds. Thus, if I change my solution to Lucene, can Lucene > > indexing process perform faster than RDBMS ? I have go through some of the > > article talking about the "MergeFactor" and "MaxMergeDocs" parameter for > > fine tune the indexing process, but no comparison between Lucene indexing > > process and RDBMS insertion. Thus, hope someone who have experience in > > Lucene can provide this information or some article that discuss between > > Lucene and RDBMS. > > > > > > > > I really appreciate any help in this. Thanks > > > > > > No virus found in this outgoing message. > > Checked by AVG Free Edition. > > Version: 7.5.472 / Virus Database: 269.8.16/849 - Release Date: 6/14/2007 > > 12:44 PM > > > > > > No virus found in this incoming message. > Checked by AVG Free Edition. > Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007 > 8:23 AM > > > No virus found in this outgoing message. > Checked by AVG Free Edition. > Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007 > 8:23 AM > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.9.0/852 - Release Date: 6/17/2007 8:23 AM No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.9.0/853 - Release Date: 6/18/2007 3:02 PM --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]