Hi Sandeep,

How many threads do you use for indexing? IIRC the Lucene benchmarks are run with more than 20 threads.
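A rough sketch of what I mean, in case it helps. IndexWriter is thread-safe, so several threads can call addDocument on the one shared writer. The names below (ParallelIndexer, LineSink, indexAll) are made up for illustration, and the sink stands in for the Lucene call so that the sketch runs with only the JDK. In your tool the sink would be something like `line -> writer.addDocument(toDocument(line))`, where `toDocument` is a hypothetical helper around your CSV parser. Note that each worker would need its own reused Document and Field instances, since one shared Document cannot safely be filled from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for "add this CSV line to the index". */
interface LineSink {
    void accept(String line) throws Exception;
}

final class ParallelIndexer {
    /**
     * Splits the lines into one contiguous slice per thread and feeds each
     * slice to the shared sink from its own worker thread.
     * Returns the number of lines handed to the sink.
     */
    static long indexAll(List<String> lines, int threads, LineSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (lines.size() + threads - 1) / threads; // ceiling division
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                int from = Math.min(lines.size(), t * chunk);
                int to = Math.min(lines.size(), from + chunk);
                List<String> slice = lines.subList(from, to);
                tasks.add(() -> {
                    int n = 0;
                    for (String line : slice) {
                        sink.accept(line); // with Lucene: writer.addDocument(...)
                        n++;
                    }
                    return n;
                });
            }
            long total = 0;
            // invokeAll blocks until every worker has finished its slice.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get(); // rethrows a worker failure as ExecutionException
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The slicing here assumes the lines are already in memory. For large files a bounded queue between one reader thread and the workers is probably a better fit, but the idea is the same: one writer, many threads adding documents.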
-Rob

On Tue, Feb 23, 2016 at 8:01 AM, sandeep das <[email protected]> wrote:

> Hi,
>
> I've implemented a tool using lucene-5.2.0 to index my CSV files. The tool
> reads data from CSV files (residing on disk) and creates indexes on
> local disk. It is able to process 3.5 MBps of data. There are 46 fields
> overall being added to one document. They are of only three data types:
> 1. Integer, 2. Long, 3. String.
> All these fields are part of one CSV record, and they are parsed using a
> custom CSV parser which is faster than any split method of String.
>
> I've configured the following parameters to create the IndexWriter:
> 1. setOpenMode(OpenMode.CREATE)
> 2. setCommitOnClose(true)
> 3. setRAMBufferSizeMB(512) // Tried 256 and 312 as well, but performance
>    is almost the same.
>
> I've read on several blogs that Lucene works way faster than these
> figures. So I thought there were some bottlenecks in my code and profiled
> it using jvisualvm. The application is spending most of its time in
> DefaultIndexChain.processField, i.e. 53% of total time.
>
> Following is the split of CPU usage in this application:
> 1. Reading data from disk takes 5% of total duration.
> 2. Adding documents takes 93% of total duration.
>
>    - postUpdate -> 12.8%
>    - doAfterDocument -> 20.6%
>    - updateDocument -> 59.8%
>    - finishDocument -> 1.7%
>    - finishStoreFields -> 4.8%
>    - processFields -> 53.1%
>
> I'm also attaching a screenshot of the call graph generated by jvisualvm.
>
> I've taken care of the following points:
> 1. Create only one instance of IndexWriter.
> 2. Create only one instance of Document and reuse it throughout the
>    lifetime of the application.
> 3. There will be no updates to the documents, hence only addDocument is
>    invoked.
>    Note: After going through the code I found out that addDocument
>    internally just calls updateDocument. Is there any way to avoid
>    calling updateDocument and use only the addDocument API?
> 4. Use the setValue APIs to set the pre-created fields, and reuse these
>    fields to create indexes.
>
> Any tip to improve the performance will be immensely appreciated.
>
> Regards,
> Sandeep
