Could someone please help me with this? I need to resolve it urgently.

Thanks,
Ajai G
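P.S. For reference, this is roughly how I understand Marcel's suggestion about extractorPoolSize. It is only a sketch: the path and the values are illustrative assumptions, not our actual configuration, and the extractorTimeout parameter is my reading of the Search wiki page rather than something confirmed in this thread.

```xml
<!-- Sketch only: values are assumed for illustration -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- index location (assumed default-style path) -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- number of background threads given to text extraction (value assumed) -->
  <param name="extractorPoolSize" value="2"/>
  <!-- milliseconds the calling thread waits for extraction (value assumed) -->
  <param name="extractorTimeout" value="100"/>
</SearchIndex>
```

If I have understood this correctly, a pool size above zero should move the PDF text extraction off the thread that saves the document, but please correct me if that is wrong for 1.5.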
Ajai wrote:
>
> Also giving the background information:
>
> I have uploaded 25,000 folders, each with 15 documents (375,000 documents),
> in an MS-SQL Server 2005. After that we added a 2.5 MB PDF document; it took
> around 8 seconds.
>
> We profiled the process and noticed that most of the time was spent on text
> extraction in PDFBox. Also, the HTTP thread waited until the extraction
> thread completed.
>
> Thanks
> Ajai
>
>
> Ajai wrote:
>>
>> We are using 1.5
>>
>> Thanks
>> Ajai
>>
>> Marcel Reutegger wrote:
>>>
>>> that looks OK to me. what version of jackrabbit are you using?
>>>
>>> regards
>>> marcel
>>>
>>> On Wed, Aug 5, 2009 at 12:18, Ajai<[email protected]> wrote:
>>>>
>>>> Also attaching the configuration as a text file
>>>> http://www.nabble.com/file/p24824270/config.txt config.txt
>>>>
>>>>
>>>> Ajai wrote:
>>>>>
>>>>> Thanks marcel for the response.
>>>>> Please find below the configuration:
>>>>>
>>>>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>
>>>>> </SearchIndex>
>>>>>
>>>>> Kindly let us know your thoughts
>>>>>
>>>>> Thanks,
>>>>> Ajai G
>>>>>
>>>>>
>>>>> Marcel Reutegger wrote:
>>>>>>
>>>>>> can you please send the configuration again in plain text. the
>>>>>> configuration didn't make it through.
>>>>>>
>>>>>> but in any case, you can set the parameter extractorPoolSize to the
>>>>>> number of background threads that you want to give the text
>>>>>> extraction process. see also: http://wiki.apache.org/jackrabbit/Search
>>>>>>
>>>>>> regards
>>>>>> marcel
>>>>>>
>>>>>> On Wed, Aug 5, 2009 at 11:22, Ajai<[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Whenever we add a document to the repository, the indexing and
>>>>>>> extraction seem to happen in the same thread. Due to this, the
>>>>>>> addition takes around 8 secs for a 2.5 MB document.
>>>>>>>
>>>>>>> We would like this extraction and indexing to be done on a
>>>>>>> background thread.
>>>>>>>
>>>>>>> I have the following configuration for searchIndex in the
>>>>>>> repository.xml
>>>>>>>
>>>>>>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>>>
>>>>>>> </SearchIndex>
>>>>>>>
>>>>>>> Please let us know if any configuration changes need to be made.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ajai G
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24823548.html
>>>>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24824270.html
>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>

--
View this message in context: http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24840415.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
