Re: How to do Indexing and Extraction in Background threads

Ajai Wed, 05 Aug 2009 05:53:49 -0700

Here is some background information as well:

        I uploaded 25,000 folders, each with 15 documents (375,000 documents
in total), into an MS-SQL Server 2005 database. After that, we added a
2.5 MB PDF document, and it took around 8 seconds.


        We profiled the process and noticed that most of the time was spent on
text extraction in PDFBox. Also, the HTTP thread waited until the extraction
thread had completed.
      
Thanks
Ajai


Ajai wrote:
> 
> We are using 1.5
> 
> Thanks
> Ajai
> 
> Marcel Reutegger wrote:
>> 
>> that looks OK to me. what version of jackrabbit are you using?
>> 
>> regards
>>  marcel
>> 
>> On Wed, Aug 5, 2009 at 12:18, Ajai<[email protected]> wrote:
>>>
>>> Also attaching the configuration as a text file
>>> http://www.nabble.com/file/p24824270/config.txt config.txt
>>>
>>>
>>>
>>> Ajai wrote:
>>>>
>>>> Thanks, Marcel, for the response.
>>>> Please find below the configuration:
>>>>
>>>> <SearchIndex
>>>> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   </SearchIndex>
>>>>
>>>> Kindly let us know your thoughts
>>>>
>>>> Thanks,
>>>> Ajai G
>>>>
>>>>
>>>>
>>>> Marcel Reutegger wrote:
>>>>>
>>>>> can you please send the configuration again in plain text. the
>>>>> configuration didn't make it through.
>>>>>
>>>>> but in any case, you can set the parameter extractorPoolSize to the
>>>>> number of background threads that you want to give the text extraction
>>>>> process. see also: http://wiki.apache.org/jackrabbit/Search
>>>>>
>>>>> regards
>>>>>  marcel
>>>>>
>>>>> On Wed, Aug 5, 2009 at 11:22, Ajai<[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Whenever we add a document to the repository, the indexing and
>>>>>> extraction seem to happen in the same thread. Because of this, adding
>>>>>> a 2.5 MB document takes around 8 seconds.
>>>>>>
>>>>>> We would like this extraction and indexing to be done on a
>>>>>> background thread.
>>>>>>
>>>>>> I have the following configuration for SearchIndex in
>>>>>> repository.xml:
>>>>>>
>>>>>> <SearchIndex
>>>>>>
>>>>>>  class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>                </SearchIndex>
>>>>>>
>>>>>> Please let us know if any configuration changes need to be made.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Ajai G
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24823548.html
>>>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24824270.html
>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24826389.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
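
For readers of the archive: Marcel's suggestion earlier in the thread (setting
extractorPoolSize on the SearchIndex element) might look roughly like the
fragment below. This is only a sketch. The parameter values are illustrative
assumptions, not taken from Ajai's configuration, whose parameters did not
survive in the quoted messages. The extractorTimeout and extractorBackLogSize
parameters are not mentioned in the thread; they are included on the
assumption that the Jackrabbit version in use supports them, so please check
them against the Search wiki page linked above.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Location of the index files (assumed, conventional value) -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Number of background threads for text extraction (illustrative);
       a value of 0 means extraction runs in the calling thread -->
  <param name="extractorPoolSize" value="2"/>
  <!-- Assumed parameter: milliseconds the calling thread waits for an
       extractor before the work continues in the background -->
  <param name="extractorTimeout" value="100"/>
  <!-- Assumed parameter: maximum number of queued extraction jobs before
       new jobs run in the calling thread again -->
  <param name="extractorBackLogSize" value="100"/>
</SearchIndex>
```

With a non-zero pool size, a save should no longer have to wait for a slow
PDF extraction to finish. The trade-off, as far as I understand it, is that
the extracted text becomes searchable only some time after the document has
been added.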
