Some background information as well:
I have uploaded 25,000 folders, each with 15 documents (375,000
documents in total), stored in MS SQL Server 2005. After that, we added
a 2.5 MB PDF document, and it took around 8 seconds.
We profiled the process and noticed that most of the time was spent on text
extraction in PDFBox. Also, the HTTP thread waited until the extraction
thread completed.
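
For reference, here is a minimal sketch of how the extractorPoolSize
parameter Marcel suggested might be set inside the SearchIndex element.
The values and the index path below are illustrative assumptions, not our
actual configuration. extractorTimeout and extractorBackLogSize are the
related settings described on the Jackrabbit Search wiki page.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- index location: assumed default, adjust to the real setup -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- number of background threads for text extraction (assumed value) -->
  <param name="extractorPoolSize" value="2"/>
  <!-- milliseconds to wait for extraction before full-text indexing
       of the node is deferred to the background (assumed value) -->
  <param name="extractorTimeout" value="100"/>
  <!-- number of extraction jobs that may queue up before the calling
       thread does the extraction itself (assumed value) -->
  <param name="extractorBackLogSize" value="100"/>
</SearchIndex>
```

With a non-zero pool size, a slow extraction should only hold up the
saving thread for about the timeout, and the full text would be added to
the index once the extraction finishes.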
Thanks
Ajai
Ajai wrote:
>
> We are using Jackrabbit 1.5.
>
> Thanks
> Ajai
>
> Marcel Reutegger wrote:
>>
>> that looks OK to me. what version of jackrabbit are you using?
>>
>> regards
>> marcel
>>
>> On Wed, Aug 5, 2009 at 12:18, Ajai<[email protected]> wrote:
>>>
>>> Also attaching the configuration as a text file
>>> http://www.nabble.com/file/p24824270/config.txt config.txt
>>>
>>>
>>>
>>> Ajai wrote:
>>>>
>>>> Thanks, Marcel, for the response.
>>>> Please find the configuration below:
>>>>
>>>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>   <!-- param elements missing from the archived message -->
>>>> </SearchIndex>
>>>>
>>>> Kindly let us know your thoughts.
>>>>
>>>> Thanks,
>>>> Ajai G
>>>>
>>>>
>>>>
>>>> Marcel Reutegger wrote:
>>>>>
>>>>> can you please send the configuration again in plain text. the
>>>>> configuration didn't make it through.
>>>>>
>>>>> but in any case, you can set the parameter extractorPoolSize to the
>>>>> number of background threads that you want to give the text extraction
>>>>> process. see also: http://wiki.apache.org/jackrabbit/Search
>>>>>
>>>>> regards
>>>>> marcel
>>>>>
>>>>> On Wed, Aug 5, 2009 at 11:22, Ajai<[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Whenever we add a document to the repository, the indexing and
>>>>>> extraction seem to happen in the same thread. Because of this, adding
>>>>>> a 2.5 MB document takes around 8 seconds.
>>>>>>
>>>>>> We would like the extraction and indexing to be done on a
>>>>>> background thread.
>>>>>>
>>>>>> I have the following configuration for SearchIndex in
>>>>>> repository.xml:
>>>>>>
>>>>>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>>   <!-- param elements missing from the archived message -->
>>>>>> </SearchIndex>
>>>>>>
>>>>>> Please let us know if any configuration changes need to be made.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Ajai G
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24823548.html
>>>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>