Hi Brian,

Brian Moseley wrote:
even more astonishing, 1051 of those open fds are index files:

java 12405 root 40r REG 9,1 22608 22875403 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_0/_2y.cfs
java 12405 root 41r REG 9,1 2856 22875406 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_1/_8.cfs
java 12405 root 42r REG 9,1 2291 22875409 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_2/_8.cfs
java 12405 root 43r REG 9,1 888 22940607 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_3/_1.cfs

how many folders do you see under homedir/index ?

with a well configured index the number of sub index directories should rarely exceed 20.
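If it helps, here is a small sketch for counting them. The function name is made up for illustration, and it simply assumes that every directory directly under the index folder is one sub index (as the _0, _1, _2, _3 paths in your lsof output suggest).

```python
from pathlib import Path


def count_sub_indexes(index_dir):
    """Count the directories directly under index_dir.

    Assumption: each sub index lives in its own directory (_0, _1, ...),
    so plain files next to them are ignored.
    """
    return sum(1 for entry in Path(index_dir).iterdir() if entry.is_dir())


# Example call, using the path from the lsof output:
# count_sub_indexes("/home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index")
```

A result far above 20 would point at the merge settings discussed below.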

i don't know anything about lucene, but after looking at MultiIndex, i wonder if i'm having an issue with the frequency that the volatile index is persisted and/or the persistent indexes are merged. i'm using the default SearchIndex configuration, that is to say:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="1000"/>
    <param name="volatileIdleTime" value="3"/>
    <param name="maxMergeDocs" value="1000"/>
    <param name="mergeFactor" value="10"/>
    <param name="bufferSize" value="10"/>
    <param name="path" value="${wsp.home}/index"/>
</SearchIndex>

two parameters do not match the default values you find in src/config/repository.xml:

- minMergeDocs (default 100)
- maxMergeDocs (default 100000)

The two parameters are relevant for the incremental merge behaviour of the index.

I suggest that you try the default values; the index will probably create fewer index files.

some more background info on the search index and its merge behaviour which affects open files:

an index consists of several sub indexes that are combined in a multi index. there is always one sub index that is held in memory (volatile index) and a number of persistent indexes on disk. new persistent indexes are created when (1) the volatile index reaches a certain size, which is controlled by minMergeDocs, or (2) the whole index has been idle for a certain time, configured by volatileIdleTime. Increasing the value for minMergeDocs will use more memory because more nodes are kept in the volatile index. But a higher value will also increase performance for bulk loads. The drawback is that queries are a bit slower.
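To make the two triggers concrete, here is a minimal sketch. It is not Jackrabbit's code: the class and method names are hypothetical, timestamps are passed in by hand, and volatileIdleTime is treated as seconds, which is an assumption.

```python
class VolatileIndexSketch:
    """Toy model of when the in-memory (volatile) index is written to disk."""

    def __init__(self, min_merge_docs, volatile_idle_time):
        self.min_merge_docs = min_merge_docs
        self.volatile_idle_time = volatile_idle_time  # assumed to be seconds
        self.pending = 0        # nodes currently held in the volatile index
        self.last_change = None
        self.persisted = []     # sizes of the persistent sub indexes created

    def add_node(self, now):
        """Index one node; trigger (1) fires when the size limit is reached."""
        self.pending += 1
        self.last_change = now
        if self.pending >= self.min_merge_docs:
            self._flush()

    def check_idle(self, now):
        """Trigger (2): flush if nothing changed for volatile_idle_time."""
        if self.pending and now - self.last_change >= self.volatile_idle_time:
            self._flush()

    def _flush(self):
        # A new persistent sub index is created from the volatile index.
        self.persisted.append(self.pending)
        self.pending = 0
```

In this model a slow trickle of writes with a short idle time produces many small persistent indexes, one per idle period, and each of them holds files open until it is merged away.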

persistent indexes are merged by a background thread. this process is controlled by three parameters: minMergeDocs, maxMergeDocs and mergeFactor.

as mentioned before merging is done incrementally. several smaller indexes are merged into a larger index. imagine the following boxes:

  -----    -----    -----
  | A |    | B |    | C |  ...
  -----    -----    -----

Box A contains sub indexes with size <= minMergeDocs^(1*mergeFactor)
Box B contains sub indexes with size <= minMergeDocs^(2*mergeFactor)
Box C contains sub indexes with size <= minMergeDocs^(3*mergeFactor)

and so on.

as soon as a box contains a number of sub indexes equal to mergeFactor they are merged and put into the next box. the sub indexes from the source box are then deleted. the upper limit is controlled by maxMergeDocs. the merging process will never merge more than maxMergeDocs documents. Thus the number of boxes is limited.
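The effect of the settings can be tried out with a toy simulation of the boxes. This is only a sketch under simplifying assumptions, not the real merger: the function name is hypothetical, every flush of the volatile index is assumed to contain exactly minMergeDocs documents (a bulk load), and a merge is skipped whenever the merged index would exceed maxMergeDocs.

```python
def simulate_merges(total_docs, min_merge_docs, max_merge_docs, merge_factor):
    """Return the sizes of the persistent sub indexes left after a bulk load."""
    boxes = [[]]  # boxes[0] is box A, boxes[1] is box B, and so on
    for _ in range(total_docs // min_merge_docs):
        # The volatile index reached minMergeDocs and was written to disk.
        boxes[0].append(min_merge_docs)
        level = 0
        while len(boxes[level]) >= merge_factor:
            merged = sum(boxes[level])
            if merged > max_merge_docs:
                break  # upper limit: never merge more than maxMergeDocs
            boxes[level] = []  # source sub indexes are deleted
            if level + 1 == len(boxes):
                boxes.append([])
            boxes[level + 1].append(merged)  # merged index moves to next box
            level += 1
    return [size for box in boxes for size in box]


# Your current settings versus the defaults, for a 50,000 node bulk load:
current = simulate_merges(50_000, 1000, 1000, 10)
defaults = simulate_merges(50_000, 100, 100_000, 10)
print(len(current), len(defaults))
```

In this model your current values (minMergeDocs and maxMergeDocs both 1000) leave 50 sub indexes after 50,000 nodes, because ten indexes of 1000 documents would merge into 10,000 documents, which is above maxMergeDocs, so nothing is ever merged. With the default values the same load ends up with 5 sub indexes.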

regards
 marcel
