Hi Brian,
Brian Moseley wrote:
> even more astonishing, 1051 of those open fds are index files:
>
> java 12405 root 40r REG 9,1 22608 22875403 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_0/_2y.cfs
> java 12405 root 41r REG 9,1 2856 22875406 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_1/_8.cfs
> java 12405 root 42r REG 9,1 2291 22875409 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_2/_8.cfs
> java 12405 root 43r REG 9,1 888 22940607 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_3/_1.cfs
how many folders do you see under homedir/index?
with a well-configured index the number of sub index directories should
rarely exceed 20.
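A quick way to get that number is to count the directories directly under the index folder. The following is only a minimal sketch: the function name count_sub_index_dirs is made up for illustration, and it assumes every directory directly under the index folder is one sub index.

```python
import os


def count_sub_index_dirs(index_dir):
    """Count the directories directly under index_dir.

    Assumption of this sketch: each such directory is one sub index.
    Plain files in index_dir are ignored.
    """
    return sum(1 for entry in os.scandir(index_dir) if entry.is_dir())


# Example call, using the path from the lsof output above:
# count_sub_index_dirs(
#     "/home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index")
```

If the result is far above 20, the merge settings are the first thing to look at.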
> i don't know anything about lucene, but after looking at MultiIndex, i
> wonder if i'm having an issue with the frequency that the volatile index
> is persisted and/or the persistent indexes are merged. i'm using the
> default SearchIndex configuration, that is to say:
>
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> <param name="useCompoundFile" value="true"/>
> <param name="minMergeDocs" value="1000"/>
> <param name="volatileIdleTime" value="3"/>
> <param name="maxMergeDocs" value="1000"/>
> <param name="mergeFactor" value="10"/>
> <param name="bufferSize" value="10"/>
> <param name="path" value="${wsp.home}/index"/>
> </SearchIndex>
two parameters in your configuration do not match the default values in
src/config/repository.xml:
- minMergeDocs (default 100)
- maxMergeDocs (default 100000)
Both parameters affect the incremental merge behaviour of the index.
I suggest that you try the default values; the index will probably
create fewer index files.
some more background info on the search index and its merge behaviour,
which affects the number of open files:
an index consists of several sub indexes that are combined in a multi
index. there is always one sub index that is held in memory (volatile
index) and a number of persistent indexes on disk. a new persistent index
is created when (1) the volatile index reaches a certain size, which is
controlled by minMergeDocs, or (2) the whole index has been idle for a
certain time, configured by volatileIdleTime.
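The two conditions can be written down as a small decision function. This is only a sketch of the rule as described here, not the actual MultiIndex code: the function and argument names are made up, volatileIdleTime is assumed to be in seconds, "reaches" is read as >=, and an empty volatile index is assumed never to be persisted.

```python
def should_persist(volatile_docs, idle_seconds,
                   min_merge_docs=100, volatile_idle_time=3):
    """Return True when the in-memory index should be written to disk.

    volatile_docs: number of documents currently in the volatile index
    idle_seconds:  time since the index was last modified (assumed unit)
    """
    if volatile_docs == 0:
        return False  # nothing to persist (assumption of this sketch)
    size_reached = volatile_docs >= min_merge_docs       # condition (1)
    idle_too_long = idle_seconds >= volatile_idle_time   # condition (2)
    return size_reached or idle_too_long
```

Under this reading, condition (2) means that even a handful of documents can end up as their own persistent index if writes come in small bursts separated by pauses longer than volatileIdleTime.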
Increasing the value for minMergeDocs will use more memory because more
nodes are kept in the volatile index. But a higher value will also
increase performance for bulk loads. The drawback is that queries are a
bit slower.
persistent indexes are merged by a background thread. this process is
controlled by three parameters: minMergeDocs, maxMergeDocs and mergeFactor.
as mentioned before, merging is done incrementally: several smaller
indexes are merged into a larger index. imagine the following boxes:
----- ----- -----
| A | | B | | C | ...
----- ----- -----
Box A contains sub indexes with size <= minMergeDocs^(1*mergeFactor)
Box B contains sub indexes with size <= minMergeDocs^(2*mergeFactor)
Box C contains sub indexes with size <= minMergeDocs^(3*mergeFactor)
and so on.
as soon as a box contains a number of sub indexes equal to mergeFactor,
they are merged and the result is put into the next box. the sub indexes
from the source box are then deleted. the upper limit is controlled by
maxMergeDocs: the merging process will never merge more than
maxMergeDocs documents. thus the number of boxes is limited.
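To make the box picture concrete, here is a toy simulation of the rule just described. It is a simplified model and not the real merger: all names are made up, a sub index is represented only by its document count, boxes are tracked by merge level rather than by size bounds, and a merge is simply skipped when the combined size would exceed max_merge_docs.

```python
def add_sub_index(boxes, size, merge_factor=10, max_merge_docs=100000):
    """Put a new sub index of `size` docs into the first box and cascade.

    boxes is a list of lists; boxes[i] holds the document counts of the
    sub indexes currently sitting in box i. It is modified in place.
    """
    if not boxes:
        boxes.append([])
    boxes[0].append(size)
    level = 0
    while level < len(boxes) and len(boxes[level]) >= merge_factor:
        merged = sum(boxes[level])
        if merged > max_merge_docs:
            break  # never merge more than max_merge_docs documents
        boxes[level] = []  # the source sub indexes are deleted
        if level + 1 == len(boxes):
            boxes.append([])
        boxes[level + 1].append(merged)  # result moves to the next box
        level += 1
    return boxes


def total_sub_indexes(boxes):
    """Number of sub indexes on disk across all boxes."""
    return sum(len(box) for box in boxes)


# 100 sub indexes of 100 docs each, default-like settings
boxes = []
for _ in range(100):
    add_sub_index(boxes, 100)
print(boxes)  # [[], [], [10000]]

# 50 sub indexes of 1000 docs each, with max_merge_docs = 1000
stuck = []
for _ in range(50):
    add_sub_index(stuck, 1000, max_merge_docs=1000)
print(total_sub_indexes(stuck))  # 50
```

In this model the second run never merges anything, because any merge of ten full-size sub indexes would exceed max_merge_docs, so the sub indexes just pile up. The real merger may differ in detail, but it shows why minMergeDocs equal to maxMergeDocs is a setting worth changing first.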
regards
marcel