You could always use an array of byte[] (i.e. a byte[][]). Each sub-array is allocated on its own, making the contiguous-allocation requirement much smaller.

With proper coding the offset calculation is a simple shift, so the performance impact should be negligible relative to the surrounding code.
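A minimal sketch of the idea: a paged bit set backed by an array of fixed-size pages, so no single large contiguous block is ever requested, and the page/offset calculation is just a shift and a mask. The class and its page size are illustrative, not an existing Lucene class.

```java
// Paged bit set: one small long[] per page instead of one huge array.
// Hypothetical sketch; names and page size are assumptions.
class PagedBits {
    private static final int PAGE_BITS = 16;            // 65536 bits per page
    private static final int PAGE_SIZE = 1 << PAGE_BITS;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final long[][] pages;

    PagedBits(long numBits) {
        int numPages = (int) ((numBits + PAGE_SIZE - 1) >>> PAGE_BITS);
        pages = new long[numPages][];
        for (int i = 0; i < numPages; i++) {
            pages[i] = new long[PAGE_SIZE / 64];        // only 8 KB contiguous per page
        }
    }

    void set(long index) {
        int page = (int) (index >>> PAGE_BITS);         // offset calculation is a shift
        int bit  = (int) (index & PAGE_MASK);
        pages[page][bit >>> 6] |= 1L << (bit & 63);
    }

    boolean get(long index) {
        int page = (int) (index >>> PAGE_BITS);
        int bit  = (int) (index & PAGE_MASK);
        return (pages[page][bit >>> 6] & (1L << (bit & 63))) != 0;
    }
}
```

For a 64-million-bit set this needs ~1000 separate 8 KB allocations rather than one ~8 MB contiguous block.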

On Feb 19, 2008, at 1:48 PM, Paul Elschot wrote:


Allocating large blocks while also allocating more smaller
blocks is a known problem for memory allocators, so adding a
pool with preallocated blocks sounds like a good idea.

With 14 million of 64 million bits set, there may not be much
room to decrease the memory needed. When the set bits
are random, I'd expect it to be practically impossible to compress
to less than 55%. When there are long runs of set bits,
things change, and interval coding can help a lot.

Btw., there is some room in SortedVIntList to add interval
coding. The VInt value 0 cannot occur in the current
version, so it could be used as a prefix to encode a run of
set bits.
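One way the suggestion could look, sketched as a tiny encoder: deltas between set bits are written as standard VInts; since a delta of 0 never occurs, a 0 byte can serve as an escape marking a run of consecutive set bits. This is an illustration of the idea, not the actual SortedVIntList format.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of interval coding on top of VInt deltas.
class RunVIntEncoder {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // standard VInt: 7 data bits per byte, high bit set on all but the last byte
    private void writeVInt(int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // a single gap between set bits (delta >= 1, so never the 0 escape)
    void writeDelta(int delta) { writeVInt(delta); }

    // a run of 'length' consecutive set bits starting 'delta' after the previous bit
    void writeRun(int delta, int length) {
        writeVInt(0);       // escape: a delta of 0 cannot otherwise occur
        writeVInt(delta);
        writeVInt(length);
    }

    byte[] toBytes() { return out.toByteArray(); }
}
```

A run of 1000 consecutive set bits then costs a few bytes instead of 1000 one-byte deltas.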

Regards,
Paul Elschot


On Tuesday 19 February 2008 12:58:34, eks dev wrote:
hi Mark,

Just out of curiosity, do you know the distribution of the set bits in
these terms you have tried to cache? Maybe this simple tip could
help: if you are lucky like we were, the terms typically used for filters
are good candidates for sorting your index on before indexing
(once in a while); then, with some sort of IntervalDocIdSet, you can
reduce memory requirements dramatically.



----- Original Message ----
From: markharw00d <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 19 February, 2008 9:20:02 AM
Subject: Re: Out of memory - CachingWrapperFilter and multiple
threads

I now think the main issue here is that a busy JVM gets into trouble
trying to find large free blocks of memory for large bitsets.
In my index of 64 million documents, ~8 MB of contiguous free memory
must be found for each bitset allocated. The terms I was trying to
cache had 14 million entries, so the new DocIdSet alternatives to
bitsets probably fare no better.

The JVM (Sun 1.5) doesn't seem to deal with these allocations well.
Perhaps there's an obscure JVM option I can set to reserve a section
of RAM for large allocations.
However, I wonder if we should help the JVM out a little here by
having pre-allocated pools of BitSets/OpenBitSets that can be
reserved and reused by the application. This would imply a change to
the filter classes so that instead of constructing BitSets/OpenBitSets
directly they get them from a pool instead.

Thoughts?
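The pooling proposal above could be sketched as follows: fixed-size bit sets (here plain long[] word arrays) are preallocated while the heap is still unfragmented, then reserved and released rather than reallocated. This is a hypothetical illustration, not part of Lucene.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical pool of reusable bit-set backing arrays.
class BitSetPool {
    private final int numWords;
    private final ArrayDeque<long[]> free = new ArrayDeque<>();

    BitSetPool(int numBits, int preallocate) {
        numWords = (numBits + 63) >>> 6;
        for (int i = 0; i < preallocate; i++) {
            // allocate up front, before the heap fragments
            free.push(new long[numWords]);
        }
    }

    synchronized long[] reserve() {
        long[] bits = free.poll();
        // fall back to a fresh allocation if the pool is exhausted
        return bits != null ? bits : new long[numWords];
    }

    synchronized void release(long[] bits) {
        Arrays.fill(bits, 0L);   // clear before the next reservation reuses it
        free.push(bits);
    }
}
```

Filters would then ask the pool for a bit set instead of constructing one, and return it when the cached filter is evicted.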


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

