Hi,
I'm trying to sort/merge a very large number of objects in Java, and
failing more spectacularly than normal. The way I'm doing it is this:
* Read a bunch of objects into an array.
* Sort the array, then merge neighbouring objects as appropriate.
* Re-fill the array, re-sort, re-merge until compaction is "not very
successful".
* Dump the array to file, repeat for next array.
* Then stream all files through a final merge/combine phase.
This is failing largely because I have no idea how large to make the
array. Estimating the ongoing size using something like JAMM is too
slow, and my hand-rolled memory estimator is too unreliable.
The thing that seems to be working best is messing around with the array
size in order to keep some concept of runtime.maxMemory() -
runtime.totalMemory() + runtime.freeMemory() within a useful bound.
But there must be a better solution. I can't quite think a way around
this with SoftReference because I need to dump the data to disk when the
reference gets broken, and defeating me right now.
Other alternatives would include keeping all my in-memory data
structures in serialized form, and paying the ser/deser cost to compare,
but that's expensive - my main overhead right now is gc. Serialization
is protobuf, although that's changeable, since it's annoying the hell
out of me (please don't say thrift - but protobuf appears to have no way
to read from a stream into a reusable object - it has to allocate the
world every single time).
Issues:
* This routine is not the sole tenant of the JVM. Other things use RAM.
* This has to be deployed and work on systems whose memory config is
unknown to me.
Can anybody please give me pointers?
S.
--
You received this message because you are subscribed to the Google Groups
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.