Hi,

Do you need the entire object loaded into memory in order to compare it 
with another object? Do these objects have IDs, and can they be fetched 
quickly by ID after sorting? If so, you could derive a lightweight proxy 
containing only the few attributes needed for comparison and work with 
those, reducing the amount of heap needed. Once the lightweights are 
sorted, you know the position of each one in the order and, in turn, 
that of its parent object.
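
To make the proxy idea concrete, here is a minimal sketch. It assumes each 
object has a long ID and that a single long field is enough to order it. 
ProxySort, Proxy and sortKey are names I made up for illustration, not 
anything from your code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ProxySort {
    // Hypothetical lightweight proxy: just the parent's ID and the one
    // attribute needed for ordering (an assumption about the data).
    static final class Proxy {
        final long id;
        final long sortKey;

        Proxy(long id, long sortKey) {
            this.id = id;
            this.sortKey = sortKey;
        }
    }

    // Sorts a copy of the proxies by key (ties broken by ID) and returns
    // the parent IDs in sorted order; the full objects are never loaded.
    static long[] sortedIds(List<Proxy> proxies) {
        List<Proxy> copy = new ArrayList<>(proxies);
        copy.sort(Comparator.comparingLong((Proxy p) -> p.sortKey)
                            .thenComparingLong(p -> p.id));
        long[] ids = new long[copy.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = copy.get(i).id;
        }
        return ids;
    }
}
```

The returned ID array is the sort order of the parents, so the full 
objects can then be fetched by ID one at a time in that order.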

If you can't extract a lightweight attribute subset, perhaps you can 
compute a single numeric score for each object that captures its sort 
order, and work with that?
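
A rough sketch of the score approach, assuming the score fits in a 
non-negative int and fully determines the order (both are assumptions on 
my part). ScoreSort and its methods are hypothetical names. Packing the 
score and the object's index into one long lets you sort a primitive 
array, so the sort itself allocates no per-object garbage.

```java
import java.util.Arrays;

public class ScoreSort {
    // Packs a non-negative int score into the high 32 bits and the
    // object's index into the low 32 bits of one long.
    static long pack(int score, int index) {
        return ((long) score << 32) | (index & 0xFFFFFFFFL);
    }

    // Recovers the index from the low 32 bits of a packed value.
    static int indexOf(long packed) {
        return (int) packed;
    }

    // Returns object indices ordered by ascending score; equal scores
    // come out in ascending index order. Scores must be non-negative.
    static int[] orderByScore(int[] scores) {
        long[] packed = new long[scores.length];
        for (int i = 0; i < scores.length; i++) {
            packed[i] = pack(scores[i], i);
        }
        Arrays.sort(packed); // primitive sort, no object allocation
        int[] order = new int[packed.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = indexOf(packed[i]);
        }
        return order;
    }
}
```

For example, scores {50, 10, 30, 10} give the index order {1, 3, 2, 0}.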

m.


On Friday, 9 November 2018 15:08:23 UTC, Shevek wrote:
>
> Hi, 
>
> I'm trying to sort/merge a very large number of objects in Java, and 
> failing more spectacularly than normal. The way I'm doing it is this: 
>
> * Read a bunch of objects into an array. 
> * Sort the array, then merge neighbouring objects as appropriate. 
> * Re-fill the array, re-sort, re-merge until compaction is "not very 
> successful". 
> * Dump the array to file, repeat for next array. 
> * Then stream all files through a final merge/combine phase. 
>
> This is failing largely because I have no idea how large to make the 
> array. Estimating the ongoing size using something like JAMM is too 
> slow, and my hand-rolled memory estimator is too unreliable. 
>
> The thing that seems to be working best is messing around with the array 
> size in order to keep some concept of runtime.maxMemory() - 
> runtime.totalMemory() + runtime.freeMemory() within a useful bound. 
>
> But there must be a better solution. I can't quite think of a way around 
> this with SoftReference because I need to dump the data to disk when the 
> reference gets broken, and that's defeating me right now. 
>
> Other alternatives would include keeping all my in-memory data 
> structures in serialized form, and paying the ser/deser cost to compare, 
> but that's expensive - my main overhead right now is gc. Serialization 
> is protobuf, although that's changeable, since it's annoying the hell 
> out of me (please don't say thrift - but protobuf appears to have no way 
> to read from a stream into a reusable object - it has to allocate the 
> world every single time). 
>
> Issues: 
> * This routine is not the sole tenant of the JVM. Other things use RAM. 
> * This has to be deployed and work on systems whose memory config is 
> unknown to me. 
>
> Can anybody please give me pointers? 
>
> S. 
>
