Hi Pablo,

> It seems that the GC needs more information from the Data Store to do
> its job. One approach may be to have "transient" and "persisted" binary
> content so that the GC doesn't delete "transient" binary content.

Yes, I agree, that would be the best solution. Alternatively: never
delete data that is still referenced in memory (whether or not it is
transient).

What about storing all DataRecord objects in a WeakHashMap? Then there
are two options:

Plan A) The garbage collection checks the hash map and doesn't delete
records that are still in it.
Plan B) A background daemon thread updates the modified date of those
records from time to time.

Plan A sounds simpler, but Plan B would solve some distributed GC problems.
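
To make the two plans concrete, here is a rough sketch in plain Java. It
is not Jackrabbit code: Record, InUseRegistry and Sweeper are
hypothetical names, Record is only a stand-in for DataRecord, and the
"store" is just a map from identifier to modified date. Treat it as an
illustration of the idea under those assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a data store record (not the Jackrabbit DataRecord API). */
final class Record {
    final String id;
    volatile long lastModified;

    Record(String id, long lastModified) {
        this.id = id;
        this.lastModified = lastModified;
    }
}

/** Tracks records that are still strongly referenced somewhere in this JVM. */
final class InUseRegistry {
    // Weak keys: an entry disappears once nothing else references the Record.
    private final Map<Record, Boolean> inUse =
            Collections.synchronizedMap(new WeakHashMap<>());

    Record register(Record r) {
        inUse.put(r, Boolean.TRUE);
        return r;
    }

    /** Copies the live keys; the copy holds strong references while it is in use. */
    private List<Record> snapshot() {
        synchronized (inUse) {
            return new ArrayList<>(inUse.keySet());
        }
    }

    /** Plan A: identifiers that the sweep must not delete. */
    Set<String> liveIds() {
        Set<String> ids = new HashSet<>();
        for (Record r : snapshot()) {
            ids.add(r.id);
        }
        return ids;
    }

    /** Plan B: refresh the modified date of every record still in use. */
    void touchAll(long now) {
        for (Record r : snapshot()) {
            r.lastModified = now;
        }
    }

    /** Plan B: daemon thread that calls touchAll from time to time. */
    ScheduledExecutorService startToucher(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread t = new Thread(task, "datastore-toucher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> touchAll(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}

final class Sweeper {
    /**
     * Plan A sweep: removes entries older than minModified unless the
     * registry reports them as in use. Returns the deleted identifiers.
     */
    static Set<String> sweep(Map<String, Long> store, long minModified, InUseRegistry registry) {
        Set<String> live = registry.liveIds();
        Set<String> deleted = new HashSet<>();
        store.entrySet().removeIf(e -> {
            boolean delete = e.getValue() < minModified && !live.contains(e.getKey());
            if (delete) {
                deleted.add(e.getKey());
            }
            return delete;
        });
        return deleted;
    }
}
```

With Plan A the sweep needs access to the in-memory map, so it only
protects records held in the same JVM. With Plan B the protection goes
through the modified date, which any other process scanning the store
would see as well, provided the real touchAll also writes the date back
to the store (the sketch only updates the in-memory field).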

> The GC can use observation to be notified
> of property creation, as currently does to detect when nodes are moved
> during GC scan. We can use a mark file for the binary content state in
> the FileDataStore implementation and an additional column in the binary
> content table for the DatabaseDataStore.

I think that would work as well, but the observation solution would be
more complex in my view.

> What is the overhead of a PROPERTY_ADDED + PROPERTY_CHANGED listener in
> Jackrabbit ?

I don't know. If it has to observe all properties all the time, it
should probably be avoided. If there is a way to observe only binary
properties, or to observe only while the GC is running, then it should
be OK.

Thomas
