Hi,

In Jackrabbit Oak, we have a different (much, much faster) approach to garbage collection, but there is no plan to backport it to Jackrabbit 2.x. The approach is: scan the repository (not a traversal, but a low-level scan of the persistent storage) for blob ids. Then get the list of blobs from the data store, and delete those that are not in the list of blob ids in use.
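To make the idea concrete, here is a minimal sketch of that scan-and-sweep logic. The NodeScanner and DataStore interfaces and their methods are placeholders I made up for illustration; the real Oak classes look different:

import java.util.HashSet;
import java.util.Set;

// Hypothetical interfaces standing in for the persistence layer and the
// data store; they are not the actual Oak or Jackrabbit 2.x APIs.
interface NodeScanner {
    // Sequentially scans the persisted node records and returns every
    // blob id referenced by a binary property.
    Iterable<String> scanReferencedBlobIds();
}

interface DataStore {
    // Lists the ids of all blobs currently held in the data store.
    Iterable<String> listAllBlobIds();

    // Physically removes a blob.
    void deleteBlob(String blobId);
}

public class ScanSweepCollector {

    public static int collectGarbage(NodeScanner scanner, DataStore store) {
        // Mark phase: one sequential pass over the node storage; no
        // repository traversal and no per-node random reads.
        Set<String> inUse = new HashSet<>();
        for (String id : scanner.scanReferencedBlobIds()) {
            inUse.add(id);
        }

        // Sweep phase: delete every stored blob that was not seen
        // during the scan.
        int deleted = 0;
        for (String id : store.listAllBlobIds()) {
            if (!inUse.contains(id)) {
                store.deleteBlob(id);
                deleted++;
            }
        }
        return deleted;
    }
}

Note that a real implementation would also have to avoid deleting blobs added while the scan was running, for example by only sweeping blobs older than the time the scan started.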
This is much faster mainly for two reasons: first (and most importantly), it avoids random access reads (the primary key for nodes in Jackrabbit 2.x is randomly distributed; this is no longer the case for the default storage engines in Jackrabbit Oak). Second, it avoids having to mark all binaries that are still in use. You could implement this approach for Jackrabbit 2.x, or you could switch to Jackrabbit Oak.

Regards,
Thomas

On 16/09/14 13:50, "uv" <[email protected]> wrote:

>Hi,
>
>our system uses Jackrabbit 2.6.5 and a MySQL DB datastore. The Jackrabbit
>DB schema is 300GB in size, most of it in the datastore. When we run the
>Jackrabbit garbage collector, it runs for almost 3 days. Running GC has a
>significant impact on application performance.
>
>Could you please advise what options we have?
>
>Could the GC somehow be split so that it does not iterate through the
>whole datastore? When GC has not finished completely, we cannot run the
>datastore clean-up, because we cannot be sure what has been scanned and
>what has not.
>
>Or is there any other GC implementation?
>
>
>Thank you very much.
>
>Vlastimil
