same here; I'd go with C option :)

Tommaso

2011/10/26 Daniel Spicar <dspi...@apache.org>
> the JIRA issue can be found here:
> https://issues.apache.org/jira/browse/CLEREZZA-643
>
> On Wed, Oct 26, 2011 at 3:36 PM, Daniel Spicar <dspi...@apache.org> wrote:
>
> > Rupert provided a patch to improve serialization performance (thanks for
> > the effort!). I reviewed his patch and have written my comments on the
> > JIRA page. But I think we need to discuss the issues I raise there. In
> > summary:
> >
> > - neither the patch nor the current implementation works reliably with
> >   very large graphs (larger than memory)
> > - the patch is significantly faster than the current implementation
> > - the current implementation is easier to quick-fix for very large graphs
> >   (but also very slow)
> >
> > There is a sketch of a better solution that should allow us to be faster
> > and not limited by memory size. It is based on sorted iterators. However,
> > these iterators need to be supplied by the underlying TripleCollections,
> > and that will require more changes to the core of Clerezza.
> >
> > Because both the current implementation and the patch do not really work
> > on "big" TripleCollections (when big means really, really big), the
> > question we should discuss is:
> > a) keep everything as it is and solve the problem properly (possibly as
> >    described in the issue)
> > b) quick-fix the current implementation (slow performance) + schedule a
> >    proper solution
> > c) apply the patch (fast, but graphs limited to available memory size) +
> >    schedule a proper solution
> >
> > My favorite is c.
> >
> > What do you think?