The JIRA issue can be found here: https://issues.apache.org/jira/browse/CLEREZZA-643

On Wed, Oct 26, 2011 at 3:36 PM, Daniel Spicar <dspi...@apache.org> wrote:
> Rupert provided a patch to improve serialization performance (thanks for
> the effort!). I reviewed his patch and have written my comments on the JIRA
> page. But I think we need to discuss the issues I raise there. In summary:
>
> - neither the patch nor the current implementation works reliably with very
>   large graphs (larger than memory)
> - the patch is significantly faster than the current implementation
> - the current implementation is easier to quick-fix for very large graphs
>   (but also very slow)
>
> There is a sketch of a better solution that should allow us to be faster
> and not limited by memory size. It is based on sorted iterators. However,
> these iterators need to be supplied by the underlying TripleCollections, and
> that will require more changes to the core of Clerezza.
>
> Because neither the current implementation nor the patch really works
> on "big" TripleCollections (where big means really, really big), the
> question we should discuss is:
> a) keep everything as it is and solve the problem properly (possibly as
>    described in the issue)
> b) quick-fix the current implementation (slow performance) + schedule a
>    proper solution
> c) apply the patch (fast, but graphs limited to available memory size) +
>    schedule a proper solution
>
> My favorite is c.
>
> What do you think?
>