I don't know whether these are necessary yet... the first problem solved by the first optimisation may be a largish problem...
1. pendingKeys.
===============

Two problems:

A) When we register a big splitfile, it takes ages and goes into Full GC
constantly for many minutes, whereas trunk doesn't do this. This is not
acceptable if freenet is in the background and a game is in the
foreground!

B) Every time we fetch a block we need to do a database lookup
(tripPendingKeys()), and hence disk I/O.

Bloom filters:

- The user chooses a maximum download queue size.
- Global bloom filter, in RAM: 8MB for a 100GB maximum download queue.
- Each download has an 8MB bloom filter (maybe sparsely encoded), kept on
disk only, used when removing a download to recalculate the global filter
from the remaining downloads.
- Each download also has its own bloom filter in RAM, at 19 bits per key
(0.01% false positives), with a different hash. Expect 16MB total for
100GB of downloads (say 25 x 4GB ISOs).
- So when we get a key, we check the main filter. If that's positive, we
iterate all the per-download filters and check them. For each one that
matches, we iterate its keys (it may have per-segment bloom filters on
disk to speed this up). There's a sketch of this lookup path after the
list.
- Registration of splitfile downloads can be greatly sped up: we only
need to compute a couple of bloom filters in RAM and then save them, and
that's it.
- Open question: if we generate the filter and then don't change it, what
will the impact be of blocks we want, but already have, going in and out
of the datastore?
- Open question: the cost of removing a big download.
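To make the lookup path concrete, here's a minimal Java sketch. It is
not the actual code: BloomFilter, Download, PendingKeys, the SHA-256
double hashing and the hash counts are all invented for illustration,
and the proposal's "different hash per download" is glossed over. The
point is the layering: in the common case a key costs one in-RAM check
against the global filter; per-download filters are consulted only on a
match, and disk (per-segment filters, the database) only after that.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;

    class BloomFilter {
        private final BitSet bits;
        private final int sizeBits;
        private final int numHashes;

        BloomFilter(int sizeBits, int numHashes) {
            this.bits = new BitSet(sizeBits);
            this.sizeBits = sizeBits;
            this.numHashes = numHashes;
        }

        // Standard double hashing: derive h1 and h2 from a SHA-256 digest
        // of the key, then probe at h1 + i*h2 for i = 0..numHashes-1.
        private int[] positions(byte[] key) {
            byte[] d;
            try {
                d = MessageDigest.getInstance("SHA-256").digest(key);
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // SHA-256 always exists
            }
            int h1 = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                   | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
            int h2 = (((d[4] & 0xff) << 24) | ((d[5] & 0xff) << 16)
                   | ((d[6] & 0xff) << 8) | (d[7] & 0xff)) | 1; // odd step
            int[] pos = new int[numHashes];
            for (int i = 0; i < numHashes; i++)
                pos[i] = Math.floorMod(h1 + i * h2, sizeBits);
            return pos;
        }

        void add(byte[] key) {
            for (int p : positions(key)) bits.set(p);
        }

        boolean mightContain(byte[] key) {
            for (int p : positions(key)) if (!bits.get(p)) return false;
            return true;
        }
    }

    class Download {
        final String name; // hypothetical identifier
        // 19 bits per key with ~13 hashes gives roughly 0.01% false
        // positives, matching the figure above.
        final BloomFilter filter;

        Download(String name, int expectedKeys) {
            this.name = name;
            this.filter = new BloomFilter(expectedKeys * 19, 13);
        }
    }

    class PendingKeys {
        // 8MB = 64M bits, per the global-filter figure above; 10 hashes
        // is arbitrary for the sketch. (If blocks are 32KB, 100GB is
        // ~3.3M data blocks; with FEC check blocks on top, ~19 bits per
        // key over all of them lands near the quoted 16MB total for the
        // per-download filters. Both assumptions are mine, not the text's.)
        private final BloomFilter global = new BloomFilter(64 * 1024 * 1024, 10);
        private final List<Download> downloads = new ArrayList<>();

        void addPendingKey(Download d, byte[] key) {
            d.filter.add(key);
            global.add(key);
        }

        // Called for every key seen on the network.
        List<Download> candidates(byte[] key) {
            List<Download> out = new ArrayList<>();
            if (!global.mightContain(key)) return out; // the common case
            for (Download d : downloads)
                if (d.filter.mightContain(key)) out.add(d);
            // Each candidate would then check its per-segment filters /
            // its keys on disk.
            return out;
        }
    }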
2. Constant disk I/O from queued downloads
==========================================

The two big causes of database I/O on an otherwise idle node with lots
of hard-to-find downloads queued are request selection and retries.

Request selection: we choose a request from the queue structure, remove
it from the queue structure, create a PersistentChosenRequest, store it,
and add it to the queue of requests to run. We could eliminate the write
part of this, and hope that the read part is in RAM.

Naive approach: use a hashmap in RAM to exclude stuff we're already
fetching (we sort of do this already). Don't remove the requests from
the queue structure (the SendableRequest's). Don't store
PersistentChosenRequest's. The problem with this is that we tend to load
and exclude lots of stuff, which increases disk reads.

Better approach: drain a single SendableGet (which we do now), then
don't visit it again unless more has been added to it (which we can tell
by a generation number or something; there's a sketch of this at the end
of this section). For inserts we will probably have to remove them and
update the chosen-from object.

Note that request selection is probably the more efficient of the two
parts anyway: since we fetch up to 50 blocks from a single
SplitFileFetcherSubSegment, in one transaction we would commit the
subsegment and the 50 PersistentChosenRequests. For a retry, we need to
commit the segment (with an incremented retry count for the specific
block), the old subsegment and the new subsegment (both objects
containing a large number of block numbers in an array). db4o is
read-committed (unless you use lazy query evaluation, which is
dangerous), so we can't just commit every so often.

Although... if we are selecting segment-wise, maybe we could retry
segment-wise as well? Select a whole segment. All the requests are
executed (subject to higher-priority non-persistent requests getting the
slots when they need them... but eventually they are all executed, or a
timeout occurs). In the meantime, request selection is aware of having
selected this segment, and doesn't choose it again unless it has to.

If the queue length goes below some value, and no other segments are
available at the same priority/retry-count level, we call the timeout
callback prematurely. We get a callback when all the blocks have failed.
(We'd get a different callback if only some of them had failed and there
was e.g. a timeout.) We delete the old subsegment, create the new one,
move all the block numbers across, and commit.
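As promised above, a sketch of the generation-number bookkeeping for
request selection. Purely illustrative: SendableGetSketch stands in for
the real SendableGet, and the fields and methods are invented; the only
idea taken from the proposal is "drain it, then skip it until something
has been added".

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Queue;

    // Stand-in for the real SendableGet: a persistent source of block
    // requests which may gain more blocks after it has been drained.
    class SendableGetSketch {
        private final Queue<Integer> pendingBlocks = new ArrayDeque<>();
        private long generation = 0;            // bumped on every addition
        private long drainedAtGeneration = -1;  // value when last drained

        synchronized void addBlock(int blockNo) {
            pendingBlocks.add(blockNo);
            generation++; // selection will now see this object as changed
        }

        // True if nothing has been added since the last drain, so the
        // selector can skip this object: no removal from the queue
        // structure, no PersistentChosenRequest stored, no database write.
        synchronized boolean drainedAndUnchanged() {
            return drainedAtGeneration == generation;
        }

        // Hand over everything currently queued, leaving the object in
        // the queue structure, and remember the generation we drained at.
        synchronized List<Integer> drain() {
            List<Integer> out = new ArrayList<>(pendingBlocks);
            pendingBlocks.clear();
            drainedAtGeneration = generation;
            return out;
        }
    }

    class SelectorSketch {
        // Pick blocks to run without writing anything back to the queue.
        List<Integer> selectFrom(List<SendableGetSketch> queue) {
            for (SendableGetSketch g : queue) {
                if (g.drainedAndUnchanged()) continue;
                List<Integer> blocks = g.drain();
                if (!blocks.isEmpty()) return blocks;
            }
            return Collections.emptyList();
        }
    }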
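And a sketch of the segment-wise retry just described: one callback when
every block in the subsegment has failed, a premature-timeout path for
when the queue is running dry, and a single commit that deletes the old
subsegment, creates the new one and moves the block numbers across. The
class and method names are invented; only the control flow follows the
text.

    import java.util.BitSet;

    class SegmentRetrySketch {
        static class Subsegment {
            final int retryCount;
            final int[] blockNumbers;
            Subsegment(int retryCount, int[] blockNumbers) {
                this.retryCount = retryCount;
                this.blockNumbers = blockNumbers;
            }
        }

        private Subsegment current;
        private final BitSet failed;

        SegmentRetrySketch(Subsegment initial) {
            this.current = initial;
            this.failed = new BitSet(initial.blockNumbers.length);
        }

        // Called per failed block. We only act when the whole subsegment
        // has failed, rather than committing three objects per block.
        synchronized void onBlockFailed(int index) {
            failed.set(index);
            if (failed.cardinality() == current.blockNumbers.length)
                retrySegment();
        }

        // The "queue running dry" case: no other segments available at
        // this priority/retry level, so don't wait for the stragglers.
        synchronized void onPrematureTimeout() {
            retrySegment();
        }

        private void retrySegment() {
            // Delete the old subsegment, create the new one with the
            // retry count bumped, move the block numbers across...
            Subsegment next = new Subsegment(current.retryCount + 1,
                                             current.blockNumbers.clone());
            current = next;
            failed.clear();
            commit(next); // ...and commit once for the whole segment.
        }

        private void commit(Subsegment s) {
            // In the real node this would be a db4o store + commit; here
            // it's just a placeholder.
            System.out.println("commit: retry=" + s.retryCount);
        }
    }

The win is the transaction count: one commit per segment-wide retry,
instead of segment + old subsegment + new subsegment per block.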