I'm sorry I didn't look at this earlier. There is one serious architectural problem preventing us from merging this (which is otherwise great stuff that will save us a significant amount of disk I/O and disk space even when not filtering):
WE MUST NOT ACCESS THE DATABASE ON ANY OTHER THREAD THAN THE DATABASE THREAD! This is a basic rule of the current client layer. Changing it would be massively non-trivial given activation issues etc., and since db4o uses a separate object cache for each parallel client, it might not boost performance as much as you might think.

In the SingleFileFetcher case we can deal with this easily enough: just call onSuccess(result.asBucket().getInputStream()).

In the SplitFileFetcher case it is far more complex. We have two options:

1. Reverse the flow: Run the writeDecodedData() calls on the database thread, passing them in as a Generator object with a writeTo(OutputStream, container, context) method. Create a pair of pipes, feed the input half to the decompression thread manager, and call writeTo(), on the database thread, to write to the output. This implies that hash checking, filtering and in particular writing the final data to a bucket happen on a separate thread at the end of the decompression chain, so we would need another pair of pipes on the other end of the decompression thread manager.

2. Don't reverse the flow: The simplest implementation would be to create a shadow bucket (getShadow(), a non-persistent bucket with the same contents) for every block to be copied, and then dump them out on a separate thread as now. The problem is that this could take a lot of memory, so for larger files we would have to schedule a database job to grab the next group of shadow buckets every so often.

IMHO the first strategy is probably better: create a pair of pipes at the front and a pair of pipes at the back. The back end runs hashing, filtering and writing to the final bucket (which can be persistent; you can pass an OutputStream for a persistent bucket to another thread as long as you ensure it is not deactivated in the meantime). The front end is a writeTo() called on the database thread. The front end feeds into the middle (the decompressors), which then feeds into the back end. Once writeTo() returns, we wait for the whole thing to finish. (A rough sketch of this piping follows the review notes below.)

My summary of the changes so far:

Reviewed up to 2c18269615cf51c24f1826ee8ee1e59caca730af sajack/ContentFilterDecompressor:
- Pass a List of Compressors into ClientGetter.onSuccess(). Do the decompression there.
- DecompressorThreadManager: pipe everything (decompression, possibly in multiple stages) on separate threads, and return an InputStream. Hence chain final data bucket -> decompressor -> any more decompressors -> filtering -> hash.
- Remove the decompress(bucket, bucketFactory, ...) API. Always use the stream-to-stream API.
- Simplify unit tests slightly.
- Use max length when creating the final bucket.
- Cleanups.
- Pass an InputStream directly into ClientGetter and thus chain the final segment reassembly to everything else.
- Remove some unnecessary checks; let the JPEG filter throw EOF if it needs to.

9be670dcd1f7a76acce49de844e95135cc4bd5f5:
- Another old bug: why do we pass maxLen * 4? Is this the max read? Do we actually check the written length here?
- The same old code is duplicated in the archive fetcher callback, and probably in the multi-level metadata fetcher callback too.
- When chaining to onFailure()'s, we should return afterwards.
- Archive fetcher callback: Bucket orig = result.asBucket() can be moved out of the loop, but it doesn't matter much? Same for multi-level.
- Archive fetcher callback: should return after calling onFailure(). Same for multi-level.
- SplitFileFetcher.finalStatus(): we could still use returnBucket if we are not compressed...
- SplitFileFetcher.finish(): don't run onFailure() and then onSuccess().
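To make the reversed-flow proposal (strategy 1) concrete, here is a minimal, hypothetical sketch in plain java.io terms. The Generator and DecompressorChain interfaces, the buffer sizes and the inline thread handling are all assumptions for illustration; the real code would go through the existing DecompressorThreadManager and the client layer's database job machinery, and would need proper error propagation to onFailure().

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

/** Hypothetical closure executed on the database thread (it may touch persistent objects). */
interface Generator {
    void writeTo(OutputStream os) throws IOException; // container/context omitted for brevity
}

/** Stand-in for the decompressor chain, e.g. a DecompressorThreadManager returning an InputStream. */
interface DecompressorChain {
    InputStream wrap(InputStream in) throws IOException;
}

public class ReversedFlowSketch {

    public static void run(Generator generator, DecompressorChain decompressors,
                           final OutputStream finalBucketStream) throws IOException {

        // Front pair of pipes: writeTo() on the database thread feeds frontOut,
        // the decompressors read from frontIn.
        PipedOutputStream frontOut = new PipedOutputStream();
        PipedInputStream frontIn = new PipedInputStream(frontOut, 32768);

        // Middle: the decompressor chain runs on its own threads and hands us a
        // decompressed InputStream (internally it would use the back pair of pipes).
        final InputStream decompressed = decompressors.wrap(frontIn);

        // Back end: hashing, filtering and writing to the final bucket, off-thread.
        Thread backEnd = new Thread(new Runnable() {
            public void run() {
                try {
                    byte[] buf = new byte[32768];
                    int read;
                    while ((read = decompressed.read(buf)) != -1) {
                        // hash update and content filtering would happen here
                        finalBucketStream.write(buf, 0, read);
                    }
                    finalBucketStream.close();
                } catch (IOException e) {
                    // the real code must report this via onFailure()
                }
            }
        }, "Filter/hash/write back end");
        backEnd.start();

        // Front end: the only part that must run on the database thread.
        generator.writeTo(frontOut);
        frontOut.close();

        // Once writeTo() has returned, wait for the rest of the chain to drain.
        // (In the real code the waiting would happen off the database thread;
        // it is shown inline here only to keep the sketch linear.)
        try {
            backEnd.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The point of the split is that only generator.writeTo() ever sees persistent objects; everything downstream deals purely with streams, so it can run on any thread.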
089d7dc4e6ddb518062da2e29287a400f9604a22 (DecompressorThreadManager):
- maxLen * 4 is still there; what is it for?
- Fix exception handling.

a31064ef1ff74ad5e94273089df2e1321034422a:
- ClientGetter exception handling.
- ClientGetter: fix the maxLen calculation? Can we pass in the temp length for each temporary bucket and the output length for the last one?

a31064ef1ff74ad5e94273089df2e1321034422a:
- *Better* exception handling! IOException and CompressionSizeException are expected, and they have specific error codes to handle them.

33c8bfc3d17e3eaf6a808f2b097b5f2be5910656:
- DecompressorThread: null out input and output on successful close.

e27a872923c04e4966e80afdaffc17b613c2067e:
- bzipcompressor, gzipcompressor: bytes is unnecessary, isn't it?

44e93f1dec58d541dd2920d646311c9f94262555:
- throw getError() inside a catch block that doesn't throw? Doesn't make much sense...

e9e118051d3ef3d46be83f53a65fe83ff98de4d6:
- SimpleSingleFileFetcher.onSuccess() is broken.
- MAJOR PERSISTENCE BUG: the finalStatus() writer *CANNOT RUN ON ANOTHER THREAD BECAUSE IT ACCESSES THE DATABASE*. We need to pass in a closure or similar: run the generator on the database thread, and run everything else, up to and including writing to the final bucket, on other threads. So the architecture is backwards at the moment. Alternatively, we could keep it running forwards as it does now, but we would need to return a chain of shadow buckets and dump them to the first POS off-thread, periodically coming back for more (see the shadow-bucket sketch at the end of this message).
- Also exception handling.

6fcbbdad4cfdb787d98d2370cd84d59ff8401c49:
- Can get deactivated; use a shadow bucket and it will work fine.

dbf48b90872bd4103d120d84290744ac9c5cd95f:
- Wrong, we need to free the data as early as possible.
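For completeness, here is a rough sketch of the forward-flow alternative mentioned under e9e11805 (option 2 above): the database thread hands out small batches of shadow buckets, and a worker thread copies them to the output, coming back for more periodically so memory use stays bounded. Every name below (Bucket, DatabaseJobRunner, ShadowSource, getShadowBatch, BATCH_SIZE) is hypothetical; the real client layer has its own Bucket and database-job APIs.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

public class ShadowBucketCopier {

    /** Minimal stand-in for a Freenet Bucket (assumption). */
    interface Bucket {
        InputStream getInputStream() throws IOException;
        void free();
    }

    /** Runs a job on the database thread and blocks until it completes (assumption). */
    interface DatabaseJobRunner {
        <T> T runBlocking(DatabaseJob<T> job);
    }

    /** The body runs on the database thread and may touch persistent objects. */
    interface DatabaseJob<T> {
        T run();
    }

    /** Called on the database thread: returns up to max shadow buckets, empty when done. */
    interface ShadowSource {
        List<Bucket> getShadowBatch(int max);
    }

    private static final int BATCH_SIZE = 32;

    /** Runs on a worker thread; never touches the database directly. */
    public static void copyAll(DatabaseJobRunner dbRunner, final ShadowSource source,
                               OutputStream out) throws IOException {
        byte[] buf = new byte[32768];
        while (true) {
            // Schedule a database job to grab the next group of shadow buckets.
            List<Bucket> batch = dbRunner.runBlocking(new DatabaseJob<List<Bucket>>() {
                public List<Bucket> run() {
                    return source.getShadowBatch(BATCH_SIZE);
                }
            });
            if (batch.isEmpty()) break;
            for (Bucket shadow : batch) {
                InputStream in = shadow.getInputStream();
                try {
                    int read;
                    while ((read = in.read(buf)) != -1)
                        out.write(buf, 0, read);
                } finally {
                    in.close();
                    shadow.free(); // release the shadow copy as early as possible (hypothetical semantics)
                }
            }
        }
        out.close(); // the decompressor/filter chain sits on the other end of this stream
    }
}

The drawback, as noted above, is memory if the batches or blocks are large; the reversed-flow/pipes approach avoids that entirely.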
