[freenet-dev] Library mega-code-review

Ximin Luo Sat, 15 Aug 2009 11:50:52 +0100

Matthew Toseland wrote:
> 7f356226e4a0a175fcf5ccb16f36059327f352b6
> 
> MapCollectionTranslator assumes backing is what later becomes a SortedMapSet 
> - a map of elements each to itself. should maybe be private?


uh, don't see the need... it's a static method, you pass it arguments and it
holds no state?

> 570872c82603d75e1922d061c51fd77f0d8d7183
> 
> CAPITALS should really only be used for constants (this commit is gone but 
> other places do this).

ah, ok.. they are constant for the instance, but i see what you mean... i'll
start avoiding this. but it shouldn't be confusing, object.CONST is obviously
different from Class.CONST..

> bd50ba2b048e153d34893dc0384a1b63d41ee6a8
> 
> is this wise? it will be different for every SSK under the same routing key, 
> as the routingkey is constructed from the pubkey hash and e(h(docname)) ... 
> otoh it will work for KSKs, unlike the old code.

will have to see how it performs... the whole discussion we had before about
this is pasted into todo.txt.

> 740498979f511e4aafd710a448dae20586336304
> 
> containsValue() broken - the value will not equal the SkeletonValue.

ah yes, i'll fix it.
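
for reference, a minimal sketch of the problem and one possible fix. the class
and field names here (WrappedValueMap, Wrapper) are made up for illustration
and are not the real SkeletonTreeMap code; the point is only that a lookup on
the backing map compares against the wrapper, so the payload has to be
unwrapped first:

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for a map that stores each value inside a wrapper. */
final class WrappedValueMap<K, V> {
    /** Hypothetical wrapper: the payload plus a "loaded" flag. */
    static final class Wrapper<V> {
        final V data;
        final boolean loaded;
        Wrapper(V data, boolean loaded) { this.data = data; this.loaded = loaded; }
    }

    private final Map<K, Wrapper<V>> backing = new TreeMap<>();

    void put(K key, V value) { backing.put(key, new Wrapper<>(value, true)); }

    /** Broken: compares the raw value against Wrapper objects, so it never matches. */
    boolean containsValueBroken(Object value) {
        return backing.containsValue(value);
    }

    /** Fixed: unwrap each loaded entry and compare the payloads. */
    boolean containsValue(Object value) {
        for (Wrapper<V> w : backing.values()) {
            if (w.loaded && Objects.equals(w.data, value)) { return true; }
        }
        return false;
    }
}
```

what should happen for entries that are not loaded yet (skip, throw, load) is
a separate decision; the sketch just skips them.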

> 74603b79fbce24725751cdfbfe7bcd15df704213
> 
> TermPageEntry: will the title be used? It says "here for back compatibility"??

yes, URIWrapper had it. the original plan was to store such data in the URI
table, in a URIEntry, but we haven't thought it out fully yet.

> bcb80c457089f2114c0b31e85127c0b379786533
> 
> Breadth-first parallel inflate: this will likely make a lot of threads! I 
> think this may have been made pooled later?

yes.

> Also you need to use NativeThread's to set priority class, which means you 
> probably need to use the node's executor (to increase priority)... we can 
> deal with this later if need be.

uh... i don't entirely understand, will ask in channel.

> 4000b1b88d14c1eda47e18fc45283c3d9a561a52
> 
> ThreadPoolExecutor may be a problem because of NativeThread, not an 
> immediately urgent issue though. We could use a ThreadFactory to solve this.
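
a rough sketch of how the ThreadFactory idea might look, using only
java.util.concurrent. PriorityThread is a hypothetical stand-in for
NativeThread (whose constructor is not reproduced here), and PriorityPool is
an invented name:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PriorityPool {
    /** Hypothetical stand-in for a NativeThread-like class that carries a priority. */
    static final class PriorityThread extends Thread {
        PriorityThread(Runnable r, int priority) {
            super(r);
            setPriority(priority);
            setDaemon(true);
        }
    }

    /** Builds a fixed-size pool whose worker threads all come from the custom factory. */
    static ThreadPoolExecutor create(int threads, final int priority) {
        ThreadFactory factory = r -> new PriorityThread(r, priority);
        return new ThreadPoolExecutor(threads, threads, 1, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(), factory);
    }
}
```

e.g. PriorityPool.create(4, Thread.MIN_PRIORITY) would give a pool whose
workers are all PriorityThread instances; swapping in the real thread class
would only touch the factory.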


> 9a1855c7ba183a38ccd19f273a4fdb32edff7a83
> 
> Can we be certain that a freenet index can never cause us to search a file 
> index?

Are you talking about searchboxes embedded into a freesite? (If not, please
elaborate.) Securing that would be the Library's job. I suggest having two
Map<FreenetURI, Index> and Map<File, Index> rather than the Map<String, Index>
we have now..
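
roughly what i mean, as a sketch only. java.net.URI stands in for FreenetURI
and the type parameter I stands in for Index, so none of this is the actual
Library code; IndexRegistry is an invented name:

```java
import java.io.File;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Keeps network-addressed and file-addressed indexes in separate, typed maps. */
final class IndexRegistry<I> {
    private final Map<URI, I> remote = new HashMap<>();  // stand-in for Map<FreenetURI, Index>
    private final Map<File, I> local = new HashMap<>();  // Map<File, Index>

    void addRemote(URI uri, I index) { remote.put(uri, index); }
    void addLocal(File file, I index) { local.put(file, index); }

    /** A lookup keyed by a URI can only ever reach the remote map. */
    I getRemote(URI uri) { return remote.get(uri); }

    /** File indexes are reachable only through an explicit File key. */
    I getLocal(File file) { return local.get(file); }
}
```

with a single Map<String, Index>, a string taken from a freesite could name a
local path; with two maps the caller has to choose the key type up front.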

> 5394f13e60ca206370aeba722db261a897436fb2
> 
> FreenetArchiver.pushLive():
> note that the bucket will in some cases be limited to expected_bytes, so you 
> should pass in a big number if you are not sure!

Really? I thought TemporaryBuckets expanded themselves as necessary? I gave it
an expansion factor of 2...

> b880788bbe1f6966c0d887d369bfc840d602c39a
> 
> Library: singletons: seems to make sense, what about a high level regression 
> test using multiple nodes and a Library plugin on each? Maybe far fetched... 
> I have heard you can do this with classloader tricks, maybe that's better... 
> Not a problem if these objects are reusable and don't have any significant 
> state...?

hmm, i would have thought you'd be running those nodes in separate java 
processes?

> b5a597cfd57dbbb1eebf1eefebc5fac4cafc8cbf
> 
> ProtoIndexSerialiser:
> Can we be sure here that we never leave the insert ID in the metadata and it 
> get serialised? It should be external, no? Or at least, it should be turned 
> from an insert uri into a request uri?

the metadata is only used here to determine the insert address, and is actually
discarded along with the task after it's done. i'll make a note of this in the
javadoc for the Translator, though.

> 8702a6745fee793fe77af537b7c347adadcd2fc0
> 
> FreenetArchiver:
> Fetch can go over 100% on newly inserted data??? If so this is a bug, but it 
> is prudent to detect and handle it. Certainly it can on old data.

ok, adding a logging message for this.

> 8702a6745fee793fe77af537b7c347adadcd2fc0
> 
> SimpleProgress:
> +     @Override public ProgressParts getParts() throws TaskAbortException {
> +             // updates are made such that the ProgressParts contract isn't broken even
> +             // in mid-update so we don't need to synchronize here.
> +             if (abort != null) { throw abort; }
> +             return new ProgressParts(pdone, known, known, estimate);
>       }
> 
> Are you sure? Write order is not guaranteed if we don't synchronize.

(for benefit of devl@) they are volatile, so ok
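
to spell out the reasoning: volatile alone gives visibility and ordering, and
the rest depends on the order of writes and reads. a minimal sketch, assuming
a single writer thread and that a part is always counted as known before it is
counted as done (both are assumptions for the example, and VolatileProgress is
an invented class, not SimpleProgress):

```java
final class VolatileProgress {
    private volatile int known = 0;
    private volatile int done = 0;

    // Single writer assumed: "x = x + 1" on a volatile is not atomic across writers.
    void addPartKnown() { known = known + 1; }

    // The writer only calls this for a part it has already counted as known.
    void addPartDone() { done = done + 1; }

    /**
     * Reads done first, then known. Both counters only grow, and each done
     * increment is preceded by its known increment, so the later read of
     * known can never be smaller than the earlier read of done.
     * Returns {done, known}.
     */
    int[] getParts() {
        int d = done;
        int k = known;
        return new int[] { d, k };
    }
}
```

reading in the opposite order (known, then done) could observe done > known if
the writer runs in between, so the read order matters as much as the keyword.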

> cafcea9db4c0007e9153eaa93bf4cebca450a07d
> 
> Library:
> +
> +     public Object getKeyTypeFromString(String indexuri) {
> +             try {
> +                     return KeyExplorerUtils.sanitizeURI(new ArrayList<String>(), indexuri);
> +             } catch (MalformedURLException e) {
> +                     File file = new File(indexuri);
> +                     if (file.canRead()) {
> +                             return file;
> +                     } else {
> +                             throw new UnsupportedOperationException("Could not recognise index type from string: " + indexuri);
> +                     }
> +             }
>       }
>  
> 
> Confusing function name.

renaming to getAddressTypeFromString

> cd058a3293e037da9428091ded0b09aa712c5e1e
> 
> ProgressParts:
>       /**
>       ** Returns the parts done as a fraction of the estimated total parts.
>       ** Note: this will return negative (ie. invalid) if there is no estimate.
>       */
>       final public float getEstimatedFractionDone() {
> -             return totalest == 0? 0.0f: (float)done / totalest;
> +             return totalest == 0? 0.0f: totalest == TOTAL_FINALIZED? 1.0f: (float)done / totalest;
>       }
>  
> Huh???? These don't make sense - finalized is surely not the same as done?

argh yes, fixing, should be getKnownFractionDone() instead of 1.0f and same for
Started.
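
a sketch of the intended behaviour, reduced to the three fields that matter.
TOTAL_FINALIZED = -1 is inferred from the "+= -1" remark in this review, and
the body of getKnownFractionDone() is a guess; FractionSketch is not the real
ProgressParts class:

```java
final class FractionSketch {
    static final int TOTAL_FINALIZED = -1;  // assumed value, inferred from the review

    final int done, known, totalest;

    FractionSketch(int done, int known, int totalest) {
        this.done = done;
        this.known = known;
        this.totalest = totalest;
    }

    /** Parts done as a fraction of the parts known so far (guessed implementation). */
    float getKnownFractionDone() {
        return known == 0 ? 0.0f : (float) done / known;
    }

    /** When the total is finalized, the known count is the total, so defer to it. */
    float getEstimatedFractionDone() {
        if (totalest == 0) { return 0.0f; }
        if (totalest == TOTAL_FINALIZED) { return getKnownFractionDone(); }
        return (float) done / totalest;
    }
}
```

so 3 done out of 4 known with a finalized total gives 0.75 rather than 1.0.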

>       public static ProgressParts getSubParts(Iterable<? extends Progress> subprogress, boolean try_to_be_smart) throws TaskAbortException {
>               int d = 0, s = 0, k = 0, t = 0;
>               int num = 0, unknown = 0;
> +             boolean totalfinalized = true;
>               for (Progress p: subprogress) {
>                       ++num;
>                       if (p == null) { ++unknown; continue; }
>                       ProgressParts parts = p.getParts();
>                       d += parts.done;
>                       s += parts.started;
>                       k += parts.known;
> -                     if (parts.totalest < 0) { ++unknown; }
> -                     else { t += parts.totalest; }
> +                     switch (parts.totalest) {
> +                     case ESTIMATE_UNKNOWN:
> +                             ++unknown;
> +                             totalfinalized = false;
> +                             break;
> +                     case TOTAL_FINALIZED:
> +                             t += parts.totalest;
> 
> This can't possibly be right. += -1 ??
> 
> +                             break;
> +                     default:
> +                             totalfinalized = false;
> +                             t += parts.totalest;
> +                     }
>               }
> 

ah yes, should be parts.known. thanks.
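
i.e. something like the loop below, cut down to the total-estimate bookkeeping.
the constant values are assumptions (TOTAL_FINALIZED = -1 from the "+= -1"
remark, ESTIMATE_UNKNOWN = 0 as a guess), Parts and Totals are invented
holders, and whatever getSubParts does with these sums afterwards is left out:

```java
import java.util.List;

final class SubPartsSketch {
    static final int ESTIMATE_UNKNOWN = 0;  // assumed value
    static final int TOTAL_FINALIZED = -1;  // assumed value, inferred from the review

    /** Hypothetical cut-down ProgressParts: only the fields this loop needs. */
    static final class Parts {
        final int known, totalest;
        Parts(int known, int totalest) { this.known = known; this.totalest = totalest; }
    }

    /** Hypothetical holder for the loop's accumulators. */
    static final class Totals {
        int total = 0;
        int unknown = 0;
        boolean finalized = true;
    }

    static Totals sum(List<Parts> subparts) {
        Totals out = new Totals();
        for (Parts p : subparts) {
            if (p == null) { ++out.unknown; continue; }
            switch (p.totalest) {
            case ESTIMATE_UNKNOWN:
                ++out.unknown;
                out.finalized = false;
                break;
            case TOTAL_FINALIZED:
                out.total += p.known;  // the fix: add the known count, not the -1 marker
                break;
            default:
                out.finalized = false;
                out.total += p.totalest;
            }
        }
        return out;
    }
}
```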

> 66a26efb893d9491f72de59e5f3730933ebfd12c
> 
> If it completes, check the MIME type on the FetchResult before throwing.

kk, adding

> Also you should consider accepting an index on the basis of its default 
> filename fetching successfully even if it is inserted as 
> application/octet-stream? I dunno, you are expecting the new spider to insert 
> indexes itself? I guess it's more important that we detect the filename if it 
> is specified, and encourage users to specify it.

yeah, this is an OPTIMISE notice in getAddressType, i'll add PRIORITY.

> Also, SnoopMetadata allows you to do equivalent things (directory listing) to 
> KeyExplorer without depending on a plugin; obviously this isn't a priority.
> 

> Changelog (rather schizophrenic, you don't have to read it, I was reviewing 
> the code anyway so; it has gone in the tag so you might want to look at it):
> 
> Library:
> - Base on XMLLibrarian up to 3ed7453c0f70a61e21839bb5c0b4d11aab6c5c69 *AND* 
> Interdex at rev 0860091a26323360ff1a77c9d48077fc17b0d3e2. (minor build.xml 
> conflicts).
> - Rebrand to Library. This will have UI and FCP for both old and new 
> (interdex) indexes. Move from src/plugins/XMLLibrarian/ to 
> src/plugins/Library/ and update build.xml (in some places using 
> $(packagebase)). Move some packages around, e.g. Interdex into 
> index/Interdex, some stuff into util or search, xmlindex/ into index/xml.
> - findTerm(), getIndex(), will presumably be exposed to FCP.
> - Delete some unused code.
> - Experiments with RPC-over-FCP code, all gone now.
> - Add GPL headers.
> - Javadocs: packages.
> - Javadocs.
> - Comments.
> - Move plugin stuff to Main class. Library for index stuff.
> - Logging.
> - Index interface.
> - WritableIndex interface.
> - Move some older Index stuff into XMLIndex, make it implement Index.
> - Generics.
> - Factor out AbstractRequest, base of FindRequest.
> - Simplify build-dep instructions.
> - Move Interdex stuff to index/, or util/ or serial/, delete duplicated or 
> irrelevant files, move many files around.
> - Indenting.
> - Deprecate PrefixTreeMap.
> - Try to merge the progress (Request) interfaces from both plugins. This 
> affects a remarkable amount of code, involving much refactoring. delete the 
> ERROR completion code, will throw in getResult().
> - TODO file editing.
> - bookmarks: bookmark:name(uri) bookmarks an index.
> - Library.addBookmark(). Add one by searching ~name(url).
> - catch RuntimeException's and display in the web UI.
> - ProtoIndex.getTermEntries.getResult() returns an immutable collection.
> - change serialised representation of btree root.
> - Style changes, centralising style.
> - better ui for searching multiple indexes, showing which you are using and 
> showing checkboxes for ones you could use.
> - build.xml: fetch SnakeYAML and verify checksum.
> - *Packer -> *SplitPacker. Deprecate *SplitPacker.
> - New Packer: abstract class, a few methods must be filled in, but provides 
> configurable packing algorithm. Best fit decreasing with some options. Can 
> fit stuff into bins even if they already have data in them. Can repack. Pulls 
> data in to repack where necessary. Old Packer would split items, this greatly 
> complicates matters; new Packer does not.
> - Toadlet interface for search. Searches can only be started from POSTs. 
> POSTs require the formPassword, thus neutralising any threat from malicious 
> external sites. Lots of bug chasing...
> - BTreeSet. SkeletonBTreeSet.
> - Interdex: error handling work.
> - Interdex: Where a BTree has the data internally, there are no external 
> dependencies (as opposed to e.g. btree roots packed into a series of 
> buckets). which means that serialisation is tricky, because we expect to push 
> the dependencies as ghosts, at which point isBare() will be true and we can 
> push the node itself. If there is only internal data, copy it to the metadata 
> so we can be isBare() and then copy it when we serialise the node. In future 
> there will be a more efficient solution.
> - Interdex: error checking.
> - Interdex: put TokenEntry's within a keyword into a BTreeSet, not a TreeSet. 
> Pack the roots rather than the whole sets.
> - Interdex: BinInfo, no longer an Object[2].
> - Interdex: many corner cases.
> - Interdex: abstract out SortedMapSet from BTreeSet, implements subSet etc, 
> cleans up code.
> - Interdex: Fix BTreeMap iterator hasNext().
> - Interdex: Extend unit tests.
> - Interdex: Scale knows its Packer.
> - Interdex: delete old code.
> - Interdex: progress tracking support in Packer. Reinstate progress tracking 
> code, tweak job names for logging purposes.
> - Interdex: fix some inflate bugs.
> - Interdex: fix packer bugs related to completion.
> - Interdex: support accessing by index, cache size of tree below each node 
> (_size or treeSize()), rename size() to nodeSize().
> - Interdex: Get routing key from the NodeKey for URIKey's.
> - Interdex: implement utab (uri -> uri entries). Implement translators and 
> packers for utab and ttab, BIndexSerialiser should now be functional.
> - Interdex: make packer more generic/configurable, Inventory contains most 
> functions of Scale and proxies actual weighing to Scale
> - Interdex: rename MAGIC to serialVersionUID.
> - Interdex: proper hashCode, equals, etc for Token*Entry.
> - Interdex: rename TokenEntry -> TermEntry, TokenURIEntry -> TermPageEntry.
> - Interdex: set dumper options so dumps are readable for debugging purposes.
> - Interdex: custom yaml format for BinInfo.
> - Interdex: new SkeletonTreeMap, uses a wrapper for values with loaded status 
> and metadata, cleaner code, better iterators etc, in particular support for 
> iterating over unloaded data and it throws when you try to get the values, 
> partial support for subMap/headMap/tailMap, can load data while iterating 
> (will throw, caller can then load on the map, then go back to iterator), 
> fixes some bugs.
> - Interdex: generics.
> - Remove progress trackables in ProtoIndex, replace with a stack of 
> trackables, show progress while getting the root
> - SkeletonBTreeMap: deflate/inflate on right hand side not left hand side.
> - Interdex: Lots of minor refactoring.
> - Don't depend on FredPluginAPI. It is unlikely to work any time soon, we 
> will use FCP instead.
> - Eliminate URIWrapper in favour of TermPageEntry, which now includes title. 
> Remove code related to uri being a string on URIWrapper, since it's a 
> FreenetURI on TermPageEntry.
> - Sanity checking.
> - Interdex: refactor makeProgressIterable etc.
> - CompositeRequest interface for requests with subrequests.
> - TermEntry.equalsTarget(), rel.combine().
> - Rewrite/refactor ResultSet to handle TermEntry's better, move combination 
> code out of Search. No phrase search yet, was turned off anyway. Is 
> immutable, operations happen in constructor. Split DIFFERENTINDEXES mode out 
> of UNION, currently handled the same, add SINGLE mode. Intersection may be 
> slightly more efficient.
> - Refactor web interface for TermEntry's and CompositeRequest.
> - Delete some old code.
> - Combine relevances in Term*Entry, not in TermEntry, and make TermEntry 
> immutable, create a new one in combine().
> - More refactoring, optimising merging modes in ResultSet.
> - Working phrase search.
> - Re-enable positions support in XML index code. Only use it if we need to.
> - XML index: Calculate a simple relevance score based on word count and file 
> count.
> - Fix comparisons (left/right/middle, different null handling in each) in 
> btreemap, order Node's for breadth-first search.
> - Breadth-first search for inflation. ****** likely to use a lot of threads...
> - AbstractRequest: record time taken.
> - Interdex: TaskAbortException thrown some places, setAbort -> abort (which 
> throws it after setting it).
> - Interdex: Improve progress text.
> - Use a ThreadPoolExecutor in ParallelSerialiser. TaskHandler handles tasks 
> on a thread (in serial mode), shifts them to the executor if they are already 
> running, keeps progress data up to date; in parallel mode, run directly on 
> executor (caller must create progress).
> - Interdex: ProgressTracker now uses a WeakHashMap, of tasks (not data). 
> get*Progress just gets the progress, add*Progress throws a 
> TaskInProgressException if the task is already running otherwise adds it.
> - Interdex: Parallel inflation on BTreeMap.
> - Interdex: Various refactoring.
> - Remove wanna19 from default indexes.
> - Remove FredPluginHTTP interface.
> - Refactor web interface, new class ResultNodeGenerator, delete WebUI. 
> ResultNodeGenerator splits up results, will properly handle TermTermEntry's 
> and TermIndexEntry's, and less code duplication than old code.
> - Split YamlArchiver into FileArchiver and YamlReaderWriter, use 
> ObjectStreamReader/Writer (simpler than ObjectInput/Output) interfaces.
> - Freenet archiver. Initially write-only.
> - Web interface access to Tester, on-freenet tests.
> - Tester creates a FreenetArchiver, puts a load of test data, and pushes the 
> data, showing progress; progress test tests a map of ints to ints, push test 
> tests a smaller but more complex map of string to bunch of fake CHKs.
> - SimpleProgress changes: track parts done, parts known, estimated parts, 
> last status string, addTotal -> addPartKnown. if estimated parts == total, 
> then isTotalFinal() is true. enteredSerialiser/exitingSerialiser must be 
> called. 
> - Imports.
> - Option to group USKs. Sort by relevance either way.
> - Show title (tooltip) of links as the url for the direct link and the [ USK 
> ].
> - Show relevance in tooltip for the main link.
> - Put search toadlet on the Browse Freenet menu.
> - Refetch on demand after a failure.
> - Make sure the search is removed after a TaskAbortException.
> - Factor out combine() from TermEntry into a method in ResultSet.
> - Remove WebUI references, reinstate FredPluginHTTP. **** merge fix hack
> - CSS fixes.
> - Detect when trying to do a phrase search on an index with no position info.
> - Only show the full error trace with a big box if the error is other than an 
> InvalidSearchException.
> - Automatically determine index types by partially fetching the URI. 
> Initially implemented using KeyExplorer, detect via either mime type or 
> existence of the base filename for the index type. This is dubious, SEE BELOW.
> - UI for adding an index.
> - POST password check works a different way, can do a search from a POST but 
> not add an index.
> - Use L10n (even though it doesn't work).
> - Singleton-ise Library.
> - Split BIndexSerialiser into ProtoIndexComponentSerialiser and 
> ProtoIndexSerialiser.
> - Static map of one ProtoIndexComponentSerialiser per file format. Make many 
> members non-static. ProtoIndexSerialiser handles the overall ProtoIndex, 
> ProtoIndexComponentSerialiser helps to serialise the individual maps.
> - Add a testing yaml index to the test for the auto-detect code.
> - Pass the insert or fetch URI in to a push task / pull task via the 
> metadata, just as we pass in the filenames.
> - Remove most DataFormatException constructors to avoid misuse.
> - Static singleton ProtoIndexSerialiser in Library.
> - Pass a FreenetURI into getIndexType, not a String.
> - Wire in support for yaml indexes in Library.
> - Yaml indexes support for pulling from Freenet
> - Catch exceptions from getIndexType in parent method getIndex.
> - Server-side state based authorisation/confirmation scheme for adding 
> bookmarks (this makes sense because we can add a bookmark by changing the 
> name of the index).
> - Progress: ProgressParts object contains done, total, known, can presumably 
> be got atomically so avoids races. Remove RequestState from AbstractRequest. 
> Lots of related work. Request extends ChainedProgress, remove unneeded 
> methods. Eliminate AtomicProgress. New class BaseCompositeProgress, base 
> composite progress tracker for parallel tasks. ChainedProgress: series of 
> tasks.
> - BaseCompositeProgress.join() can see nulls because it is fed an iterable 
> over the tasks, which then looks up the progress's, which can be null before 
> the progress has started. After it has finished it will however still be 
> present as the iterator has a strong reference to the progress, and the 
> weakhashmap is weak on the keys - the tasks. Updates to SimpleProgress, lots 
> of code.
> - Various minor refactoring related to Progress, use the Progress interfaces 
> where possible, track what is needed for them in e.g. FindRequest.
> - More work on progress, refactoring.
> - System.currentTimeMillis, not new Date().getTime(). 
> - Always get the termpositions if only for relevance.
> - Avoid NPE displaying search page, show failures better.
> - Better auto-detect of URI vs filename for indexes.
> - Index type test: test the SSK URIs, not the USKs.
> - Refactor: ProtoIndexSerialiser.forIndex() not new ProtoIndexSerialiser(), 
> have a static map of serialisers by class, change 
> FreenetArchiver/FileArchiver parameters, minor constants refactoring, minor 
> refactoring in FileArchiver.
> - Bugfix in ParallelSerialiser.isActive(), no idea consequences, probably 
> none. Bugfix in SkeletonNode.rev().
> - Handle over 100% in splitfile progress better in Interdex index fetching.
> - Another fix to FreenetArchiver progress from splitfile code (derive 
> percentage from minSuccessful not total).
> - getTermEntriesHandler: combine metas and trackers into a LinkedHashMap, 
> track current tracker and meta.
> - Refactor Progress estimate type. UNKNOWN or FINALIZED (= known, but only 
> this enables finalizedTotal() and thus completion), or a specific value. 
> Finalized total must be set before the request completes. This is often done 
> automatically by SimpleProgress.addPartKnown(,true). Set total finalised in 
> other places.
> - Rename rootSize -> sizeRoot, add nodeMin, entMax, heightEstimate to BTree*.
> - Progress with number of btree levels lookup estimate on 
> getTermEntriesHandler.
> - Less efficient index type detection not using KeyExplorer (it was having to 
> download a non-staging binary, IMHO this was bad). Based on MIME type, 
> fetches [key]/, [key]/index.xml, [key]/index.yml.
> - Refactor FindRequest (XML index progress), use an enum not a string for 
> stages, keep an internal Progress impl up to date for the current stage, use 
> the index of the enum for the current stage to set the stage for 
> ProgressParts purposes. Use the enums when setting progress externally.
> - Handle failure with a new URI - USK@ updates, not enough path components 
> etc - for XML indexes.
> - Logging.
> - Improve progress bars a little, fix them for new Progress.
> - More CSS changes for progress bar.
> - Minor fixes in web interface, better error catching/handling/displaying.
> - Refactor web interface, now there is only MainPage.
> - Beginnings of Interdex distributed search implementation. Interdex, 
> InterdexQuery, IndexQuery. Document algorithms a bit more in comments.
> - Make fields private or final as needed.
> - AbstractRequest: implement join().
> - Web interface bugfix adding bookmarks.
> - Style fixes, use style in POST handler.
> - Search progress: don't return as composites, stick it all together, unless 
> it is different indexes being fetched. Probably looks better on search 
> progress. Implement status functions.
> - Fix some NPEs, simpler progress bar.
> - Synchronization in parsing.
> - Temporary hack, put toadlet on /plugins/plugin.Library.FreesiteSearch - we 
> need to let /library/ through the filter.
> 
> 
> platy
> infinity0
> sdiz
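
for anyone reading the Packer entry above without the code at hand, here is a
minimal sketch of plain best fit decreasing, where items are never split. it
is a textbook version with invented names (BestFitDecreasing, pack) and does
not cover the new Packer's options, pre-filled bins or repacking:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BestFitDecreasing {
    /** Packs whole items into bins of the given capacity; returns the bins' contents. */
    static List<List<Integer>> pack(List<Integer> weights, int capacity) {
        List<Integer> sorted = new ArrayList<>(weights);
        sorted.sort(Collections.reverseOrder());      // largest items first

        List<List<Integer>> bins = new ArrayList<>();
        List<Integer> free = new ArrayList<>();       // remaining space per bin

        for (int w : sorted) {
            if (w > capacity) {
                throw new IllegalArgumentException("item larger than bin: " + w);
            }
            // Best fit: the bin that can take the item and has the least space left.
            int best = -1;
            for (int i = 0; i < bins.size(); i++) {
                int f = free.get(i);
                if (f >= w && (best < 0 || f < free.get(best))) { best = i; }
            }
            if (best < 0) {                           // nothing fits: open a new bin
                bins.add(new ArrayList<Integer>());
                free.add(capacity);
                best = bins.size() - 1;
            }
            bins.get(best).add(w);
            free.set(best, free.get(best) - w);
        }
        return bins;
    }
}
```

for weights 4, 8, 1, 4, 2, 1 and capacity 10 this ends up with two bins,
[8, 2] and [4, 4, 1, 1].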
