Matthew Toseland wrote:
> 7f356226e4a0a175fcf5ccb16f36059327f352b6
>
> MapCollectionTranslator assumes backing is what later becomes a SortedMapSet
> - a map of elements each to itself. should maybe be private?
uh, don't see the need... it's a static method, you pass it arguments and it
holds no state?
> 570872c82603d75e1922d061c51fd77f0d8d7183
>
> CAPITALS should really only be used for constants (this commit is gone but
> other places do this).
ah, ok.. they are constant for the instance, but i see what you mean... i'll
start avoiding this. but it shouldn't be confusing, object.CONST is obviously
different from Class.CONST..
> bd50ba2b048e153d34893dc0384a1b63d41ee6a8
>
> is this wise? it will be different for every SSK under the same routing key,
> as the routingkey is constructed from the pubkey hash and e(h(docname)) ...
> otoh it will work for KSKs, unlike the old code.
will have to see how in performs... the whole discussion we had before about
this is pasted into todo.txt.
> 740498979f511e4aafd710a448dae20586336304
>
> containsValue() broken - the value will not equal the SkeletonValue.
ah yes, i'll fix it.
> 74603b79fbce24725751cdfbfe7bcd15df704213
>
> TermPageEntry: will the title be used? It says "here for back compatibility"??
yes, URIWrapper had it. the original plan was to store such data in the URI
table, in a URIEntry, but we haven't thought it out fully yet.
> bcb80c457089f2114c0b31e85127c0b379786533
>
> Breadth-first parallel inflate: this will likely make a lot of threads! I
> think this may have been made pooled later?
yes.
> Also you need to use NativeThread's to set priority class, which means you
> probably need to use the node's executor (to increase priority)... we can
> deal with this later if need be.
uh... i don't entirely understand, will ask in channel.
> 4000b1b88d14c1eda47e18fc45283c3d9a561a52
>
> ThreadPoolExecutor may be a problem because of NativeThread, not an
> immediately urgent issue though. We could use a ThreadFactory to solve this.
> 9a1855c7ba183a38ccd19f273a4fdb32edff7a83
>
> Can we be certain that a freenet index can never cause us to search a file
> index?
Are you talking about searchboxes embedded into a freesite? (If not, please
elaborate.) Securing that would be the Library's job. I suggest having two
Map<FreenetURI, Index> and Map<File, Index> rather than the Map<String, Index>
we have now..
> 5394f13e60ca206370aeba722db261a897436fb2
>
> FreenetArchiver.pushLive():
> note that the bucket will in some cases be limited to expected_bytes, so you
> should pass in a big number if you are not sure!
Really? I thought TemporaryBuckets expanded themselves as necessary? I gave it
an expansion factor of 2...
> b880788bbe1f6966c0d887d369bfc840d602c39a
>
> Library: singletons: seems to make sense, what about a high level regression
> test using multiple nodes and a Library plugin on each? Maybe far fetched...
> I have heard you can do this with classloader tricks, maybe that's better...
> Not a problem if these objects are reusable and don't have any significant
> state...?
hmm, i would have thought you'd be running those nodes in separate java
processes?
> b5a597cfd57dbbb1eebf1eefebc5fac4cafc8cbf
>
> ProtoIndexSerialiser:
> Can we be sure here that we never leave the insert ID in the metadata and it
> get serialised? It should be external, no? Or at least, it should be turned
> from an insert uri into a request uri?
the metadata is only used here to determine the insert address, and is actually
discarded along with the task after it's done. i'll make a note of this in the
javadoc for the Translator, though.
> 8702a6745fee793fe77af537b7c347adadcd2fc0
>
> FreenetArchiver:
> Fetch can go over 100% on newly inserted data??? If so this is a bug, but it
> is prudent to detect and handle it. Certainly it can on old data.
ok, adding a logging message for this.
> 8702a6745fee793fe77af537b7c347adadcd2fc0
>
> SimpleProgress:
> + @Override public ProgressParts getParts() throws TaskAbortException {
> + // updates are made such that the ProgressParts contract isn't
> broken even
> + // in mid-update so we don't need to synchronize here.
> + if (abort != null) { throw abort; }
> + return new ProgressParts(pdone, known, known, estimate);
> }
>
> Are you sure? Write order is not guaranteed if we don't synchronize.
(for benefit of devl@) they are volatile, so ok
> cafcea9db4c0007e9153eaa93bf4cebca450a07d
>
> Library:
> +
> + public Object getKeyTypeFromString(String indexuri) {
> + try {
> + return KeyExplorerUtils.sanitizeURI(new
> ArrayList<String>(), indexuri);
> + } catch (MalformedURLException e) {
> + File file = new File(indexuri);
> + if (file.canRead()) {
> + return file;
> + } else {
> + throw new UnsupportedOperationException("Could
> not recognise index type from string: " + indexuri);
> + }
> + }
> }
>
>
> Confusing function name.
renaming to getAddressTypeFromString
> cd058a3293e037da9428091ded0b09aa712c5e1e
>
> ProgressParts:
> /**
> ** Returns the parts done as a fraction of the estimated total parts.
> ** Note: this will return negative (ie. invalid) if there is no
> estimate.
> */
> final public float getEstimatedFractionDone() {
> - return totalest == 0? 0.0f: (float)done / totalest;
> + return totalest == 0? 0.0f: totalest == TOTAL_FINALIZED? 1.0f:
> (float)done / totalest;
> }
>
> Huh???? These don't make sense - finalized is surely not the same as done?
argh yes, fixing, should be getKnownFractionDone() instead of 1.0f and same for
Started.
> public static ProgressParts getSubParts(Iterable<? extends Progress>
> subprogress, boolean try_to_be_smart) throws TaskAbortException {
> int d = 0, s = 0, k = 0, t = 0;
> int num = 0, unknown = 0;
> + boolean totalfinalized = true;
> for (Progress p: subprogress) {
> ++num;
> if (p == null) { ++unknown; continue; }
> ProgressParts parts = p.getParts();
> d += parts.done;
> s += parts.started;
> k += parts.known;
> - if (parts.totalest < 0) { ++unknown; }
> - else { t += parts.totalest; }
> + switch (parts.totalest) {
> + case ESTIMATE_UNKNOWN:
> + ++unknown;
> + totalfinalized = false;
> + break;
> + case TOTAL_FINALIZED:
> + t += parts.totalest;
>
> This can't possibly be right. += -1 ??
>
> + break;
> + default:
> + totalfinalized = false;
> + t += parts.totalest;
> + }
> }
>
ah yes, should be parts.known. thanks.
> 66a26efb893d9491f72de59e5f3730933ebfd12c
>
> If it completes, check the MIME type on the FetchResult before throwing.
kk, adding
> Also you should consider accepting an index on the basis of its default
> filename fetching successfully even if it is inserted as
> application/octet-stream? I dunno, you are expecting the new spider to insert
> indexes itself? I guess it's more important that we detect the filename if it
> is specified, and encourage users to specify it.
yeah, this is a OPTIMISE notice in getAddressType, i'll add PRIORITY.
> Also, SnoopMetadata allows you to do equivalent things (directory listing) to
> KeyExplorer without depending on a plugin; obviously this isn't a priority.
>
> Changelog (rather schizophrenic, you don't have to read it, I was reviewing
> the code anyway so; it has gone in the tag so you might want to look at it):
>
> Library:
> - Base on XMLLibrarian up to 3ed7453c0f70a61e21839bb5c0b4d11aab6c5c69 *AND*
> Interdex at rev 0860091a26323360ff1a77c9d48077fc17b0d3e2. (minor build.xml
> conflicts).
> - Rebrand to Library. This will have UI and FCP for both old and new
> (interdex) indexes. Move from src/plugins/XMLLibrarian/ to
> src/plugins/Library/ and update build.xml (in some places using
> $(packagebase)). Move some packages around, e.g. Interdex into
> index/Interdex, some stuff into util or search, xmlindex/ into index/xml.
> - findTerm(), getIndex(), will presumably be exposed to FCP.
> - Delete some unused code.
> - Experiments with RPC-over-FCP code, all gone now.
> - Add GPL headers.
> - Javadocs: packages.
> - Javadocs.
> - Comments.
> - Move plugin stuff to Main classs. Library for index stuff.
> - Logging.
> - Index interface.
> - WritableIndex interface.
> - Move some older Index stuff into XMLIndex, make it implement Index.
> - Generics.
> - Factor out AbstractRequest, base of FinndRequest.
> - Simplify build-dep instructions.
> - Move Interdex stuff to index/, or util/ or serial/, delete duplicated or
> irrelevant files, move many files around.
> - Indenting.
> - Deprecate PrefixTreeMap.
> - Try to merge the progress (Request) interfaces from both plugins. This
> affects a remarkable amount of code, involving much refactoring. delete the
> ERROR completion code, will throw in getResult().
> - TODO file editing.
> - bookmarks: bookmark:name(uri) bookmarks an index.
> - Library.addBookmark(). Add one by searching ~name(url).
> - catch RuntimeException's and display in the web UI.
> - ProtoIndex.getTermEntries.getResult() returns an immutable collection.
> - change serialised representation of btree root.
> - Style changes, centralising style.
> - better ui for searching multiple indexes, showing which you are using and
> showing checkboxes for ones you could use.
> - build.xml: fetch SnakeYAML and verify checksum.
> - *Packer -> *SplitPacker. Deprecate *SplitPacker.
> - New Packer: abstract class, a few methods must be filled in, but provides
> configurable packing algorithm. Best fit decreasing with some options. Can
> fit stuff into bins even if they already have data in them. Can repack. Pulls
> data in to repack where necessary. Old Packer would split items, this greatly
> complicates matters; new Packer does not.
> - Toadlet interface for search. Searches can only be started from POSTs.
> POSTs require the formPassword, thus neutralising any threat from malicious
> external sites. Lots of bug chasing...
> - BTreeSet. SkeletonBTreeSet.
> - Interdex: error handling work.
> - Interdex: Where a BTree has the data internally, there are no external
> dependancies (as opposed to e.g. btree roots packed into a series of
> buckets). which means that serialisation is tricky, because we expect to push
> the dependancies as ghosts, at which point isBare() will be true and we can
> push the node itself. If there is only internal data, copy it to the metadata
> so we can be isBare() and then copy it when we serialise the node. In future
> there will be a more efficient solution.
> - Interdex: error checking.
> - Interdex: put TokenEntry's within a keyword into a BTreeSet, not a TreeSet.
> Pack the roots rather than the whole sets.
> - Interdex: BinInfo, no longer an Object[2].
> - Interdex: many corner cases.
> - Interdex: abstract out SortedMapSet from BTreeSet, implements subSet etc,
> cleans up code.
> - Interdex: Fix BTreeMap iterator hasNext().
> - Interdex: Extend unit tests.
> - Interdex: Scale knows its Packer.
> - Interdex: delete old code.
> - Interdex: progress tracking support in Packer. Reinstate progress tracking
> code, tweak job names for logging purposes.
> - Interdex: fix some inflate bugs.
> - Interdex: fix packer bugs related to completion.
> - Interdex: support accessing by index, cache size of tree below each node
> (_size or treeSize()), rename size() to nodeSize().
> - Interdex: Get routing key from the NodeKey for URIKey's.
> - Interdex: implement utab (uri -> uri entries). Implement translators and
> packers for utab and ttab, BIndexSerialiser should now be functional.
> - Interdex: make packer more generic/configurable, Inventory contains most
> functions of Scale and proxies actual weighing to Scale
> - Interdex: rename MAGIC to serialVersionUID.
> - Interdex: proper hashCode, equals, etc for Token*Entry.
> - Interdex: rename TokenEntry -> TermEntry, TokenURIEntry -> TermPageEntry.
> - Interdex: set dumper options so dumps are readable for debugging purposes.
> - Interdex: custom yaml format for BinInfo.
> - Interdex: new SkeletonTreeMap, uses a wrapper for values with loaded status
> and metadata, cleaner code, better iterators etc, in particular support for
> iterating over unloaded data and it throws when you try to get the values,
> partial support for subMap/headMap/tailMap, can load data while iterating
> (will throw, caller can then load on the map, then go back to iterator),
> fixes some bugs.
> - Interdex: generics.
> - Remove progress trackables in ProtoIndex, replace with a stack of
> trackables, show progress while getting the root
> - SkeletonBTreeMap: deflate/inflate on right hand side not left hand side.
> - Interdex: Lots of minor refactoring.
> - Don't depend on FredPluginAPI. It is unlikely to work any time soon, we
> will use FCP instead.
> - Eliminate URIWrapper in favour of TermPageEntry, which now includes title.
> Remove code related to uri being a string on URIWrapper, since it's a
> FreenetURI on TermPageEntry.
> - Sanity checking.
> - Interdex: refactor makeProgressIterable etc.
> - CompositeRequest interface for requests with subrequests.
> - TermEntry.equalsTarget(), rel.combine().
> - Rewrite/refactor ResultSet to handle TermEntry's better, move combination
> code out of Search. No phrase search yet, was turned off anyway. Is
> immutable, operations happen in constructor. Split DIFFERENTINDEXES mode out
> of UNION, currently handled the same, add SINGLE mode. Intersection may be
> slightly more efficient.
> - Refactor web interface for TermEntry's and CompositeRequest.
> - Delete some old code.
> - Combine relevances in Term*Entry, not in TermEntry, and make TermEntry
> immutable, create a new one in combine().
> - More refactoring, optimising merging modes in ResultSet.
> - Working phrase search.
> - Re-enable positions support in XML index code. Only use it if we need to.
> - XML index: Calculate a simple relevance score based on word count and file
> count.
> - Fix comparisons (left/right/middle, different null handling in each) in
> btreemap, order Node's for breadth-first search.
> - Breadth-first search for inflation. ****** likely to use a lot of threads...
> - AbstractRequest: record time taken.
> - Interdex: TaskAbortException thrown some places, setAbort -> abort (which
> throws it after setting it).
> - Interdex: Improve progress text.
> - Use a ThreadPoolExecutor in ParallelSerialiser. TaskHandler handles tasks
> on a thread (in serial mode), shifts them to the executor if they are already
> running, keeps progress data up to date; in parallel mode, run directly on
> executor (caller must create progress).
> - Interdex: ProgressTracker now uses a WeakHashMap, of tasks (not data).
> get*Progress just gets the progress, add*Progress throws a
> TaskInProgressException if the task is already running otherwise adds it.
> - Interdex: Parallel inflation on BTreeMap.
> - Interdex: Various refactoring.
> - Remove wanna19 from default indexes.
> - Remove FredPluginHTTP interface.
> - Refactor web interface, new class ResultNodeGenerator, delete WebUI.
> ResultNodeGenerator splits up results, will properly handle TermTermEntry's
> and TermIndexEntry's, and less code duplication than old code.
> - Split YamlArchiver into FileArchiver and YamlReaderWriter, use
> ObjectStreamReader/Writer (simpler than ObjectInput/Output) interfaces.
> - Freenet archiver. Initially write-only.
> - Web interface access to Tester, on-freenet tests.
> - Tester creates a FreenetArchiver, puts a load of test data, and pushes the
> data, showing progress; progress test tests a map of ints to ints, push test
> tests a smaller but more complex map of string to bunch of fake CHKs.
> - SimpleProgress changes: track parts done, parts known, estimated parts,
> last status string, addTotal -> addPartKnown. if estimated parts == total,
> then isTotalFinal() is true. enteredSerialiser/exitingSerialiser must be
> called.
> - Imports.
> - Option to group USKs. Sort by relevance either way.
> - Show title (tooltip) of links as the url for the direct link and the [ USK
> ].
> - Show relevance in tooltip for the main link.
> - Put search toadlet on the Browse Freenet menu.
> - Refetch on demand after a failure.
> - Make sure the search is removed after a TaskAbortException.
> - Factor out combine() from TermEntry into a method in ResultSet.
> - Remove WebUI references, reinstate FredPluginHTTP. **** merge fix hack
> - CSS fixes.
> - Detect when trying to do a phase search on an index with no position info.
> - Only show the full error trace with a big box if the error is other than an
> InvalidSearchException.
> - Automatically determine index types by partially fetching the URI.
> Initially implemented using KeyExplorer, detect via either mime type or
> existence of the base filename for the index type. This is dubious, SEE BELOW.
> - UI for adding an index.
> - POST password check works a different way, can do a search from a POST but
> not add an index.
> - Use L10n (even though it doesn't work).
> - Singleton-ise Library.
> - Split BIndexSerialiser into ProtoIndexComponentSerialiser and
> ProtoIndexSerialiser.
> - Static map of one ProtoIndexComponentSerialiser per file format. Make many
> members non-static. ProtoIndexSerialiser handles the overall ProtoIndex,
> ProtoIndexComponentSerialiser helps to serialise the individual maps.
> - Add a testing yaml index to the test for the auto-detect code.
> - Pass the insert or fetch URI in to a push task / pull task via the
> metadata, just as we pass in the filenames.
> - Remove most DataFormatException constructors to avoid misuse.
> - Static singleton ProtoIndexSerialiser in Library.
> - Pass a FreenetURI into getIndexType, not a String.
> - Wire in support for yaml indexes in Library.
> - Yaml indexes support for pulling from Freenet
> - Catch exceptions from getIndexType in parent method getIndex.
> - Server-side state based authorisation/confirmation scheme for adding
> bookmarks (this makes sense because we can add a bookmark by changing the
> name of the index).
> - Progress: ProgressParts object contains done, total, known, can presumably
> be got atomically so avoids races. Remove RequestState from AbstractRequest.
> Lots of related work. Request extends ChainedProgress, remove unneeded
> methods. Eliminate AtomicProgress. New class BaseCompositeProgress, base
> composite progress tracker for parallel tasks. ChainedProgress: series of
> tasks.
> - BaseCompositeProgress.join() can see nulls because it is fed an iterable
> over the tasks, which then looks up the progress's, which can be null before
> the progress has started. After it has finished it will however still be
> present as the iterator has a strong reference to the progress, and the
> weakhashmap is weak on the keys - the tasks. Updates to SimpleProgress, lots
> of code.
> - Various minor refactoring related to Progress, use the Progress interfaces
> where possible, track what is needed for them in e.g. FindRequest.
> - More work on progress, refactoring.
> - System.currentTimeMillis, not new Date().getTime().
> - Always get the termpositions if only for relevance.
> - Avoid NPE displaying search page, show failures better.
> - Better auto-detect of URI vs filename for indexes.
> - Index type test: test the SSK URIs, not the USKs.
> - Refactor: ProtoIndexSerialiser.forIndex() not new ProtoIndexSerialiser(),
> have a static map of serialisers by class, change
> FreenetArchiver/FileArchiver parameters, minor constants refactoring, minor
> refactoring in FileArchiver.
> - Bugfix in ParallelSerialiser.isActive(), no idea consequences, probably
> none. Bugfix in SkeletonNode.rev().
> - Handle over 100% in splitfile progress better in Interdex index fetching.
> - Another fix to FreenetArchiver progress from splitfile code (derive
> percentage from minSuccessful not total).
> - getTermEntriesHandler: combine metas and trackers into a LinkedHashMap,
> track current tracker and meta.
> - Refactor Progress estimate type. UNKNOWN or FINALIZED (= known, but only
> this enables finalizedTotal() and thus completion), or a specific value.
> Finalized total must be set before the request completes. This is often done
> automatically by SimpleProgress.addPartKnown(,true). Set total finalised in
> other places.
> - Rename rootSize -> sizeRoot, add nodeMin, entMax, heightEstimate to BTree*.
> - Progress with number of btree levels lookup estimate on
> getTermEntriesHandler.
> - Less efficient index type detection not using KeyExplorer (it was having to
> download a non-staging binary, IMHO this was bad). Based on MIME type,
> fetches [key]/, [key]/index.xml, [key]/index.yml.
> - Refactor FindRequest (XML index progress), use an enum not a string for
> stages, keep an internal Progress impl up to date for the current stage, use
> the index of the enum for the current stage to set the stage for
> ProgressParts purposes. Use the enums when setting progress externally.
> - Handle failure with a new URI - USK@ updates, not enough path components
> etc - for XML indexes.
> - Logging.
> - Improve progress bars a little, fix them for new Progress.
> - More CSS changes for progress bar.
> - Minor fixes in web interface, better error catching/handling/displaying.
> - Refactor web interface, now there is only MainPage.
> - Beginnings of Interdex distributed search implementation. Interdex,
> InterdexQuery, IndexQuery. Document algorithms a bit more in comments.
> - Make fields private or final as needed.
> - AbstractRequest: implement join().
> - Web interface bugfix adding bookmarks.
> - Style fixes, use style in POST handler.
> - Search progress: don't return as composites, stick it all together, unless
> it is different indexes being fetched. Probably looks better on search
> progress. Implement status functions.
> - Fix some NPEs, simpler progress bar.
> - Synchronization in parsing.
> - Temporary hack, put toadlet on /plugins/plugin.Library.FreesiteSearch - we
> need to let /library/ through the filter.
>
>
> platy
> infinity0
> sdiz