[freenet-dev] SoC status on search features

Matthew Toseland Wed, 8 Jul 2009 15:33:55 +0100

On Wednesday 08 July 2009 01:35:59 Mike Bush wrote:
> A progress report on my project.
> 
> My SoC project originally had to do with improvements to XMLLibrarian
> and XMLSpider to provide a better search experience to freenet users,
> particularly helping with the issue of newcomers to freenet starting
> up a search and seeing nothing happening for ages. Some of my original
> targets have changed though, as infinity0's work on new index formats
> has meant I have not had to change the index format to include
> more metadata.
> 
> Some of my original targets and status:
> 
> Asynchronous searching - the current official version of XMLLibrarian,
> from the work I did at Easter, runs searches separately from showing
> progress so that the page is not blank. In my forked version, searches
> for different terms and on different indexes are run at the same time
> to speed this up further.
> 
> Search progress - the current version shows the raw description of the
> fetch progress from the ClientGetters fetching the indexes. My newer
> version uses AJAX to update the progress and get the results, avoiding
> screen refreshes if you have JavaScript enabled in your browser.
> Progress bars are also shown for each fetch. I did have partial
> results being shown as some of the fetches complete but it had
> implications on performance and I had doubts on whether it would be of
> use to anyone.
> 
> Result listing - last week's commits improved the display of search
> results, including grouping SSK sites and hiding older versions of
> them, and showing URIs and USK links (I think people were asking for
> that).
> 
> Search queries - the most recent work I have been doing. The not & or
> operations are working well (as far as my tests have gone). I have
> been implementing phrase searching, but it is not working for a reason
> I have yet to determine.
> 
> This work is available in my fork
> git://github.com/platy/plugin-XMLLibrarian-staging.git. It is ready to
> be merged back into the freenet staging repository, hopefully into the
> official as well after review.
> 
> Other items in my proposal that need to be done are:
> Recording metadata in the spider; this will allow more information on
> the search page and (some) search relevance ranking.
> Use of filters other than the html one to allow other filetypes to be indexed.
> Embedded search in freesites - allowing someone who has uploaded an
> index to present a box on their site to start searches on it.
> 
> And importantly, I will be working with infinity0, to integrate the
> Interdex distributed index system into the search interface and
> crawling.
> 
> MikeB


Woah, what a lot of code. Lots of great stuff, thanks. However, I have not 
deployed it yet as IMHO it has significant regressions relative to the current 
deployed XMLLibrarian.

Some testing:

Not very stable. Most if not all one-word searches give "No search for this, 
something went wrong", before giving the results.

Multi-word searches don't work:

stupid idiot -> nothing happens, in log:
Jul 08, 2009 14:20:14:921 (plugins.XMLLibrarian.Search, HTTP socket handler at 2068039300(698), ERROR): No split made, stupid idiot

Traditionally a multi-word search has been treated as an OR; this should be 
fixed before we deploy the new plugin version.
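
To illustrate the implicit-OR behaviour meant here, a minimal sketch: split the query on whitespace and take the union of the per-term results. The class `OrQuerySketch`, the method `searchAny` and the `Map`-based index are hypothetical stand-ins for illustration only, not XMLLibrarian code.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in XMLLibrarian. */
final class OrQuerySketch {
    /**
     * Treats a multi-word query as term1 OR term2 OR ...
     * The index is assumed to map a single term to the URIs that contain it.
     */
    static Set<String> searchAny(String query, Map<String, Set<String>> index) {
        Set<String> union = new LinkedHashSet<String>();
        for (String term : query.trim().split("\\s+")) {
            Set<String> hits = index.get(term);
            if (hits != null) {
                // A term with no hits contributes nothing; it does not empty the result.
                union.addAll(hits);
            }
        }
        return union;
    }
}
```

With this shape, a query such as `freenet developers` returns every page that matches either word, which is what the currently deployed plugin is described as doing.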

If I do it as a phrase search (with quotes), it works and gives no results. 
Perhaps that phrase doesn't occur? :)

Searching for "freenet developers" as a phrase search gives 0 results, which 
seems very unlikely.

Searching two indexes at once - what is wrong with wanna 20? Can't we search 
both by default?

I am not sure what the syntax is for OR? The obvious doesn't work and the 
regexes are strange... Same with others.

"freenet"||"developer" gives an NPE:
java.lang.NullPointerException
-- plugins.XMLLibrarian.Search.isFinished(Search.java:292)
-- plugins.XMLLibrarian.Search.isFinished(Search.java:292)
-- plugins.XMLLibrarian.interfaces.WebUI.searchpage(WebUI.java:106)
-- plugins.XMLLibrarian.interfaces.WebUI.handleHTTPGet(WebUI.java:74)
-- plugins.XMLLibrarian.XMLLibrarian.handleHTTPGet(XMLLibrarian.java:42)
-- freenet.pluginmanager.PluginManager.handleHTTPGet(PluginManager.java:630)
-- freenet.clients.http.PproxyToadlet.handleGet(PproxyToadlet.java:362)
-- freenet.clients.http.ToadletContextImpl.handle(ToadletContextImpl.java:369)
-- freenet.clients.http.SimpleToadletServer$SocketHandler.run(SimpleToadletServer.java:688)
-- freenet.support.PooledExecutor$MyThread.run(PooledExecutor.java:224)

Some comments on the code:

                                case INTERSECTION:
+                               for(Request<URIWrapper> r : subsearches)
+                                       if(result.size()>0)
                                                result.retainAll(r.getResult());
+                                       else
+                                               result.addAll(r.getResult());
+                               break;

Is this right? I guess it deals with stopwords well ...
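
To make the concern concrete: with the `result.size()>0` test, an intersection that becomes empty part-way through gets refilled by the next subsearch, so `{x, y}`, `{z}`, `{x, z}` would end up as `{x, z}` rather than empty. A minimal sketch of one alternative, which seeds from the first subsearch explicitly, follows. `IntersectionSketch` and `intersect` are hypothetical names, and plain collections stand in for `Request<URIWrapper>`.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of XMLLibrarian. */
final class IntersectionSketch {
    /** Returns the elements present in every one of the given result collections. */
    static <T> Set<T> intersect(List<? extends Collection<T>> subResults) {
        Set<T> result = new HashSet<T>();
        boolean first = true;
        for (Collection<T> r : subResults) {
            if (first) {
                // Seed with the first subsearch; later ones can only narrow it.
                result.addAll(r);
                first = false;
            } else {
                result.retainAll(r);
            }
        }
        return result;
    }
}
```

The trade-off is that this version no longer skips over an empty subsearch, so a stopword with no results would empty the whole intersection. If that leniency is wanted, stopword terms would need to be filtered out explicitly before intersecting.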

Using HTMLNodes means we should generally not have any security problems, 
provided the URIs returned are all FreenetURIs. Is this correct? Are they 
verified or are they just displayed as strings?

You should only convert an SSK into a USK if it is actually a USK in the form 
of an SSK, i.e. if it has a version. Some are not.

There can be many sites with different names under the same SSK, each with 
different versions.

+                               if(realurl.contains("SSK@"))
+                                               urlnode.addChild("a", new String[]{"href", "class"}, new String[]{realuskurl, "librarian-result-uskbutton"}, "[ USK ]");

Dunno if you still do this; if you do, it should be startsWith(), no?
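
Both points (only convert versioned SSKs, and anchor the check at the start of the URI) can be sketched on plain strings. This assumes the layout `SSK@<keys>/<sitename>-<edition>[/<path>]` for a USK in SSK form. `UskLinkSketch` and `toUsk` are hypothetical names, and real code would presumably go through FreenetURI rather than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; works on plain strings, not FreenetURI. */
final class UskLinkSketch {
    // Assumed layout: SSK@<keys>/<sitename>-<edition>[/<path>]
    // The leading ^ plays the role of startsWith("SSK@").
    private static final Pattern VERSIONED_SSK =
        Pattern.compile("^SSK@([^/]+)/([^/]+)-(\\d+)(/.*)?$");

    /** Returns the USK form, or null when the URI is not a versioned SSK. */
    static String toUsk(String uri) {
        Matcher m = VERSIONED_SSK.matcher(uri);
        if (!m.matches()) {
            return null; // no edition number (or not an SSK): show no USK link
        }
        String path = m.group(4) == null ? "/" : m.group(4);
        return "USK@" + m.group(1) + "/" + m.group(2) + "/" + m.group(3) + path;
    }
}
```

Because the site name is part of the match, two sites under the same key (say `blog-4` and `index-17`) each keep their own name and edition instead of being merged.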

Is setting toLowerCase locale to US a good idea?
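
For context on why the locale matters at all: `String.toLowerCase()` is locale-sensitive, so the same term can lower-case differently depending on the node's default locale. The sketch below only shows that effect using the Turkish dotless i. It does not settle whether US is the right fixed choice for non-English content. `LowerCaseSketch` and its methods are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical illustration only. */
final class LowerCaseSketch {
    /** Lower-cases a term the same way on every node, whatever the default locale is. */
    static String normalise(String term) {
        return term.toLowerCase(Locale.US);
    }

    /** What a locale-sensitive call would produce under a Turkish locale. */
    static String turkishLower(String term) {
        return term.toLowerCase(Locale.forLanguageTag("tr"));
    }
}
```

Here `normalise("TITLE")` gives `title`, while `turkishLower("TITLE")` gives `t\u0131tle` (with a dotless i), so an index built under one locale and searched under another could miss terms unless both sides fix the locale.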