> On Thursday 14 August 2003 22:44, Newsbite wrote:
> > Anyway, what I was thinking was that there are JavaScripts (and probably
> > other stuff as well :-) that can emulate a search engine. The database is
> > stored as part of the JavaScript on a web page, and is thus readily (and
> > very quickly) able to show the index of links for the word(s) that were
> > requested.
>
> Actually, you would want to keep the data and the code completely separate.
> The index would be enormous. You would also have to implement some very
> unorthodox indexing methods to create an index that would scale to any
> extent with an underlying storage medium such as Freenet. Most standard
> indexing mechanisms used today fall apart when applied to Freenet because
> of the nature of access. When you try to make things future-proof and
> scalable up to, say, 3bn pages, it all becomes infeasible.

Well, agreed: this is another drawback I was aware of (forgot to mention it,
though). The system would only work well up to medium-sized amounts of data;
hundreds and thousands of links are feasible, but millions and billions would
not be, in all likelihood. It's worth noting that it is exactly in this range
that the current state of Freenet lies, and since it's an interim solution
until a real search engine is created, it would do. (Thereafter, it could be
reduced to a fast way to get (only) other meta-indexes, for instance.)
I am not sure I understand what you mean by 'unorthodox indexing methods';
while not extremely efficient as yet, the normal crawling system that is used
today would suffice, methinks. In effect, the underlying system would not
differ that much from the TFE and the like; only the way in which it is
presented (and requested) would be different. Where the TFE is like one giant
page full of links, with my concept it would be far more Google-like (at
least in appearance). I mean: just a little window or field to type your
search words in, click 'search', and get a bunch of links (retrieved by the
browser from the JavaScript itself) which contain the keywords.
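To make the concept concrete, here is a minimal sketch of the idea (all keys
and keywords below are illustrative placeholders, not real Freenet links): the
"database" is an inverted index embedded in the page's own script, mapping
keywords to links, so a query runs entirely in the browser with no further
network round-trips.

```javascript
// Hypothetical embedded index: keyword -> list of freesite links.
const index = {
  "music":   ["SSK@aaa/music-site", "SSK@bbb/mp3-index"],
  "books":   ["SSK@ccc/library"],
  "freenet": ["SSK@ddd/faq", "SSK@bbb/mp3-index"]
};

// Return the links matching every word the user typed in the search box.
function search(query) {
  const words = query.toLowerCase().split(/\s+/).filter(w => w);
  if (words.length === 0) return [];
  // Intersect the link lists of all requested keywords.
  return words
    .map(w => index[w] || [])
    .reduce((a, b) => a.filter(link => b.includes(link)));
}
```

Typing "freenet music" into the search field would then yield only the links
listed under both keywords.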
> > This is not ideal, of course, but it would be an improvement to the
> > current system.
>
> It would - if it could be done efficiently.

I already have the code of the JavaScript itself. It could be done
efficiently (with the restriction of handling vast amounts of data).
> > Once again, I've told my idea on IRC, and it was met rather positively,
> > but with the remarks (which had occurred to me also ;-):
> >
> > It still needs someone to insert/retrieve the database.
>
> That is not a big problem. Any fairly standard web crawler would work for
> indexing the pages. Uploading the database is also not an issue. The
> problem is in the database storage format. It is difficult to come up with
> a method that would yield good results and acceptable response times with a
> high-latency network.

I think the last part is not correct. I'm talking about JavaScript enabled on
the client side (browser). The high latency would thus not be an issue once
the 'Google-like' page (with the JavaScript/database in it) has been
retrieved successfully.
> > Which is true, but that could be said of the current TFE system too.
> > Besides, it can't be that difficult to largely automate the process.
>
> Automating the process would be dead easy. Coming up with a storage format
> that is efficient is difficult. Another difficulty lies in implementing an
> index format which is compact yet useful. There is no point in creating an
> index that would take up as much space as all the data it is trying to
> index. That would be bad, as the index would effectively double the
> required storage capacity of the network.

Agreed. The data would not have to be duplicated, however. Only keywords (or
those short descriptions that you can already insert today) and the (active?)
links themselves are needed; the content itself is not really necessary.
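As a sketch of how compact such entries could be, each record would only hold
the link, a short description, and keywords (the entry shape and data below
are assumptions for illustration), from which the keyword index is built:

```javascript
// Hypothetical compact index entries: no page content, just metadata.
const entries = [
  { link: "SSK@aaa/site1", description: "gardening tips", keywords: ["gardening", "plants"] },
  { link: "SSK@bbb/site2", description: "plant photos",   keywords: ["plants", "photos"] }
];

// Build the keyword -> links map from the compact entries.
function buildIndex(entries) {
  const index = {};
  for (const { link, keywords } of entries) {
    for (const kw of keywords) {
      (index[kw] = index[kw] || []).push(link);
    }
  }
  return index;
}
```

Since each entry is a few dozen bytes rather than a whole page, the index
stays far smaller than the content it points to.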
> > The advantages are legion:
> >
> > 1) a real search-like mechanism
>
> Sort of. There would be some additional limitations.

When are there no limitations? ;-)
> > 2) more user friendly
>
> Maybe. You could do it all in JavaScript, as you said. This would, however,
> put most people off because of the filter warnings. A better way to do it
> would be to create a Fred plug-in applet that would perform this function.
> It would probably be faster, and it would work around the problem of filter
> warnings. It would also be "easier" to trust it if it were distributed with
> the node library, rather than being just a random page from an inherently
> untrustworthy medium.

Indeed, filter warnings put people off; that's why I made that suggestion at
the end. As for your plug-in idea: it may have some value, but alas, I'm an
(IT) manager and freelance writer, not a developer. My coding experience is
very limited; some HTML, PHP and JavaScript, and that's all. So I'm afraid
somebody else would have to implement your suggestion. :-)
> > 3) no more scrolling and manually searching for stuff (TFE is beginning
> > to become TOO large to easily navigate through)
>
> True to some extent. IIRC, YoYo handles this reasonably sensibly, but in
> the long term, any manually created index will become implausible. We
> haven't reached that amount of content in Freenet yet.

Well, it won't improve with time, that's for sure.
> > 4) the moral issue is greatly reduced, because (links to) 'illegal'
> > things such as copyrighted material (or worse) would only be visible when
> > you actively seek/request it
>
> That is not necessarily strictly true. It depends on how much of a specific
> type of content there is. Any automated search engine has such issues. For
> example, how many times have you entered a completely normal, mundane and
> geeky search string into Google/Altavista/another search engine and found
> that totally unrelated porn pages crop up even on the first results page,
> because some porn site webmaster put the terms on his page so that it would
> come up for pretty much ANY query you typed in?

True, but it rates the links according to the relevance of the keywords that
were put in the search box. It's a rather simple system, easily bypassed, but
more complex rating mechanisms could be used (as Google does). It will never
be fully bulletproof, of course, but nothing will, I think. But anyway, the
apparent in-your-face visibility of links to illegal material would be gone.
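The simple rating mechanism mentioned here could be sketched as follows
(illustrative data; a real ranker would need spam defences, as noted above):
score each link by how many of the query words point to it, and return the
best matches first.

```javascript
// Hypothetical keyword -> links index for the ranking example.
const demoIndex = {
  "anime": ["SSK@a/one", "SSK@b/two"],
  "manga": ["SSK@b/two"]
};

// Rank links by the number of query words that reference them.
function rankedSearch(query, index) {
  const words = query.toLowerCase().split(/\s+/).filter(w => w);
  const scores = {};
  for (const w of words) {
    for (const link of index[w] || []) {
      scores[link] = (scores[link] || 0) + 1;
    }
  }
  // Best-scoring links first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}
```

A keyword-stuffed page still ranks no higher than the number of query words
it actually matches, which is weak but cheap; Google-style link analysis is
what a more serious ranker would add.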
> > It would require, however, that at least for this particular script (or
> > for some particular page), the JavaScript filter would have to let it
> > pass without much fuss.
>
> Not really. Just leave it to the user to decide whether they trust the
> page. If they do, they can click the "proceed anyway" button. The correct
> way around this would have to be the plug-in applet.

Ah, thanks for the hint. I had the impression the filter actually blocked the
JavaScript, but if I understand you correctly, it can be passed just by
clicking on it? Are you sure it does not hamper JavaScript? I've recently
inserted a test site with JavaScript (not the search kind, though), and it
seemed not to work at all.

About the 'correct way': maybe you are 'correct' :-), but as I said, somebody
else will have to do that. (And, as I have noticed before, if no one actually
sets their shoulders under it, it seldom happens here. Take the
sponsoring/funding bit, for instance ;-)
> > Not ideal, perhaps, but until a true, well-working, scalable, anonymous
> > search engine is created to work in Freenet, it would beat everything
> > that is currently available on Freenet.
>
> There are many, many more technical difficulties involved in that than you
> may realize, especially in coming up with a good, scalable index format.

In itself, it's rather simple, really. It is, however, not unlimitedly
scalable, that is true. But I really think that in the short to mid-long
term, it would be a hit.
