> On Thursday 14 August 2003 22:44, Newsbite wrote:
> > Anyway, what I was thinking was that there are JavaScripts (and probably
> > other stuff as well :-) that can emulate a search engine. The database is
> > stored as part of the JavaScript on a web page, and is thus readily (and
> > very quickly) able to show the index of links for the word(s) that were
> > requested.
>
> Actually, you would want to keep the data and the code completely separate.
> The index would be enormous. You would also have to implement some very
> unorthodox indexing methods to create an index that would scale to any
> extent with an underlying storage medium such as Freenet. Most standard
> indexing mechanisms used today fall apart when applied to Freenet because
> of the nature of access. When you try to make things future-proof and
> scalable up to, say, 3bn pages, it all becomes infeasible.

Well, agreed: this is another drawback I was aware of (forgot to mention it,
though). The system would only work well up to medium-sized amounts of data;
hundreds and thousands of links are feasible, but millions and billions would
not be, in all likelihood. It's worth noting that it is exactly in this range
that the current state of Freenet lies, and since it's an interim solution
until a real search engine is created, it would do. (Thereafter, it could be
reduced to a fast way to get (only) other meta-indexes, for instance.)
I am not sure I understand what you mean by 'unorthodox indexing methods';
while not extremely efficient as yet, the normal crawling system that is used
today would suffice, methinks. In effect, the underlying system would not
differ that much from the TFE and the like; only the way in which it is
presented (and requested) would be different. Where the TFE is like one giant
page full of links, with my concept it would be far more Google-like (at
least in appearance). I mean: just a little window or field to type your
search words in, click 'search', and get a bunch of links (retrieved by the
browser from the JavaScript itself) which contain the keywords.
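To make the concept concrete, here is a minimal sketch of the idea (all keys
and keywords below are illustrative placeholders, not real Freenet links): the
"database" is an inverted index embedded in the page's own script, mapping
keywords to links, so a query runs entirely in the browser with no further
network round-trips.

```javascript
// Hypothetical embedded index: keyword -> list of freesite links.
const index = {
  "music":   ["SSK@aaa/music-site", "SSK@bbb/mp3-index"],
  "books":   ["SSK@ccc/library"],
  "freenet": ["SSK@ddd/faq", "SSK@bbb/mp3-index"]
};

// Return the links matching every word the user typed in the search box.
function search(query) {
  const words = query.toLowerCase().split(/\s+/).filter(w => w);
  if (words.length === 0) return [];
  // Intersect the link lists of all requested keywords.
  return words
    .map(w => index[w] || [])
    .reduce((a, b) => a.filter(link => b.includes(link)));
}
```

Typing "freenet music" into the search field would then yield only the links
listed under both keywords.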
> > This is not ideal, of course, but it would be an improvement to the
> > current system.
>
> It would - if it could be done efficiently.

I already have the code of the JavaScript itself. It could be done
efficiently (with the restriction of handling vast amounts of data).
> > Once again, I've told my idea on IRC, and it was met rather positively,
> > but with the remarks (which had occurred to me also ;-):
> >
> > It still needs someone to insert/retrieve the database.
>
> That is not a big problem. Any fairly standard web crawler would work for
> indexing the pages. Uploading the database is also not an issue. The
> problem is in the database storage format. It is difficult to come up with
> a method that would yield good results and acceptable response times with a
> high-latency network.

I think the last part is not correct. I'm talking about JavaScript enabled on
the client side (browser). The high latency would thus not be an issue once
the 'Google-like' page (with the JavaScript/database in it) has been
retrieved successfully.
> > Which is true, but that could be said of the current TFE system too.
> > Besides, it can't be that difficult to largely automate the process.
>
> Automating the process would be dead easy. Coming up with a storage format
> that is efficient is difficult. Another difficulty lies in implementing an
> index format which is compact yet useful. There is no point in creating an
> index that would take up as much space as all the data it is trying to
> index. That would be bad, as the index would effectively double the
> required storage capacity of the network.

Agreed. The data would not have to be duplicated, however. Only keywords (or
those short descriptions that you can already insert today) and the (active?)
links themselves are needed; the content itself is not really necessary.
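As a sketch of how compact such entries could be, each record would only hold
the link, a short description, and keywords (the entry shape and data below
are assumptions for illustration), from which the keyword index is built:

```javascript
// Hypothetical compact index entries: no page content, just metadata.
const entries = [
  { link: "SSK@aaa/site1", description: "gardening tips", keywords: ["gardening", "plants"] },
  { link: "SSK@bbb/site2", description: "plant photos",   keywords: ["plants", "photos"] }
];

// Build the keyword -> links map from the compact entries.
function buildIndex(entries) {
  const index = {};
  for (const { link, keywords } of entries) {
    for (const kw of keywords) {
      (index[kw] = index[kw] || []).push(link);
    }
  }
  return index;
}
```

Since each entry is a few dozen bytes rather than a whole page, the index
stays far smaller than the content it points to.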
> > The advantages are legion:
> >
> > 1) a real search-like mechanism
>
> Sort of. There would be some additional limitations.

When are there no limitations? ;-)
> > 2) more user friendly
>
> Maybe. You could do it all in JavaScript, as you said. This would, however,
> put most people off because of the filter warnings. A better way to do it
> would be to create a Fred plug-in applet that would perform this function.
> It would probably be faster, and it would work around the problem of filter
> warnings. It would also be "easier" to trust it if it were distributed with
> the node library, rather than being just a random page from an inherently
> untrustworthy medium.

Indeed, filter warnings put people off; that's why I made that suggestion at
the end. As for your plug-in idea: it may have some value, but alas, I'm an
(IT) manager and freelance writer, not a developer. My coding experience is
very limited; some HTML, PHP and JavaScript, and that's all. So I'm afraid
somebody else would have to implement your suggestion. :-)
> > 3) no more scrolling and manually searching for stuff (TFE is beginning
> > to become TOO large to easily navigate through)
>
> True to some extent. IIRC, YoYo handles this reasonably sensibly, but in
> the long term, any manually created index will become implausible. We
> haven't reached that amount of content in Freenet yet.

Well, it won't improve with time, that's for sure.
> > 4) the moral issue is greatly reduced, because (links to) 'illegal'
> > things such as copyrighted material (or worse) would only be visible when
> > you actively seek/request it
>
> That is not necessarily strictly true. It depends on how much of a specific
> type of content there is. Any automated search engine has such issues. For
> example, how many times have you entered a completely normal, mundane and
> geeky search string into Google/Altavista/another search engine and found
> that totally unrelated porn pages crop up even on the first results page,
> because some porn site webmaster put the terms on his page so that it would
> come up for pretty much ANY query you typed in?

True, but it rates the links according to the relevance of the keywords that
were put in the search box. It's a rather simple system, easily bypassed, but
more complex rating mechanisms could be used (as Google does). It will never
be fully bulletproof, of course, but nothing will, I think. But anyway, the
apparent in-your-face visibility of links to illegal material would be gone.
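The simple rating mechanism mentioned here could be sketched as follows
(illustrative data; a real ranker would need spam defences, as noted above):
score each link by how many of the query words point to it, and return the
best matches first.

```javascript
// Hypothetical keyword -> links index for the ranking example.
const demoIndex = {
  "anime": ["SSK@a/one", "SSK@b/two"],
  "manga": ["SSK@b/two"]
};

// Rank links by the number of query words that reference them.
function rankedSearch(query, index) {
  const words = query.toLowerCase().split(/\s+/).filter(w => w);
  const scores = {};
  for (const w of words) {
    for (const link of index[w] || []) {
      scores[link] = (scores[link] || 0) + 1;
    }
  }
  // Best-scoring links first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}
```

A keyword-stuffed page still ranks no higher than the number of query words
it actually matches, which is weak but cheap; Google-style link analysis is
what a more serious ranker would add.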
> > It would require, however, that at least for this particular script (or
> > for some particular page), the JavaScript filter would have to let it
> > pass without much fuss.
>
> Not really. Just leave it to the user to decide whether they trust the
> page. If they do, they can click the "proceed anyway" button. The correct
> way around this would have to be the plug-in applet.

Ah, thanks for the hint. I had the impression the filter actually blocked the
JavaScript, but if I understand you correctly, it can be passed just by
clicking on it? Are you sure it does not hamper JavaScript? I've recently
inserted a test site with JavaScript (not the search kind, though), and it
seemed not to work at all.

About the 'correct way': maybe you are 'correct' :-), but as I said, somebody
else will have to do that. (And, as I have noticed before, if no one actually
sets their shoulders under it, it seldom happens here. Take the
sponsoring/funding bit, for instance ;-)
> > Not ideal, perhaps, but until a true, well-working, scalable, anonymous
> > search engine is created to work in Freenet, it would beat everything
> > that is currently available on Freenet.
>
> There are many, many more technical difficulties involved in that than you
> may realize, especially in coming up with a good, scalable index format.

In itself, it's rather simple, really. It is, however, not unlimitedly
scalable, that is true. But I really think that in the short to mid-long
term, it would be a hit.
