On Friday 15 Aug 2003 12:05, Niklas Bergh wrote:
> > > Some of you may want to see the previous discussion we
> > > had along these lines:
> > > http://hawk.freenetproject.org:8080/pipermail/devl/2003-June/006607.html
> >
> > Yes, I agree with what was said there. One thing that gets me, though,
> > is that people keep comparing Freenet to networks such as Kazaa. If
> > people just want a file sharing tool of that sort, why not just use
> > Frost, or the replacement Frost front end that is being worked on to
> > make it look more like Kazaa?
> >
> > What we are talking about here (if I am understanding this all
> > correctly) is a Google type search engine for Freesite content. The
> > two concepts are quite different.
>
> They should be the same. It does not really matter whether the search
> produces a link to [EMAIL PROTECTED]//index.html or to
> [EMAIL PROTECTED]//linuximage.iso

The point is that linuximage.iso is not easily indexable, because it is a
binary file, while linux-howto.html is easily indexable because it is an
HTML file. The concept of crawler robots also requires HTML style links
to find more content to index. The two concepts are actually not as
similar as you may think, and they have very different priorities. Saying
"find me documents about x, y, z" means "go find files with this content
in them, and order them in some sensible way". Saying "find me files
whose names are something like x y z" is quite different. The indices
would be very different, and so would the indexing mechanisms. While you
could use a Google style search engine for files, the fundamental
difference is that you are indexing on CONTENT rather than names or
meta-data.
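To make the distinction concrete, here is a rough sketch of the two kinds
of lookup. This is purely illustrative (the class name and keys are made
up, and a real indexer would parse HTML properly, stem words, handle stop
words, etc.), but it shows why a content index and a name index are
different data structures built by different mechanisms:

    import java.util.*;

    // Illustration only: a content index maps words found *inside*
    // documents to the keys of the documents containing them; a name
    // search has nothing to go on but the key/file name itself.
    public class IndexSketch {

        // word -> keys of documents whose content contains that word
        static Map<String, Set<String>> contentIndex = new HashMap<>();

        static void indexDocument(String key, String html) {
            // Crude tag stripping; a real crawler would do far better.
            String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
            for (String word : text.split("\\W+")) {
                if (!word.isEmpty())
                    contentIndex.computeIfAbsent(word,
                        w -> new HashSet<>()).add(key);
            }
        }

        static Set<String> searchContent(String word) {
            return contentIndex.getOrDefault(word.toLowerCase(),
                                             Collections.emptySet());
        }

        // Fine for finding linuximage.iso by name, useless for
        // "documents about installing Linux".
        static boolean nameMatches(String key, String term) {
            return key.toLowerCase().contains(term.toLowerCase());
        }

        public static void main(String[] args) {
            indexDocument("SSK@site//linux-howto.html",
                "<html><body>How to install Linux on old hardware</body></html>");
            System.out.println(searchContent("install")); // hits the HTML page
            System.out.println(nameMatches("CHK@x//linuximage.iso", "install")); // false
        }
    }

Note that the binary file never even enters the content index, because
there is no text in it to index.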
> > > I think the idea that was most liked was that the user downloads a
> > > few index files from freesites he chooses and then uses them in
> > > some local search engine.
> > >
> > > Indexes could be built by hand, crawler, or people might somehow
> > > recommend their site for an index.
> >
> > Interesting idea. So, a site author would insert an additional file,
> > called, say, //index.txt, which would contain a compact index of all
> > their pages? That would certainly make the crawling process faster,
> > as only one file per Freesite would need to be retrieved.
>
> I wouldn't want to link the concept of search indexes directly to
> freesites.

No, of course not. But it would be a method by which Freesite owners
could help ensure that their site is indexed properly. Unfortunately, as
with everything else, this could be abused, because there is no way to
ensure that the index corresponds to the site. The only reliable way to
index content is to crawl it. Also note that Freesite owners would
probably prefer a full crawl of their site to take place, because it
would help propagate their content within the network, so there is no
real incentive for them to create an index file (they get more benefit
from there not being one).

> It sure might be good if every freesite author published an index at
> [EMAIL PROTECTED]//index.db but it should definitely not be required
> of them.

Well, there are many ways indexing could work. There could be search
engines that concentrate on indexing sites that have the mentioned
index.db file, and ignore all sites that don't have it. There could be
other indices that ignore the index.db file and go and index things for
themselves. Both of these are really a matter of user-level convention.
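Just to make that convention concrete: no such format exists yet, and the
index.db name and layout below are entirely invented, but a published
per-site index could be as trivial as one "word<TAB>relative-path" entry
per line, which a search engine would fetch instead of crawling every
page on the site:

    import java.io.*;
    import java.util.*;

    // Hypothetical reader for a per-site index file. Invented format:
    // one entry per line, "word<TAB>relative-path", e.g.
    //   linux<TAB>/linux-howto.html
    public class SiteIndexReader {

        // word -> relative paths within the site that mention it
        static Map<String, List<String>> read(Reader in) throws IOException {
            Map<String, List<String>> index = new HashMap<>();
            BufferedReader r = new BufferedReader(in);
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) // silently skip malformed lines
                    index.computeIfAbsent(parts[0], w -> new ArrayList<>())
                         .add(parts[1]);
            }
            return index;
        }

        public static void main(String[] args) throws IOException {
            String sample = "linux\t/linux-howto.html\n"
                          + "install\t/linux-howto.html\n";
            System.out.println(read(new StringReader(sample)));
        }
    }

Of course, as argued above, nothing in such a file can be verified
without crawling the site anyway, so it is at best a hint.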
> Any given index should be able to produce 'links' to any
> SSK|KSK|CHK|ARK inside freenet.

Yes, but you have to somehow point to that key. In the Freesite context,
you would look for it by following HTML links. There could be other
conventions for file indices, e.g. what Frost does for binary files, but
that is really up to the implementation and the specific intended purpose
of each index.

It is important to use the correct tool for the job. If we are trying to
come up with a Google type search engine, then let's focus on indexing
HTML and text pages. Leave file sharing to the tools that are designed
for it.

> This enables people to act as 'index publishers' and each and every
> user could choose whose indexes to 'search in'/'merge into their own
> local index' and trust, much like today's index pages....

Sort of. The more segmented/limited the indices are, the less useful they
are. The index that knows about more content is going to be the index
that more people use. This pretty much sinks the concept of "I'll only
index these sites": you either index a very small amount of content, or
you have the same problem of manual indexing/linking, i.e. it requires a
lot of user intervention.
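That said, if per-publisher index files did catch on, the local merge
step itself would be easy; the hard part is trust and coverage, not
mechanics. A sketch, reusing the toy word -> keys representation from the
first example (all keys hypothetical):

    import java.util.*;

    // Sketch: union index files fetched from several trusted "index
    // publishers" into one local index, word by word.
    public class IndexMerger {

        static Map<String, Set<String>> merge(
                List<Map<String, Set<String>>> published) {
            Map<String, Set<String>> local = new HashMap<>();
            for (Map<String, Set<String>> index : published)
                for (Map.Entry<String, Set<String>> e : index.entrySet())
                    local.computeIfAbsent(e.getKey(), w -> new HashSet<>())
                         .addAll(e.getValue());
            return local;
        }

        public static void main(String[] args) {
            Map<String, Set<String>> a =
                Map.of("linux", Set.of("SSK@aaa//howto.html"));
            Map<String, Set<String>> b =
                Map.of("linux", Set.of("SSK@bbb//faq.html"));
            System.out.println(merge(List.of(a, b)).get("linux"));
        }
    }

The merged index is still only as good as what the publishers chose to
crawl, which is exactly the limitation above.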
Gordan