On Friday 15 Aug 2003 12:05, Niklas Bergh wrote:

> > > Some of you may want to see the previous discussion we
> > > had along these lines:
> > > http://hawk.freenetproject.org:8080/pipermail/devl/2003-June/006607.html
> >
> > Yes, I agree with what was said there. One thing that gets
> > me, though, is that
> > people keep comparing Freenet to networks such as Kazaa. If
> > people just want
> > a file sharing tool of that sort, why not just use Frost, or
> > the replacement
> > Frost front end that is being worked on to make it look more
> > like Kazaa?
> >
> > What we are talking about here (if I am understanding this
> > all correctly), is
> > a Google type search engine for Freesite content. The two
> > concepts are quite
> > different.
>
> They should be the same. It does not really matter whether the
> search produces a link to [EMAIL PROTECTED]//index.html or to
> [EMAIL PROTECTED]//linuximage.iso

The point is that linuximage.iso is not easily indexable, because it is a 
binary file, while linux-howto.html is easily indexable because it is an HTML 
file. Crawler robots also depend on HTML-style links to find more content to 
index.
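To illustrate what a crawler relies on, here is a rough Python sketch of the link-following step, using the standard library's html.parser. The page and the freesite key in it are made up for the example; a real crawler would fetch pages through a Freenet client and feed the discovered keys back into its queue:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags, as a crawler would."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up freesite page; a binary .iso offers no such links to follow.
page = '<html><body><a href="SSK@abc//linux-howto.html">HOWTO</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # the crawl frontier grows from these hrefs
```

A binary file gives the crawler nothing like this, which is exactly why it is a dead end for a Google-style index.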

The two concepts are actually not quite as similar as you may think. They have 
very different priorities. Saying "find me documents about x, y, z" means go 
find files with this content in them, and order them in some sensible way.

Saying "find me files whose names are something like x y z" is quite 
different. The indices would be very different and the indexing mechanisms 
would be different. While you could use a Google style search engine for 
files, the fundamental difference is that you are indexing on CONTENT rather 
than names or meta-data.
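To make the difference concrete, here is a small Python sketch (the keys and documents are invented). A content index is an inverted index over every word inside the documents; a name index only ever sees the key string:

```python
import re
from collections import defaultdict

def build_content_index(docs):
    """Inverted index: every word in a document's CONTENT points back to it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(key)
    return index

docs = {
    "KSK@linux-howto.html": "how to install linux on old hardware",
    "KSK@cooking.html": "how to cook rice",
}

content_index = build_content_index(docs)
print(sorted(content_index["linux"]))  # found because the word is in the text

# A filename-style index, by contrast, can only match against the key itself:
matches = [key for key in docs if "linux" in key.lower()]
print(sorted(matches))
```

Here both approaches happen to find the same document, but a query like "install" succeeds only against the content index, because the word never appears in any key name.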

> > > I think the idea that was most liked was that the user
> > > downloads a few
> > > index files from freesites he chooses and then uses them in
> > > some local search engine.
> > >
> > > Indexes could be built by hand, crawler, or people
> > > might somehow recommend their site for an index.
> >
> > Interesting idea. So, a site author would insert an
> > additional file, called,
> > say, //index.txt which would contain a compact index of all
> > their pages? That
> > would certainly make the crawling process faster, as only one
> > file per
> > Freesite would need to be retrieved.
>
> I wouldn't want to link the concept of search indexes directly to
> freesites.

No, of course not. But it would be a method by which Freesite owners could help 
ensure that their site is indexed properly. Unfortunately, like everything 
else, this could be abused, because there is no way to ensure that the index 
corresponds to the site. The only reliable way to index content is to crawl 
it.

Also note that Freesite owners would probably prefer a full crawl of their 
site to take place, because it would help propagate their content within the 
network, thus there is no real incentive for them to create an index file 
(they get more benefit from there not being one).

> It sure might be good if every freesite author published an
> index at [EMAIL PROTECTED]//index.db but it should definitely not be required of
> them.

Well, there are many ways indexing could work. There could be search engines 
that concentrate on indexing sites that have the mentioned index.db file, and 
ignore all sites that don't have it. There could be other indices that ignore 
the index.db file and go and index things for themselves.

Both of these are really a matter of user-level convention.
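As an example of the first convention, here is a rough Python sketch of a local search over author-published index files. The thread never specifies an index.db format, so the tab-separated one below is purely an assumption for illustration, as are the keys:

```python
# Hypothetical line-based index.db format: "freenet-key<TAB>word word word".
# A real format would be a matter of user-level convention, as noted above.
INDEX_DB = """\
SSK@abc//index.html\tfreenet search indexing discussion
SSK@def//linux.html\tlinux install guide
"""

def load_index_db(text):
    """Parse one published index file into {key: set-of-words}."""
    index = {}
    for line in text.strip().splitlines():
        key, _, words = line.partition("\t")
        index[key] = set(words.split())
    return index

def search(index, term):
    """Return all keys whose index entry mentions the term."""
    return sorted(k for k, words in index.items() if term in words)

idx = load_index_db(INDEX_DB)
print(search(idx, "linux"))
```

A search engine of the second kind would simply ignore such files and build `idx` itself by crawling.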

> Any given index should be able to produce 'links' to any
> SSK|KSK|CHK|ARK inside freenet.

Yes, but you have to somehow point to that key. In the Freesite context, you 
would look for it by following html links. There could be other conventions 
made for file indices, e.g. what Frost does for binary files, but that is 
really up to the implementation and specific intended purpose of each index.

It is important to use the correct tool for the job. If we are trying to come 
up with a Google type search engine, then let's focus on indexing html and 
text pages. Leave file sharing to the tools that are designed for it.

> This enables people to act as 'index
> publishers' and each and every user could choose whose indexes to
> 'search in'/'merge into their own local index' and trust, much like
> todays index pages....

Sort of. The more segmented/limited the indices are, the less useful they are. 
The index that knows about more content is going to be the index that more 
people use. This pretty much sinks the concept of "I'll only index these 
sites": either you index a very small amount of content, or you have the 
same problem of manual indexing/linking, i.e. it requires a lot of user 
intervention.
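The "merge into their own local index" part is at least mechanically simple. A minimal Python sketch, with two invented publishers and made-up keys, just unions their inverted indexes:

```python
def merge_indexes(*indexes):
    """Union several publishers' inverted indexes into one local index."""
    merged = {}
    for index in indexes:
        for word, keys in index.items():
            merged.setdefault(word, set()).update(keys)
    return merged

# Two hypothetical index publishers the user has chosen to trust.
alice = {"linux": {"SSK@aaa//howto.html"}}
bob = {"linux": {"SSK@bbb//distro.html"}, "freenet": {"SSK@bbb//devl.html"}}

local = merge_indexes(alice, bob)
print(sorted(local["linux"]))  # one search now covers both publishers
```

The merge itself is easy; the hard problems are the ones above: trusting the publishers and getting broad enough coverage that the merged index is worth searching at all.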

Gordan
_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl