> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Gordan
> Sent: den 15 augusti 2003 12:39
> To: [EMAIL PROTECTED]
> Subject: Re: [freenet-dev] freenet (pre-)searchengine
> 
> 
> On Friday 15 Aug 2003 12:05, Niklas Bergh wrote:
> 
> > > > Some of you may want to see the previous discussion we had along
> > > > these lines:
> > > >
> > > > http://hawk.freenetproject.org:8080/pipermail/devl/2003-June/006607.html
> > >
> > > Yes, I agree with what was said there. One thing that gets me,
> > > though, is that people keep comparing Freenet to networks such as
> > > Kazaa. If people just want a file sharing tool of that sort, why not
> > > just use Frost, or the replacement Frost front end that is being
> > > worked on to make it look more like Kazaa?
> > >
> > > What we are talking about here (if I am understanding this all
> > > correctly) is a Google-type search engine for Freesite content. The
> > > two concepts are quite different.
> >
> > They should be the same. It does not really matter whether or not the
> > search produces a link to [EMAIL PROTECTED]//index.html or to
> > [EMAIL PROTECTED]//linuximage.iso
> 
> The point is that linuximage.iso is not easily indexable, because it is
> a binary file, while linux-howto.html is easily indexable because it is
> an HTML file.

linuximage.iso is actually much easier to index than linux-howto.html,
since the only information you would index is its name, and perhaps its
size and a few other easily accessible properties (unless the filter is
very sophisticated). People would probably expect an HTML indexer to
look at the content, which is much more work to code.

> The concept of crawler robots also requires HTML-style links to find
> more content to index.

Yes, a crawler would definitely consider linuximage.iso as a leaf item.
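That leaf behaviour is easy to make concrete. Below is a rough Python
sketch (not any existing Freenet code; the key names, the fetch callback
and the suffix list are all made up) of a crawler that follows links out
of HTML pages but indexes a binary such as linuximage.iso by name only:

```python
import re

# Suffixes whose content we attempt to index; everything else is a leaf.
INDEXABLE_SUFFIXES = (".html", ".htm", ".txt")

def extract_links(html):
    """Pull href targets out of an HTML page (naive regex sketch)."""
    return re.findall(r'href="([^"]+)"', html)

def crawl(start_key, fetch):
    """Breadth-first crawl. `fetch` maps a key to its text, or None
    when the document is binary/unavailable (hypothetical interface)."""
    seen, queue, index = set(), [start_key], {}
    while queue:
        key = queue.pop(0)
        if key in seen:
            continue
        seen.add(key)
        body = fetch(key)
        if key.endswith(INDEXABLE_SUFFIXES) and body is not None:
            index[key] = body                 # HTML/text: index the content
            for link in extract_links(body):
                queue.append(link)            # ...and follow its links
        else:
            # Leaf item: only the name is available to the index.
            index[key] = key.rsplit("/", 1)[-1]
    return index
```

The .iso never contributes links, so the crawl terminates there, exactly
as described above.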

> 
> The two concepts are actually not quite as similar as you may think.
> They have very different priorities. Saying "find me documents about x,
> y, z" means go

I assume you mean "find me documents which have the strings x, y, z
inside"?

> find files with this content in them, and order them in some sensible
> way.
>
> Saying "find me files whose names are something like x y z" is quite
> different. The indices would be very different and the indexing
> mechanisms would be different. While you could use a Google-style
> search engine for files, the fundamental difference is that you are
> indexing on CONTENT rather than names or meta-data.

Every search engine I know of searches for strings associated with a
certain data item. Sometimes the strings are found in the meta-data, at
other times inside the actual file. Both are useful; it is just a matter
of combining the hits in a sensible way, or of allowing the user to
specify what the query engine should do for him.

> > > > I think the idea that was most liked was that the user downloads
> > > > a few index files from freesites he chooses and then uses them in
> > > > some local search engine.
> > > >
> > > > Indexes could be built by hand or by a crawler, or people might
> > > > somehow recommend their site for an index.
> > >
> > > Interesting idea. So, a site author would insert an additional
> > > file, called, say, //index.txt, which would contain a compact index
> > > of all their pages? That would certainly make the crawling process
> > > faster, as only one file per Freesite would need to be retrieved.
> >
> > I wouldn't want to link the concept of search indexes directly to
> > freesites.
> 
> No, of course not. But it would be a method by which Freesite owners
> could help ensure that their site is indexed properly. Unfortunately,
> as with everything else, this could be abused, because there is no way
> to ensure that the index corresponds to the site. The only reliable way
> to index content is to crawl it.
>
> Also note that Freesite owners would probably prefer a full crawl of
> their site to take place, because it would help propagate their content
> within the network, thus there is no real incentive for them to create
> an index file (they get more benefit from there not being one).

If I don't want to run a crawler, I would very much prefer the site to
provide a pre-generated and ready-to-use index. A pre-generated index of
the site might even cause me to propagate its pages through plain old
fproxy requests, because I can now find them more easily.
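As a sketch of how cheap consuming such a pre-generated index could be,
compared to crawling: the tab-separated "term<TAB>key" format below is a
placeholder I made up (the real on-Freenet format, like the index.db
idea, would be a matter of convention), but merging one fetched file per
site into a local index is all it takes:

```python
def parse_site_index(text):
    """Parse 'term<TAB>key' lines (hypothetical format) into
    a {term: set(keys)} mapping; malformed lines are skipped."""
    index = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        term, key = line.split("\t", 1)
        index.setdefault(term, set()).add(key)
    return index

def merge(local, site):
    """Merge a site-provided index into the local one, in place."""
    for term, keys in site.items():
        local.setdefault(term, set()).update(keys)
    return local
```

One request per site instead of one per page, and the user still decides
site by site which indexes to pull in.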

> 
> > It sure might be good if every freesite author published an index at
> > [EMAIL PROTECTED]//index.db, but it should definitely not be required
> > of them.
> 
> Well, there are many ways indexing could work. There could be search
> engines that concentrate on indexing sites that have the mentioned
> index.db file, and ignore all sites that don't have it. There could be
> other indices that ignore the index.db file and go and index things for
> themselves.
> 
> Both of these are really a matter of user-level convention.

Definitely

> > Any given index should be able to produce 'links' to any
> > SSK|KSK|CHK|ARK inside freenet.
> 
> Yes, but you have to somehow point to that key. In the Freesite
> context, you would look for it by following html links. There could be
> other conventions made for file indices, e.g. what Frost does for
> binary files, but that is really up to the implementation and specific
> intended purpose of each index.

As you say, it is a matter of tools. Frost understands its own kind of
linking, fproxy its own, FMB understands a third, flinks a fourth and so
on.

> It is important to use the correct tool for the job. If we are trying
> to come up with a Google-type search engine, then let's focus on
> indexing html and text pages. Leave file sharing to the tools that are
> designed for it.

I don't agree. I would like to see a technology that could be used more
widely. I would not like to see a Google-type search engine as such; I
would like to see a standardized something that one could build a
Google-type search engine *upon*.

> > This enables people to act as 'index publishers', and each and every
> > user could choose whose indexes to 'search in'/'merge into their own
> > local index' and trust, much like today's index pages....
> 
> Sort of. The more segmented/limited the indices are, the less useful
> they are. The index that knows about more content is going to be the
> index that more people use. This pretty much sinks the concept of "I'll
> only index these sites",

TFE has more links than TFEE. Still, I mostly use TFEE, because I am not
interested in some of the stuff that the TFE page provides.

> as you either index a very small amount of content, or you have the
> same problem of manual indexing/linking, i.e. the requirement of a lot
> of user intervention.

See comment about linking above.

/N

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
