Hi, I am the maintainer of the GNU Image Finding Tool (GIFT) and active in the Fer-de-Lance project, which has been in (not very loud, but behind-the-scenes active) existence since April last year. Within this project we work towards integrating search services into the desktop.
I am mailing our list and a couple of other developer lists because I think I have found an architecture that provides security while maintaining most of the advantages of daemon-based search engine architectures. I think this architecture and the associated tricks are flexible enough to encompass different search engines, so this mail is not about Medusa vs. htdig vs. GIFT, but rather about how to work together to solve our common security problems for desktop integration of our engines. And, of course, I would be happy to get some suggestions for improvement and/or some developer time. I would be less happy if someone found a fundamental flaw, but even that would be better than wasting my time trying to develop this stuff further. Now let's go into more detail.

GOAL: The goal is to provide search services to the desktop user. These search services should encompass not only web-visible URLs, but all files accessible to the user, as well as items accessible via http/ftp/etc.

ISSUES: The first issue is -privacy-: the system should not reveal the locations of files that we could not read otherwise. For example: when looking for some correspondence with the health insurance, we do not want to learn that our colleague wrote three letters last month that match our search.

The second is -memory consumption-: all indexes for similarity searching use memory that is either proportional to the size of each indexed file, or quite big to begin with. We do not want plenty of users each rolling their own index; we want one shared index, otherwise we are likely to spend a multiple of our useful disk space on indexes.

SUGGESTION: Use a daemon and make sure that authentication is good. :-) Too easy? Of course the problem lies in providing the authentication. What I suggest is to run a daemon which creates, for each user U, a Unix domain socket which is readable *and* writable *only* by this one user U (and root, of course). All instructions to the indexing daemon, e.g.
  - add item to index
  - delete item from index
  - move item within index (new URL for the same file)
  - block item/subdirectory/pattern (e.g. don't index *.o files)
  - process query

would go through the socket. By knowing which socket received the request, we automatically know the user, and then we just have to check, for each result item, whether it can be read by the user who issued the query. Of course we give back only the readable items.

We can create the sockets as user "nemo" and then chown them using a very small script running under root. So we would be root only for a couple of seconds on startup; afterwards everything would happen as a user (nemo) who has write rights on one directory tree which is unreadable for everyone else. So there is no issue of a big indexing program running under root for days and days in a row.

Adding an item is a (small) issue. We probably have to pipe the uuencoded (or equivalently encoded) binary through the socket in order to have it indexed on the other side. However, I guess the efficiency overhead is small compared to the indexing cost.

Things become a trifle more complex for items which are found on the web. Somebody indexing a web page should probably indicate who else (group, all) is allowed to know that somebody has indexed that page. If several users publish a URL, the least restrictive rights are taken into account.

WHAT'S THERE? WHAT'S NEEDED? Basically, I have tried out the socket stuff with a small test program. Works. Now I am starting to integrate that with the GIFT (which involves cleaning up some of my internet socket code). What is still needed is the filter that stores which URLs are indexed under which owner, and with which rights. On each query the GIFT can ask this filter whether a list of URLs can be given out as a query result. Currently, I would like to base this filter on MySQL. When that filter is in place, writing a Medusa plugin for the GIFT would be easy.
I just finished a primitive htdig GIFT plugin, which will soon go to CVS, so that one just needs some fleshing out.

CONCLUSION: I hope to have convinced you that we can get a secure yet memory-efficient indexing solution for the desktop relatively easily. If this has already been done, please tell me where. If my mail is a stupid suggestion, please tell me that, too. However, if you would like to participate in the coding and design effort, or simply to share your opinion, please do not hesitate to subscribe to the fer-de-lance-development list.

Cheers, Wolfgang

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
