I worked on a website that had the same issue. We made a "search engine"
page that listed all the documents we wanted indexed. Each entry linked
to a summary of the document, and each summary linked to the full
document on the limited-access site. Google won't be able to follow
those links because they require user sign-on, but the links will be
there as enticement when Googlers find the page.

Love the idea of removing the stop words and stemming the rest for
Google indexing. But Google is pretty picky, I am told, so I would not
be surprised if they detected this sort of scheme and decided not to
index your pages.

Good luck. 
 
KLCobb

-----Original Message-----
From: Chakra Yadavalli [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 16, 2005 11:44 PM
To: java-user@lucene.apache.org
Subject: How do you make "protected content" searchable by Google?

Hello, I am not sure if this is the right question for this list but
it is in regards to search engines.

Suppose you have a website that hosts some protected content that is
accessible only to registered users. How do you make the content
searchable by Google and other popular web search engines? The idea is
not to reveal the content even via the "Google cache."

Here is what I am thinking... 
Using Lucene (or its derivatives), skim through the "protected content",
remove all the common stop words, stem the remaining words, and place
the resulting text files in a directory available to the search bots
(via robots.txt rules). That way, even if the content is cached by the
search engines, it does not make much sense to humans, but it can still
be searched. When users click on a link to one of the skimmed files, we
need to redirect them to the login/register page and, upon successful
login, redirect them to the actual human-readable page that corresponds
to the skimmed content. Note that the "protected content" may be living
in a Content Management System or a database.
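
To make the skimming step concrete, here is a rough sketch in plain
Java. It is not Lucene code: the stop-word list is a tiny made-up sample
and stem() is a naive suffix stripper standing in for a real stemmer, so
the class and method names (ContentSkimmer, skim, stem) are only
assumptions for illustration. In practice a Lucene analyzer would likely
do both jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ContentSkimmer {
    // Tiny illustrative stop-word list (an assumption, not Lucene's own list).
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "by", "for", "in", "is",
        "it", "of", "on", "or", "that", "the", "this", "to", "with");

    // Naive suffix stripping; a stand-in for a real stemming algorithm.
    public static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            // Only strip when at least three characters would remain.
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Lowercase the text, drop stop words, stem what is left.
    public static String skim(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            kept.add(stem(token));
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Prints: document index search engin
        System.out.println(skim("The documents are indexed by the search engines."));
    }
}
```

The output of skim() is what would be written to the bot-visible
directory, one file per protected document.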

Am I overthinking/overengineering it? Any ideas are really appreciated.

Thanks in advance,
Chakra
-- 
Visit my weblog: http://www.jroller.com/page/cyblogue

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

