Re: Exhibits invisible to Google

David R. Karger Thu, 25 Jan 2007 11:41:39 -0800


Michael K. Bergman wrote:
> Hi David and David,
>
> Forgive me if I'm not seeing the forest for the trees (it's happened 
> before!), but I have this general impression of the role of Exhibit and 
> search engines (Google):
>
> content --> characterization (structure) --> Exhibit
>       |
>       |
>      \ /
> full-text indexing/searching
>
> In this shorthand, Exhibit plays the role of organized structure display 
> (publication) and manipulation (sorting + filtering by structure).
>
> By referencing characterized content in Exhibit, that content (such as 
> the professor's publications, assuming they have their own static link) 
> is still discoverable/searchable, but NOT in reference to the Exhibit 
> home page.  But isn't that OK?  The purpose of Exhibit is to provide 
> more meaningful characterization and structure to the underlying dataset 
> of all relevant content, not a full-text representation of any specific 
>   content item.
>


Philosophically, we are on the same page.  Practically, we have to deal 
with the way search engines work.  For the most part, if the query 
matches text on a particular page X, then selecting that result takes 
the user to page X.  So, if I have one page holding the content of my 
exhibit, I can get it indexed by linking to it.  But then a user who 
searches and finds that page lands on the page of content, when really 
I want them to reach the exhibit presentation page that displays that 
content nicely.


> By "stuffing all publications into Exhibit" value is being added to the 
> organization and characterization of that content.  It's a value add, 
> not a replacement.  And it in no manner keeps the underlying content 
> hidden.  (It just doesn't in-and-of-itself reveal it, but need it do so?)
>   
In an ideal world there would be a way to tell a search engine "when 
content on page X matches a query, take the user to page Y".  But an 
ideal world doesn't have spammers.  They love pulling this kind of 
bait-and-switch, so even if we find a way to do it, it makes us look 
like spammers, which is a bad way to get indexed.  My belief is that 
search engines want the page they see to be as much as possible like 
the page their users will see.  Hence the idea of embedding the data in 
the page (somehow).
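
A minimal sketch of what that "somehow" could look like, assuming the 
exhibit data is a JSON object with an "items" list whose entries carry a 
"label" plus arbitrary other properties.  The function name and the 
"exhibit-data-fallback" id are made up for illustration; nothing here is 
part of Exhibit itself.  The idea is only to render the same data as 
plain HTML that can sit in the presentation page, so a crawler sees the 
text a visitor sees.

```python
import html
import json


def render_items_as_html(exhibit_json: str) -> str:
    """Render Exhibit-style JSON items as a static HTML list.

    Assumes {"items": [{"label": ..., <other properties>}, ...]}.
    All values are HTML-escaped so the data cannot break the page.
    """
    data = json.loads(exhibit_json)
    rows = []
    for item in data.get("items", []):
        label = html.escape(str(item.get("label", "")))
        fields = []
        for key, value in item.items():
            if key == "label":
                continue
            # A property may hold a single value or a list of values.
            values = value if isinstance(value, list) else [value]
            text = ", ".join(html.escape(str(v)) for v in values)
            fields.append(f"<dt>{html.escape(key)}</dt><dd>{text}</dd>")
        rows.append(
            f"<li><strong>{label}</strong><dl>{''.join(fields)}</dl></li>"
        )
    # Hypothetical id; a page script could hide or replace this element
    # once the interactive view has loaded.
    return '<ul id="exhibit-data-fallback">\n' + "\n".join(rows) + "\n</ul>"
```

The output would be pasted (or generated server-side) into the same 
html page that loads the exhibit, so there is only one URL to index.  
Whether a search engine treats content that a script later hides as 
acceptable is a separate question that this sketch does not answer.
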
> Thus, if I want the most meaningful entry point to ALL of the 
> professor's publications, I want to go to Exhibit.  If I'm looking for 
> foobar and widget which might be mentioned in one of the publications 
> (as well as from potentially many others), I go to Google or do a local 
> site search.
>   
True if I know exactly which paper I am looking for.  But we academics 
kind of like the idea of the sum total of our publications serving to 
attract queries in the general area of our research. 
> The real issue I was trying to get at was the importance of describing 
> the purpose of the Exhibit in the first place so that it can be 
> adequately discovered by Google to fulfill its unique purpose.  An 
> Exhibit plopped on a page without lead-in or explanation will remain 
> "invisible" to search engines and therefore not usable for its real 
> purpose.
>   
Not if the data is visible.  In general, lots of data sets are 
reasonably "self describing" without further annotation of what they are 
about.
> The reason I got so excited about Exhibit in the first place was that it 
> is a simple and easily implemented way to layer a structured 
> characterization over relevant content.  In other words, it adds logic 
> and organization to datasets of content (which, later on, can be subject 
> to semantic mediation).  Way cool.
>
> If the underlying content has value in its own right, put it somewhere 
> in a static link for its own discovery.  That (IMHO) is not the purpose 
> or unique innovation of Exhibit.
>   
Ah, but now I will have to make a static page for each piece of 
content.  And I lose the "aggregate" page, which may be a much better 
match than any individual object.

I am actually operating between worlds like this right now.  My 
publications page remains static but it has a link to an exhibitified 
version.  This is so I can remain searchable plus get Google benefits.  
But it is clearly a second-best solution:
* a visitor has to notice and click the link on the searchable page to 
use the exhibit
* the static page is completely out of date, so I would rather Google 
used the up-to-date exhibit
* it's inelegant.
> What is so remarkably impressive about so many of the Simile and MIT 
> projects is that they are providing, brick-by-brick, needed parts to the 
> Web's emerging semantic foundation.  What (IMHO) is truly needed now is 
> more glue and mortar.  (Despite the fact I will ask for more features on 
> occasion! :) Oh well, foolish consistency is the hobgoblin of small minds!)
>
> The combination of PB, Solvent, RDFizers, Babel, Exhibit, Sifter and 
> many, many others (I am still learning about) is where I hope the next 
> major thrust occurs.  What is the glue that will tie all of this 
> together?  Does JSONP represent the lowest common denominator canonical 
> data format at the core of the client side?  Should client side use 
> SQLite and Firefox with Java-based RDF triplestores on the server side 
> for scalable collaboration?  Is it too early to talk about architecture, 
> piece parts and modularity?  And what are the best pieces?  I certainly 
> think Exhibit has the potential to be one of them.
>
> Sorry for heading off on a rant.  Again, thanks to all for producing 
> such fine work, and I in no manner want to get discussions off track (so 
> I will shut up!).  I really am excited about all of this stuff . . . .
>
> Thanks, Mike
>
> David Huynh wrote:
>   
>> Hi Mike,
>>
>> People who have seen Exhibit very often ask if Google would see their 
>> data. To convince a professor to stuff all of her publications into 
>> Exhibit, we need to assure her that her publications are still 
>> searchable because they are in their current form.
>>
>> Thank you for the link to Google sitemap. I can't seem to find out how 
>> to make Google index exhibit json data linked from an html page as if 
>> the data itself were inside that html page. (This is because I don't 
>> want Google searchers to end up looking at the json rather than the 
>> html.) Do you know how?
>>
>>     
> _______________________________________________
> General mailing list
> [email protected]
> http://simile.mit.edu/mailman/listinfo/general
>   