Thanks Ian,

On Mar 4, 2008, at 4:35 PM, Ian Boston wrote:

> Mark,
> I am not a Longwell developer and my comment is a diversion from
> the core of your question; however, using collected metadata for the
> disambiguation of author names has a significant amount of merit,
> provided that the corpus of information relating to the author in
> question contains a small enough range of name variants relative to
> the size of the collected information.
> By collected information I mean publications and associated metadata
> that increase the level of confidence that two names are in fact the
> same author. I was at a CNI Workshop in Washington at the end of last
> month where this was discussed, and IMVHO (:)) provided that there is
> some control over the generation of potential names, a sufficiently
> populated triple store will contain the information and the
> capability to generate links or additional relationships between
> names with a level of confidence that they are the same. Obviously
> Longwell's term vectors (I did get that right, it has term vectors,
> doesn't it?) may be able to cluster an author's style and language to
> add to the confidence.
>
> Another factor that might help you in your search is other identity
> references that might not originate from within your own store of
> name pointers, but be represented as external URIs with a level of
> trust/confidence, e.g. OpenSocial FOAF stores, OpenID, etc.
>
> As I said, not a Longwell developer, but an interested observer.
> Ian


I think the community is banking on what you're suggesting being the
case... at least if this diagram is accurate concerning Linked
Data... ;-)

http://en.wikipedia.org/wiki/Image:Linking-Open-Data-diagram_2007-09.png
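
To make the "collected metadata" point concrete: even something as
crude as co-author overlap between two name variants already gives a
usable confidence signal. A toy sketch in Python (the names, records,
and the Jaccard-style measure are just my illustration, not anything
Longwell actually does):

def coauthor_confidence(records_a, records_b):
    """Jaccard overlap of the co-author sets behind two name variants."""
    coauthors_a = {name for rec in records_a for name in rec["coauthors"]}
    coauthors_b = {name for rec in records_b for name in rec["coauthors"]}
    union = coauthors_a | coauthors_b
    return len(coauthors_a & coauthors_b) / len(union) if union else 0.0

# Publications our repository has filed under two different name strings.
smith_j = [{"coauthors": ["Jones, P.", "Lee, K."]},
           {"coauthors": ["Lee, K.", "Garcia, M."]}]
john_a_smith = [{"coauthors": ["Lee, K.", "Jones, P."]}]

print(coauthor_confidence(smith_j, john_a_smith))  # ~0.67: likely the same person

In a real store you would of course also weigh venues, subjects, and
affiliations, but the shape of the calculation is the same.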

But I still suspect that in our current case this approach is
"misapplied". We loosely manage a set of content in one DSpace
instance; if we could just get it under better management and correct
the variation at its source, we might be better off than patching it
up with inference. We are investing a lot in creating disambiguations
and equivalences for data whose publication we already control.
Rather than have that work isolated to some "presentation layer", I
want the result of the effort to "stick"; I'd rather see the content
corrected at its source. I feel that inference just isn't a
replacement for better management of the metadata within our own
system.
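
To spell out what I mean by the inference route: the variant name
strings stay in the records, and an owl:sameAs assertion is layered on
top of them. A minimal sketch using rdflib (the URIs and names are
invented; this isn't Longwell or DSpace code):

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, OWL

EX = Namespace("http://example.org/authors/")

g = Graph()
# Two name variants that leaked into the metadata as separate "people".
g.add((EX.smith_j, FOAF.name, Literal("Smith, J.")))
g.add((EX.smith_john, FOAF.name, Literal("John A. Smith")))

# The inference-layer "fix": assert the equivalence as yet another triple.
g.add((EX.smith_j, OWL.sameAs, EX.smith_john))

# Without a reasoner, every consumer has to follow sameAs links itself,
# and the records underneath still carry the inconsistent name strings.
for s, _, o in g.triples((None, OWL.sameAs, None)):
    print(s, "is asserted to be the same person as", o)

The equivalence lives beside the data rather than in it, which is
exactly the part I would rather see pushed back to the source.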

Using these sorts of equivalences seems much more applicable to cases
where you do not actually manage the original data, not as a
replacement for actively maintaining and cleaning up one's data
locally in a controlled service such as DSpace. It would be great if
such identity references/services could ultimately be used as
authorities to assist curators in cleaning up their metadata (or in
reducing the variance coming from users in the first place). If
proper feedback loops could be built into a tool such as DSpace to
place authority controls (or at least suggested values) on such
metadata during submission/update, I think they would have a lot of
"traction" in the community.
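
For example, the kind of submission-time suggestion I have in mind
needn't be more than this. A rough sketch (the authority table, names,
and matching rule are all made up; this is not an existing DSpace
hook):

import difflib

# A hypothetical local authority table: canonical heading -> known variants.
AUTHORITY = {
    "Smith, John A.": ["Smith, J.", "J. A. Smith", "John Smith"],
    "Garcia, Maria": ["Garcia, M.", "M. Garcia"],
}

def suggest_heading(submitted, cutoff=0.6):
    """Return the best-matching canonical heading, or None if nothing is close."""
    best, best_score = None, 0.0
    for canonical, variants in AUTHORITY.items():
        for form in [canonical, *variants]:
            score = difflib.SequenceMatcher(None, submitted.lower(),
                                            form.lower()).ratio()
            if score > best_score:
                best, best_score = canonical, score
    return best if best_score >= cutoff else None

print(suggest_heading("Jon Smith"))  # -> "Smith, John A." (suggested, not forced)

The point is that the suggestion happens where the metadata is created,
so the cleanup effort sticks.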

Cheers,
Mark
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
