Thanks Ian,

On Mar 4, 2008, at 4:35 PM, Ian Boston wrote:
> Mark,
> I am not a Longwell developer and my comment is a diversion from
> the core of your question, however using collected metadata for the
> disambiguation of Author names has a significant amount of merit,
> provided that the corpus of information relating to the author in
> question contains a small enough range of name variants relative to
> the size of collected information.
> By collected information I mean publications and associated metadata
> that increase the level of confidence that two names are in fact the
> same author. I was at a CNI Workshop in Washington at the end of last
> month where this was discussed, and IMVHO (:)) provided that there is
> some control over the generation of potential names, a sufficiently
> populated triple store will contain the information and the
> capabilities to generate links or additional relationships between
> names with a level of confidence that they are the same. Obviously
> Longwell's term vectors (I did get that right, it has term vectors,
> doesn't it?) may be able to cluster an author's style and language to
> add to the confidence.
>
> Another factor that might help you in your search is other identity
> references that might not originate from within your own store of
> name pointers, but be represented as external URIs with a level of
> trust/confidence, e.g. OpenSocial FOAF stores, OpenID etc.
>
> As I said, not a Longwell developer, but an interested observer.
> Ian

I think the community is banking on what you're suggesting being the
case... at least if this diagram is accurate concerning Linked Data... ;-)

http://en.wikipedia.org/wiki/Image:Linking-Open-Data-diagram_2007-09.png

But I still suspect that in our current case it is being "misapplied".
We loosely manage a set of content in one DSpace instance; if we could
just get it under better management and correct the variation at its
source, we might be better off than doing it using inference. We are
investing a lot in making disambiguations and equivalences over data
whose publication we already control. Rather than have the result of
that effort isolated in some "presentation layer", I want it to
"stick"; I'd rather see the content corrected at its source. I feel
that inference just isn't a replacement for better management of the
metadata within our own system.

Using these sorts of equivalences seems much more applicable to cases
where you do not actually manage the original data, not as a
replacement for actively maintaining and cleaning up one's data locally
in a controlled service such as DSpace. It would be great if such
identity references/services could ultimately be used as authorities to
assist curators in cleaning up their metadata (or in reducing the
variance coming from users in the first place). If proper feedback
loops could be built into a tool such as DSpace to place authority
controls (or at least suggested values) on such metadata during
submission/update, I think they would get a lot of "traction" in the
community.
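To make that last idea a little more concrete, here is a rough sketch of
what the suggestion step at submission time might look like. To be clear,
this is not DSpace code or any existing API; the class name, the curated
list standing in for an authority source, the similarity heuristic, and
the 0.7 threshold are all just illustrative. The point is only that a name
typed by a submitter could be normalized, compared against preferred
headings, and offered back as suggested values with a rough confidence:

```java
// Hypothetical sketch only (not DSpace API): compare a name entered at
// submission time against a locally curated authority list and return
// suggested canonical headings with a rough confidence score.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AuthorAuthoritySketch {

    /** A suggested canonical heading plus a rough confidence in [0, 1]. */
    public static class Suggestion {
        final String canonicalName;
        final double confidence;

        Suggestion(String canonicalName, double confidence) {
            this.canonicalName = canonicalName;
            this.confidence = confidence;
        }
    }

    // Stand-in for an authority source: a curated list of preferred headings
    // (could equally be backed by an external identity service).
    private final List<String> canonicalNames;

    public AuthorAuthoritySketch(List<String> canonicalNames) {
        this.canonicalNames = canonicalNames;
    }

    /** Return canonical headings whose normalized form resembles the input. */
    public List<Suggestion> suggest(String entered) {
        String needle = normalize(entered);
        List<Suggestion> out = new ArrayList<Suggestion>();
        for (String canonical : canonicalNames) {
            double score = similarity(needle, normalize(canonical));
            if (score >= 0.7) { // illustrative threshold, not a recommendation
                out.add(new Suggestion(canonical, score));
            }
        }
        out.sort((a, b) -> Double.compare(b.confidence, a.confidence));
        return out;
    }

    /** Lowercase, drop punctuation, and collapse whitespace. */
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT)
                   .replaceAll("[.,]", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    /** Crude similarity: 1 - (Levenshtein distance / length of longer string). */
    private static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / longer;
    }

    public static void main(String[] args) {
        AuthorAuthoritySketch authority = new AuthorAuthoritySketch(
                Arrays.asList("Diggory, Mark", "Boston, Ian"));
        // A submitter types a variant; the UI could offer the canonical heading
        // as a suggested value rather than silently accepting the variant.
        for (Suggestion s : authority.suggest("Diggory, M.")) {
            System.out.printf("%s (%.2f)%n", s.canonicalName, s.confidence);
        }
    }
}
```

The authority source here could just as well be an external identity
reference of the kind you mention (FOAF stores, OpenID, etc.) resolved to
preferred headings, so the two approaches aren't mutually exclusive.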
Cheers,
Mark

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general