Hi Katie. I don't think there are universally agreed best practices in this
space and people often have strongly held views on either side. You don't
mention internationalization/localization which is, in my experience, a
bigger concern for folks than semantic drift. Those who believe in numeric
identifiers often think that using identifiers in a given natural language
provides that language an undeserved pride of place and priority over other
languages. Folks in this camp include the creators of CIDOC and there are
people dismayed by BibFrame's abandonment of MARC-style numbers.
From a practical point of view, numeric identifiers, while perfectly
sensible in the abstract, suffer from the weak tools that we have. They end
up disadvantaging everyone equally, and probably disadvantage everyone more
than English identifiers would.
Your note implies that concept URIs could change over time if they had
natural language words as part of the URI. I don't think this would be a
good practice. If UAT:Black now means "orange," I think you need to either
live with UAT:Black as the URI, mint a synonym UAT:Orange (and keep
UAT:Black), or deprecate UAT:Black as a valid concept and create a new
concept UAT:Orange. Which course of action is most appropriate will depend
on the specific circumstances of a change. If you decide there's a new
concept UAT:DarkGrey, that is split off from UAT:Black, perhaps the
original can exist unchanged, but if you decide that there's really no such
thing as "black" but just UAT:DarkGrey and UAT:DarkestGrey, then perhaps
UAT:Black gets deprecated and removed. Changing the pieces of the URI to
UAT101, UAT102, UAT301, etc. doesn't really affect most of the discussion.
The only case it makes easier is avoiding UAT:Black having a description of
"vibrant orange" if the concept drifts far enough from its original label
(which is embedded in the URI).
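To make the three options concrete, here is a rough Python sketch. The
Concept and Vocabulary classes and their method names are made up for
illustration (they are not part of UAT, DBpedia, or any library); the only
point is that a URI, once minted, is never reused or rewritten.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Concept:
    """One concept; it may be reachable through several URIs."""
    uris: List[str]
    pref_label: str
    deprecated: bool = False
    replaced_by: List[str] = field(default_factory=list)


class Vocabulary:
    """Hypothetical in-memory vocabulary keyed by concept URI."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, Concept] = {}

    def mint(self, uri: str, label: str) -> Concept:
        # A URI is minted once and never reassigned to another concept.
        if uri in self._by_uri:
            raise ValueError(f"{uri} already exists")
        concept = Concept(uris=[uri], pref_label=label)
        self._by_uri[uri] = concept
        return concept

    def relabel(self, uri: str, new_label: str) -> None:
        # Option 1: live with the old URI, change only the label.
        self._by_uri[uri].pref_label = new_label

    def add_synonym(self, uri: str, synonym_uri: str) -> None:
        # Option 2: mint a second URI for the same concept, keep the old one.
        if synonym_uri in self._by_uri:
            raise ValueError(f"{synonym_uri} already exists")
        concept = self._by_uri[uri]
        concept.uris.append(synonym_uri)
        self._by_uri[synonym_uri] = concept

    def deprecate(self, uri: str, replaced_by: Iterable[str] = ()) -> None:
        # Option 3: mark the concept deprecated and point at its successors.
        successors = list(replaced_by)
        for successor in successors:
            if successor not in self._by_uri:
                raise KeyError(successor)
        concept = self._by_uri[uri]
        concept.deprecated = True
        concept.replaced_by = successors

    def lookup(self, uri: str) -> Concept:
        return self._by_uri[uri]
```

The same code works unchanged whether the URIs are UAT:Black or UAT101,
which is the sense in which the choice doesn't affect most of the
discussion.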
Since Dimitris mentioned Freebase, briefly what they did was initially mint
English language URIs based on the label of the topic, but eventually
abandoned the practice because it was too difficult to do automatically and
added too little value. They did keep English identifiers for types &
properties which were part of the scheme, but these were hand assigned and
provided a useful organizing function to group properties with the
associated type, types with their domain, etc. A powerful feature of the
Freebase setup was that a single topic could have arbitrarily many URIs, so
dereferencing /en/Boston, /authority/viaf/1234,
/authority/loc/lcnam/nm1234, /wikipedia/en_title/Boston (city), etc. could
all fetch the same content (without the use of redirects). The
core identifiers for non-schema topics were machine-generated sequential
IDs encoded with a compact base 37(?) encoding, e.g. /m/0d_23.
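
A minimal Python sketch of that arrangement, to show the shape of it rather
than how Freebase actually implemented it. The TopicStore class, its method
names, and the 37-symbol alphabet are all assumptions chosen for
illustration; I'm not sure of the real base or alphabet.

```python
import string
from typing import Dict

# Assumed alphabet: digits, lowercase letters, underscore (37 symbols).
# This is illustrative only, not Freebase's actual encoding.
ALPHABET = string.digits + string.ascii_lowercase + "_"


def encode_id(n: int) -> str:
    """Encode a non-negative sequential integer as a compact string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    base = len(ALPHABET)
    digits = []
    while n:
        n, rem = divmod(n, base)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_id(s: str) -> int:
    """Inverse of encode_id."""
    base = len(ALPHABET)
    n = 0
    for ch in s:
        n = n * base + ALPHABET.index(ch)
    return n


class TopicStore:
    """Hypothetical store where many keys resolve directly to one topic."""

    def __init__(self) -> None:
        self._next = 1
        self._topics: Dict[str, dict] = {}  # machine ID -> content
        self._keys: Dict[str, str] = {}     # any key -> machine ID

    def create(self, content: dict) -> str:
        # The core identifier is a machine-generated sequential ID.
        mid = "/m/" + encode_id(self._next)
        self._next += 1
        self._topics[mid] = content
        self._keys[mid] = mid
        return mid

    def add_key(self, key: str, mid: str) -> None:
        # Attach an additional identifier; a key may name only one topic.
        if mid not in self._topics:
            raise KeyError(mid)
        if self._keys.get(key, mid) != mid:
            raise ValueError(f"{key} already names another topic")
        self._keys[key] = mid

    def fetch(self, key: str) -> dict:
        # Every key resolves straight to the content, with no redirect step.
        return self._topics[self._keys[key]]
```

With this, create() followed by add_key("/en/Boston", mid) and
add_key("/authority/viaf/1234", mid) lets fetch() return the same content
for any of the three identifiers.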
Tom
p.s. I'm a couple of blocks away if you want to chat about this stuff some
time.
On Thu, May 26, 2016 at 2:43 PM, Katie Frey <kf...@cfa.harvard.edu> wrote:
> Hello,
>
> How are concept IDs handled for DBpedia? It looks like the concept URIs
> are descriptive (i.e. for the concept http://dbpedia.org/page/Solar_System,
> the concept ID is "Solar_System"). Are the descriptive IDs used throughout
> all of dbpedia (back and front end) or are terms ultimately kept unique by
> using numeric identifiers?
>
> I've been developing a controlled vocabulary and I would also like to use
> URIs so that my terms can be used with other linked data schemes. My group
> and I have had a lot of discussions regarding the concept IDs; some want
> them to be descriptive, based on the preferred term for each concept so
> that they are human readable but this could cause problems if the terms
> used to describe each concept change over time, others want them to be
> randomly generated so that if the description of a term drifts over time
> the URI for the concept will always remain static.
>
> We are trying to figure out if there are any standards or best practices
> we should be looking towards when it comes to concept IDs. Any
> thoughts/comments/justifications would be appreciated.
>
> Best,
> Katie
>
> --
> Katie E. Frey
> John G. Wolbach Library, Harvard-Smithsonian Center for Astrophysics
> 60 Garden Street, MS-56, Cambridge, MA 02138
> email: kf...@cfa.harvard.edu | phone: 617-496-7579
> http://astrothesaurus.org | http://library.cfa.harvard.edu/
>
> "Surprising what you can dig out of books if you read long enough, isn’t
> it?"
> - Rand al'Thor (in Robert Jordan's The Shadow Rising, Book Four of the
> Wheel of Time)
>
> "This is insanity!" "No, this is scholarship!"
> - Yalb and Shallan (in Brandon Sanderson's Words of Radiance, Book Two of
> the Stormlight Archive)
>
>
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion