[CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-16 Thread Mike Taylor
Peter Schlumpf writes:
  Bill,
  
  You have hit the nail on the head!  This is EXACTLY what I am
  trying to do!  It's the underlying stuff that I am trying to get
  at.  Looking at RDF may yield some good ideas.  But I am not
  thinking in terms of RDF or XML, triples, or MARC, standards, or
  any of that stuff that gets thrown around here.  Even the Internet
  is not terribly necessary.  I am thinking in terms of data
  structures, pointers, sparse matrices, relationships between
  objects and yes, set theory too -- things like that.  The former is
  pretty much cruft that lies upon the latter, and it mostly just
  gets in the way.  Noise, as you put it, Bill!
  
  A big problem here is that Libraryland has a bad habit of getting
  itself lost in the details and going off on all kinds of tangents.
  As I said before, the biggest prison is between the ears.  Throw
  out all that junk in there and just start over!  When I begin
  programming this thing my only tools will be a programming language
  (C or Java), a text editor (vi), and my head.

This is very idyllic and (I hope this doesn't sound too patronising)
probably necessary from time to time.  But I've seen too many
initiatives like this that start out making huge conceptual strides
and then start tripping over all those gushdurned DETAILS.  I think
it's disingenuous to talk as though the details aren't important: 90%
of every project is the details, and while the other 10% is the fun
part, building new conceptual frameworks usually seems to involve
throwing out all the accumulated crud, which -- guess what? -- turns
out to be the embodiment in code of accumulated wisdom.  Babies,
bathwater, all that ... except that the bathwater turns out to be made
of millions of tiny babies, and -- what's that you say?  My metaphor
has skidded off the track?  Oh well.

 _/|____
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  Are you suggesting that coconuts migrate? -- Monty Python and
 the Holy Grail.


Re: [CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-10 Thread Han, Yan
Bill and Peter,

Very nice posts. XML, RDF, MARC, and DC are all different ways to represent 
information (of course, XML, RDF, and DC are easier for machines to read and 
process). 

However, I think it goes deeper than that, down to the fundamentals: the data 
structures and algorithms that make things work. RDF (with triples) is a 
directed graph. A graph is a powerful (the most powerful?) data structure with 
which you can model anything. However, some graph problems are NP-hard; at 
bottom we are talking about math. So a balance needs to be struck between how 
complex the model is and how easy (or possible) it is to implement. As 
computing power grows, complex data modeling and data mining are on the 
horizon.
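To make the "triples are a directed graph" point concrete, here is a minimal sketch (the subjects, predicates, and objects are invented bibliographic examples, not from any real vocabulary):

```python
# A set of (subject, predicate, object) triples is just a labeled
# directed graph: subjects and objects are nodes, predicates are
# edge labels.
from collections import defaultdict

triples = [
    ("work:Hamlet", "hasAuthor", "person:Shakespeare"),
    ("manif:Hamlet1623", "embodies", "work:Hamlet"),
    ("person:Shakespeare", "bornIn", "place:Stratford"),
]

# Adjacency list: node -> list of (edge_label, node)
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def reachable(start):
    """All nodes reachable from start by following edges forward."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

nodes = reachable("manif:Hamlet1623")
# nodes == {"work:Hamlet", "person:Shakespeare", "place:Stratford"}
```

Simple traversals like this stay cheap; it is the harder graph questions (subgraph matching, certain path problems) that run into the NP-hard territory mentioned above.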

Yan


Re: [CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-10 Thread Casey A Mullin

(Attention: lurker emerging)

To me what it comes down to is neither simplicity nor complexity, but 
extensibility. In a perfect world, our data models should be capable of 
representing very sophisticated and robust relationships at a high level 
of granularity, while still accommodating ease of metadata production 
and contribution (especially by non-experts and those outside the 
library community).


I agree that none of our existing data structures/syntaxes are a priori 
fundamental or infallible. But what is promising to me about RDF is its 
intuitive mode of expression and extensibility (exactly the kind I 
advocate above).


Casey


Re: [CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-10 Thread Karen Coyle
Extensibility is absolutely key. I know that some people consider XML to 
be inherently extensible, but I'm concerned that the conceptual model 
presented by FRBR doesn't support extensibility. For example, the FRBR 
entity Place represents only the place as a subject. If you want to 
represent places anywhere else in the record, you are SOL. Ditto the 
Event entity. The attributes in FRBR have no inherent structure, so 
you have, say, Manifestation with a whole page of attributes that are 
each defined at the most detailed level. You have "reduction ratio 
(microform)" but no "reproduction info" field that you could extend for 
another physical format. You have "date of publication" but no general 
"date" property that could be extended to other dates that are needed 
(in fact, the various date fields have no relation to each other).


To have an extensible data structure we need to have some foundation 
classes that we can build on, and nothing in FRBR, RDA, or MARC gives 
us that.
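A toy sketch of what such a foundation class might buy you, following Karen's date example (the class and type names here are invented for illustration, not drawn from FRBR or RDA):

```python
from dataclasses import dataclass

# One general date property as a foundation class...
@dataclass
class TypedDate:
    value: str       # e.g. an ISO 8601 string like "1623"
    date_type: str   # what kind of date this is

# ...so specific dates are instances of the general type, and a new
# kind of date extends the model instead of requiring a new,
# unrelated attribute.
def publication_date(value):
    return TypedDate(value, "publication")

def copyright_date(value):
    return TypedDate(value, "copyright")

# A hypothetical new format needs a new date? No schema change:
capture_date = TypedDate("2009-04-10", "capture")

dates = [publication_date("1623"), capture_date]

# Because all dates share one foundation class, generic code works
# on every present and future date type:
earliest = min(dates, key=lambda d: d.value)
```

Contrast this with a flat bag of unrelated attributes ("date of publication", "date of capture", ...), where every new date means new special-case code.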


kc


Re: [CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-10 Thread Casey A Mullin
I completely agree with Karen regarding how FRBR falls short in not 
allowing for more relationships between Group 1-2 and Group 3 entities. 
FRBRoo fleshes out some of these things, but in a woefully unwieldy way, 
IMO. Conversely, FRBR in RDF (at http://vocab.org/frbr) consolidates 
some classes and properties (e.g. Responsible entity, a superclass of 
Person, Family and Corporate body), and to me approaches the kind of 
extensibility we need. Unfortunately, it does not include data 
properties, which I agree are problematic, as Karen illustrates.


I do maintain that FRBR is the kind of *conceptual* model that, for the 
most part, can guide the development of effective data structures. 
However, it is far too abstract to be implemented verbatim. This is what 
I think RDA is trying to do with attributes like "Title for the work." I 
wonder: why is there not an ontology expert on the JSC? (If I'm wrong 
and there is, someone please correct me.)


Casey


[CODE4LIB] You got it!!!!! Re: [CODE4LIB] Something completely different

2009-04-09 Thread Peter Schlumpf
Bill,

You have hit the nail on the head!  This is EXACTLY what I am trying to do! 
 It's the underlying stuff that I am trying to get at.   Looking at RDF may 
yield some good ideas.  But I am not thinking in terms of RDF or XML, triples, 
or MARC, standards, or any of that stuff that gets thrown around here.  Even 
the Internet is not terribly necessary.  I am thinking in terms of data 
structures, pointers, sparse matrices, relationships between objects and yes, 
set theory too -- things like that.  The former is pretty much cruft that lies 
upon the latter, and it mostly just gets in the way.  Noise, as you put it, 
Bill!
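For instance, a sparse matrix of relationships between objects can be nothing more than a nested map that stores only the non-empty cells (a toy sketch; the object ids and relationship names are invented):

```python
# Rows and columns are object ids; a cell holds the relationship
# between the two objects. Storing only occupied cells is what makes
# the matrix sparse.
relations = {}

def relate(a, rel, b):
    relations.setdefault(a, {})[b] = rel

relate("hamlet_work", "created_by", "shakespeare")
relate("hamlet_1623", "embodiment_of", "hamlet_work")

# Look up a single cell, or scan a whole row:
cell = relations["hamlet_work"]["shakespeare"]   # "created_by"
row = relations.get("hamlet_1623", {})           # everything one object points at
```

Nothing here depends on RDF, XML, or the Internet; it is just pointers and a sparse structure, which is the point.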

A big problem here is that Libraryland has a bad habit of getting itself lost 
in the details and going off on all kinds of tangents.  As I said before, the 
biggest prison is between the ears.  Throw out all that junk in there and 
just start over!  When I begin programming this thing my only tools will be a 
programming language (C or Java), a text editor (vi), and my head.  But before I 
really start that, right now I am writing a paper that explains how this stuff 
works at a very low level.  It's mostly an effort to get my thoughts down 
clearly, but I will share a draft of it with y'all on here soon.

Peter Schlumpf


-Original Message-
From: Bill Dueber b...@dueber.com
Sent: Apr 9, 2009 10:37 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Something completely different

On Thu, Apr 9, 2009 at 10:26 AM, Mike Taylor m...@indexdata.com wrote:

 I'm not sure what to make of this except to say that Yet Another XML
 Bibliographic Format is NOT the answer!


I recognize that you're being flippant, and yet think there's an important
nugget in here.

When you say it that way, it makes it sound as if folks are debating the
finer points of OAI-MARC vs MARC-XML -- that it's simply syntactic sugar
(although I'm certainly one to argue for the importance of syntactic sugar)
over the top of what we already have.

What's actually being discussed, of course, is the underlying data model.
E-R pairs primarily analyzed by set theory, triples forming directed graphs,
whether or not links between data elements can themselves have attributes --
these are all possible characteristics of the fundamental underpinning of a
data model to describe the data we're concerned with.

The fact that they all have common XML representations is noise, and
referencing the currently most common XML schema for these things is just
convenient shorthand in a community that understands the exemplars. The fact
that many in the library community don't understand that syntax is not the
same as a data model is how we ended up with RDA.  (Mike: I don't know your
stuff, but I seriously doubt you're among that group. I'm talkin' in
general, here.)

Bibliographic data is astoundingly complex, and I believe wholeheartedly
that modeling it sufficiently is a very, very hard task. But no matter the
underlying model, we should still insist on starting with the basics that
computer science folks have been using for decades now: uids (and, these
days, guids) for the important attributes, separation of data and display,
definition of sufficient data types and reuse of those types whenever
possible, separation of identity and value, full normalization of data, zero
ambiguity in the relationship diagram as a fundamental tenet, and a rigorous
mathematical model to describe how it all fits together.

This is hard stuff. But it's worth doing right.
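None of those basics is exotic; a few lines sketch the flavor of the first two (the names and fields are invented for illustration, not a proposed schema): every entity gets a uid, the human-readable value lives apart from the identity, and relationships point at uids, never at strings.

```python
import uuid

# Identity is a uid; value is ordinary data hung off that identity.
entities = {}   # uid -> attribute dict

def new_entity(**attrs):
    uid = str(uuid.uuid4())
    entities[uid] = attrs
    return uid

author = new_entity(kind="person", name="Austen, Jane")
work = new_entity(kind="work", title="Persuasion")

# Relationships reference uids, so changing a value (the name string)
# never breaks a link (the identity).
links = [(work, "created_by", author)]

entities[author]["name"] = "Jane Austen"   # the value changes...
# ...but the identity, and every link through it, survives intact.
```

The same separation is what makes "separation of data and display" and full normalization possible downstream: display strings can change freely because nothing structural ever depends on them.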




-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library