[CODE4LIB] How to measure quality of a record

2015-05-06 Thread Sergio Letuche
Hello community,

is there a way, any statistical approach you are aware of, that would,
let's say, give one an idea of how complete a record is? Or what actions
do you take to get an idea of the quality of a record, and eventually of
a whole database?

Thank you in advance


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Charlie Morris
I'm curious, Karen, Ethan or anyone else, do you know of any examples of
libraries that have implemented schema.org or RDFa for hours data and have
noticed that Google or some other search engine has picked it up (i.e.,
correctly displaying that data as part of the search results)?  And if so,
how quickly will Google or the like pick up on changes to hours (i.e.,
shifting between semesters or unplanned changes)?

On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:

 +1 on the RDFa and schema.org. For those that don't know the library URL
 off-hand, it is much easier to find a library website by Googling than it
 is to go through the central university portal, and the hours will show up
 at the top of the page after having been harvested by search engines.

 On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

  Note that library hours are one of the possible bits of information that
  could be encoded as RDFa in the library web site, thus making it possible
  to derive library hours directly from the listing of hours on the web site
  rather than keeping a separate list. Schema.org does have the elements
  needed to encode hours. This would mean that hours could show in the
  display of the library's catalog entry on Google, Yahoo and Bing. Being
  available directly through the search engines might be sufficient, not
  necessitating creating yet-another-database for that data.
 
  Schema.org uses a restaurant as its opening hours example, but much of the
  data would be the same for a library:
 
  <div vocab="http://schema.org/" typeof="Restaurant">
    <span property="name">GreatFood</span>
    <div property="aggregateRating" typeof="AggregateRating">
      <span property="ratingValue">4</span> stars -
      based on <span property="reviewCount">250</span> reviews
    </div>
    <div property="address" typeof="PostalAddress">
      <span property="streetAddress">1901 Lemur Ave</span>
      <span property="addressLocality">Sunnyvale</span>,
      <span property="addressRegion">CA</span> <span property="postalCode">94086</span>
    </div>
    <span property="telephone">(408) 714-1489</span>
    <a property="url" href="http://www.dishdash.com">www.greatfood.com</a>
    Hours:
    <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
    <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
    <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
    Categories:
    <span property="servesCuisine">
      Middle Eastern
    </span>,
    <span property="servesCuisine">
      Mediterranean
    </span>
    Price Range: <span property="priceRange">$$</span>
    Takes Reservations: Yes
  </div>
 
  It seems to me that using schema.org would get more bang for the buck --
  it would get into the search engines and could also be aggregated into
  whatever database is needed. As we've seen with OCLC, having a separate
  listing is likely to mean that the data will be out of date.
 
  kc
 
  On 5/5/15 2:19 PM, nitin arora wrote:
 
  I can't see that they distinguished between public libraries and other
  types on their campaign page.
 
  They say "all libraries", as far as I can see.
  So I suppose then that this is true for all libraries:
  "Libraries offer a space anyone can enter, where money isn't exchanged,
  and documentation doesn't have to be shown."
  Who knew fines and library/student-IDs were a thing of the past?
 
  The only data sets I can find that could be the source of the 17,000
  number are for public libraries:
  http://www.imls.gov/research/pls_data_files.aspx
  Maybe I missed something.
  There is an hours field in one of the CSVs I downloaded for the 2012
  data (the most recent I could find).
 
  Asking 10k for something targeted for completion in June, without a
  grasp of what types of libraries there are and how volatile the hours
  information is (especially in a crisis) ...
  Sounds naive at best, sketchy at worst.
 
  The flexible funding button says this campaign will receive all funds
  raised even if it does not reach its goals.
 
  The value of these places for youth cannot be underestimated.
  So is the value of a quick buck ...
 
  On Tue, May 5, 2015 at 4:53 PM, McCanna, Terran 
  tmcca...@georgialibraries.org wrote:
 
   I'm not at all surprised that this doesn't already exist, and even if
  OCLC's was available, I'd be willing to bet it was out of date.
 
  Public library hours, especially in underfunded areas, may fluctuate
  depending on funding cycles, seasons (whether school is in or out), etc.,
  not to mention closing/reopening/moving because of old buildings that
  need to be updated. We have around 280 locations in our consortium and we
  have to rely on self-reporting to find out if their hours change. We
  certainly don't have staff time to check every one of their web sites on
  a regular basis; I can't imagine keeping track of 17,000!
 
 
  Terran McCanna
  PINES Program Manager
  Georgia Public Library Service
  1800 Century Place, Suite 150
  Atlanta, GA 30345
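If hours are published as schema.org markup and then aggregated into a database, as Karen suggests above, the consumer has to interpret openingHours values such as those in the quoted restaurant example. A minimal Python sketch, assuming only the simple form seen there (a day range or comma-separated day list, then one HH:MM-HH:MM range); other variants of the property and ranges crossing midnight are not handled, and the function names are hypothetical:

```python
from datetime import time

# schema.org openingHours day codes, Monday first.
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def parse_opening_hours(value: str):
    """Parse a value like 'Mo-Sa 11:00-14:30' into (days, opens, closes).

    Only day ranges and comma-separated day lists followed by a single
    HH:MM-HH:MM range are handled in this sketch.
    """
    day_part, time_part = value.split()
    days = set()
    for chunk in day_part.split(","):
        if "-" in chunk:
            start, end = chunk.split("-")
            i, j = DAYS.index(start), DAYS.index(end)
            # A range such as 'Sa-Mo' wraps around the end of the week.
            days.update(DAYS[i:j + 1] if i <= j else DAYS[i:] + DAYS[:j + 1])
        else:
            days.add(chunk)
    opens, closes = (time.fromisoformat(t) for t in time_part.split("-"))
    return days, opens, closes


def is_open(values, day: str, at: time) -> bool:
    """True if any openingHours value covers the given day code and time."""
    for value in values:
        days, opens, closes = parse_opening_hours(value)
        if day in days and opens <= at < closes:
            return True
    return False


hours = ["Mo-Sa 11:00-14:30", "Mo-Th 17:00-21:30", "Fr-Sa 17:00-22:00"]
print(is_open(hours, "Fr", time(21, 45)))  # True
print(is_open(hours, "Th", time(21, 45)))  # False
```

Keeping each value as its own (days, opens, closes) triple means several overlapping openingHours entries for the same day, like the lunch and dinner ranges above, need no special merging.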
  

Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Péter Király
Hi,

I thought a lot about this question in the past, and my answer is:
yes, you can apply statistical formulas. But you need to know each
field of your records well: what kind of information it can contain,
and whether you can set rules for it that you can apply to the
individual records. Some important factors:

- the completeness of the records: the ratio of filled and unfilled fields
- whether the value of an individual field matches the rules or not (say
you expect a number in the range 1 to 5, but you get 6)
- the probability that a given field value could be unique
- the probability that a record is not a duplicate of another record
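The first factor, completeness as the ratio of filled fields, takes only a few lines to compute. A minimal Python sketch; the field names are hypothetical placeholders, and treating None, blank strings and empty lists as "not filled" is an assumption to adapt to your own schema:

```python
# Hypothetical field list; substitute the fields your own schema defines.
EXPECTED_FIELDS = ["title", "creator", "date", "language", "subject", "description"]


def is_filled(value) -> bool:
    """None, blank strings and empty collections count as not filled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record: dict, fields=EXPECTED_FIELDS) -> float:
    """Share of the expected fields that carry a value, from 0.0 to 1.0."""
    if not fields:
        return 0.0
    filled = sum(1 for f in fields if is_filled(record.get(f)))
    return filled / len(fields)


record = {"title": "Map of Paris", "creator": "", "date": "1890",
          "language": "fr", "subject": []}
print(completeness(record))  # 0.5
```

Passing a different `fields` list lets the same function score mandatory and optional fields separately.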

Some concrete examples from my Europeana past:
- there are mandatory fields, and if they are empty, the quality goes down
- there are fields which should match a known standard, for example
ISO language codes; you can apply rules to decide whether the value
fits or not
- the data provider field is free text with no formal rule, but no
individual record should contain a unique value, and when you import
several thousand new records, they should not contain more than a
couple of new values
- there are fields which should contain URLs, emails or dates; we
can check whether they conform to formal rules and whether their content
is in a reasonable range (we should not have a record created in the
future, for example)
- you can measure whether the optional fields are filled, and in what ratio
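The rule-based checks in the list above can be written as one small validator per field. A minimal Python sketch, assuming hypothetical field names and only a small illustrative subset of ISO 639-1 codes; the email pattern is deliberately loose, and missing fields are left to the completeness measure:

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlparse

# Illustrative subset of ISO 639-1 codes only; load the full list in practice.
ISO_639_1 = {"de", "el", "en", "fr", "hu", "it", "nl"}


def check_language(value: str) -> bool:
    return value in ISO_639_1


def check_url(value: str) -> bool:
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_email(value: str) -> bool:
    # Deliberately loose: something@something.something, no whitespace.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


def check_created(value: str, today: Optional[date] = None) -> bool:
    """An ISO date (YYYY-MM-DD) that is not in the future."""
    try:
        created = date.fromisoformat(value)
    except ValueError:
        return False
    return created <= (today or date.today())


def check_rating(value) -> bool:
    """The range rule from the first list: an integer from 1 to 5."""
    return isinstance(value, int) and 1 <= value <= 5


# Hypothetical field names mapped to the rule each value must satisfy.
RULES = {
    "language": check_language,
    "url": check_url,
    "contact": check_email,
    "created": check_created,
    "rating": check_rating,
}


def rule_violations(record: dict) -> list:
    """Fields present in the record whose value breaks its rule."""
    return [f for f, rule in RULES.items() if f in record and not rule(record[f])]


def new_values(batch, known) -> set:
    """Values in an import batch not seen before, e.g. for a data provider field."""
    return set(batch) - set(known)


sample = {"language": "en", "url": "https://example.org/item/1",
          "contact": "not-an-email", "created": "2999-01-01", "rating": 6}
print(rule_violations(sample))  # ['contact', 'created', 'rating']
```

For the data provider case, an import could be flagged when `len(new_values(batch, known))` exceeds a small threshold of your choosing.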

At the end you will have a couple of measurements, and you can apply
weighting to calculate a final classification number.
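One way to combine the measurements is a weighted mean, optionally mapped onto a coarse label. A minimal Python sketch; the measurement names, weights and cut-offs are all hypothetical and would need tuning against records you have judged by hand:

```python
def weighted_score(measures: dict, weights: dict) -> float:
    """Weighted mean of measurements that each lie between 0 and 1."""
    total = sum(weights[name] for name in measures)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(measures[name] * weights[name] for name in measures) / total


def classify(score: float) -> str:
    """Map a score to a coarse label; the cut-offs are arbitrary examples."""
    if score >= 0.8:
        return "good"
    if score >= 0.5:
        return "acceptable"
    return "poor"


measures = {"mandatory": 1.0, "optional": 0.5, "rules": 0.75}
weights = {"mandatory": 3, "optional": 1, "rules": 2}  # hypothetical weights
score = weighted_score(measures, weights)
print(round(score, 3), classify(score))  # 0.833 good
```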

You can do a lot with faceted search to set up rules, and of course
you can use statistical tools such as R or Julia, which help to get a
picture of the distribution of the values.
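Before reaching for R or Julia, a facet-style count of one field's values already shows its distribution and surfaces suspicious one-off values. A minimal Python sketch using `collections.Counter`; the field name and sample records are hypothetical:

```python
from collections import Counter


def facet_counts(records, field) -> Counter:
    """Count each value of a field across records, as a search facet would.

    Missing and empty values are grouped under '(empty)'; list values are
    counted element by element.
    """
    counts = Counter()
    for record in records:
        value = record.get(field)
        values = value if isinstance(value, list) else [value]
        for v in values:
            counts["(empty)" if v in (None, "") else v] += 1
    return counts


def singletons(counts: Counter) -> list:
    """Values that occur only once; for a field such as data provider,
    these are often typos worth a closer look."""
    return sorted((v for v, n in counts.items() if n == 1), key=str)


records = [{"provider": "Museum A"}, {"provider": "Museum A"},
           {"provider": "Musuem A"}, {"provider": ""}, {}]
counts = facet_counts(records, "provider")
print(singletons(counts))  # ['Musuem A']
```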

Hope it helps.

Regards,
Péter

-- 
Péter Király
software developer

Göttingen Society for Scientific Data Processing - http://gwdg.de
eXtensible Catalog - http://eXtensibleCatalog.org


Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread James Morley
I think a key thing is to determine to what extent any definition of 
'completeness' is actually a representation of 'quality'. As Peter says, 
making sure not just that metadata is present but also that it conforms 
to rules is a big step towards this. I would also extend this to assessing 
the level of precision at which things have been recorded, for example dates 
(a rough range vs a precise day) and geotags (coordinates representing the 
centre of Paris vs the exact position from which a photograph was taken). 
These sorts of things can make a big difference to both the discoverability 
and practical reusability of records by end users.
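Precision can be measured as well as presence. A minimal Python sketch, assuming dates are stored as ISO-like strings and geotags as decimal-degree strings (strings, because floats lose trailing decimals); the pattern list and the figure of roughly 111,320 m per degree are simplifying assumptions:

```python
import re


def date_precision(value: str) -> str:
    """Classify a date string by how precise it is (ISO-like patterns only)."""
    v = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
        return "day"
    if re.fullmatch(r"\d{4}-\d{2}", v):
        return "month"
    if re.fullmatch(r"\d{4}", v):
        return "year"
    if re.fullmatch(r"\d{4}\s*[-/]\s*\d{4}", v):
        return "range"
    return "unknown"


def coordinate_resolution_m(lat: str, lon: str) -> float:
    """Rough ground resolution implied by the decimals recorded in a geotag.

    Uses ~111,320 m per degree; a degree of longitude shrinks towards the
    poles, so this is only a rough figure.
    """
    def decimals(s: str) -> int:
        return len(s.split(".")[1]) if "." in s else 0

    return 111_320 / 10 ** min(decimals(lat), decimals(lon))


print(date_precision("1850-1860"))              # range
print(coordinate_resolution_m("48.9", "2.4"))   # 11132.0
```

A geotag with one decimal place is only good to about 11 km, which is roughly "somewhere in Paris", while four decimals narrow it to about 11 m.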

Best, James




From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Esmé Cowles 
[escow...@ticklefish.org]
Sent: 06 May 2015 13:51
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] How to measure quality of a record

Sergio-

Mark Phillips has a related blog post that I think is an excellent place to 
start, which outlines a system for scoring how complete a record is:

http://vphill.com/journal/post/4075

There was some discussion on Twitter recently about this, which you can look up 
via the #metadataquality hashtag: https://twitter.com/hashtag/metadataquality

I think there was a move to set up a mailing list for this topic or something 
like that, but I'm not sure where that stands now.

-Esme

 On 05/06/15, at 7:21 AM, Sergio Letuche code4libus...@gmail.com wrote:

 Hello community,

 is there a way, any statistical approach you are aware of, that would,
 let's say, give one an idea of how complete a record is? Or what actions
 do you take to get an idea of the quality of a record, and eventually of
 a whole database?

 Thank you in advance


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Ethan Gruber
+1 on the RDFa and schema.org. For those that don't know the library URL
off-hand, it is much easier to find a library website by Googling than it
is to go through the central university portal, and the hours will show up
at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

 Note that library hours are one of the possible bits of information that
 could be encoded as RDFa in the library web site, thus making it possible
 to derive library hours directly from the listing of hours on the web site
 rather than keeping a separate list. Schema.org does have the elements
 needed to encode hours. This would mean that hours could show in the
 display of the library's catalog entry on Google, Yahoo and Bing. Being
 available directly through the search engines might be sufficient, not
 necessitating creating yet-another-database for that data.

 Schema.org uses a restaurant as its opening hours example, but much of the
 data would be the same for a library:

 <div vocab="http://schema.org/" typeof="Restaurant">
   <span property="name">GreatFood</span>
   <div property="aggregateRating" typeof="AggregateRating">
     <span property="ratingValue">4</span> stars -
     based on <span property="reviewCount">250</span> reviews
   </div>
   <div property="address" typeof="PostalAddress">
     <span property="streetAddress">1901 Lemur Ave</span>
     <span property="addressLocality">Sunnyvale</span>,
     <span property="addressRegion">CA</span> <span property="postalCode">94086</span>
   </div>
   <span property="telephone">(408) 714-1489</span>
   <a property="url" href="http://www.dishdash.com">www.greatfood.com</a>
   Hours:
   <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
   <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
   <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
   Categories:
   <span property="servesCuisine">
     Middle Eastern
   </span>,
   <span property="servesCuisine">
     Mediterranean
   </span>
   Price Range: <span property="priceRange">$$</span>
   Takes Reservations: Yes
 </div>

 It seems to me that using schema.org would get more bang for the buck --
 it would get into the search engines and could also be aggregated into
 whatever database is needed. As we've seen with OCLC, having a separate
 listing is likely to mean that the data will be out of date.

 kc

 On 5/5/15 2:19 PM, nitin arora wrote:

 I can't see that they distinguished between public libraries and other
 types on their campaign page.

 They say "all libraries", as far as I can see.
 So I suppose then that this is true for all libraries:
 "Libraries offer a space anyone can enter, where money isn't exchanged,
 and documentation doesn't have to be shown."
 Who knew fines and library/student-IDs were a thing of the past?

 The only data sets I can find that could be the source of the 17,000
 number are for public libraries:
 http://www.imls.gov/research/pls_data_files.aspx
 Maybe I missed something.
 There is an hours field in one of the CSVs I downloaded for the 2012
 data (the most recent I could find).

 Asking 10k for something targeted for completion in June, without a
 grasp of what types of libraries there are and how volatile the hours
 information is (especially in a crisis) ...
 Sounds naive at best, sketchy at worst.

 The flexible funding button says this campaign will receive all funds
 raised even if it does not reach its goals.

 The value of these places for youth cannot be underestimated.
 So is the value of a quick buck ...

 On Tue, May 5, 2015 at 4:53 PM, McCanna, Terran 
 tmcca...@georgialibraries.org wrote:

  I'm not at all surprised that this doesn't already exist, and even if
 OCLC's was available, I'd be willing to bet it was out of date.

 Public library hours, especially in underfunded areas, may fluctuate
 depending on funding cycles, seasons (whether school is in or out), etc.,
 not to mention closing/reopening/moving because of old buildings that
 need to be updated. We have around 280 locations in our consortium and we
 have to rely on self-reporting to find out if their hours change. We
 certainly don't have staff time to check every one of their web sites on
 a regular basis; I can't imagine keeping track of 17,000!


 Terran McCanna
 PINES Program Manager
 Georgia Public Library Service
 1800 Century Place, Suite 150
 Atlanta, GA 30345
 404-235-7138
 tmcca...@georgialibraries.org


 - Original Message -
 From: Peter Murray jes...@dltj.org
 To: CODE4LIB@LISTSERV.ND.EDU
 Sent: Tuesday, May 5, 2015 4:36:56 PM
 Subject: Re: [CODE4LIB] Library Hours

 OCLC has an institutional registry [1], which had (in part) library hours,
 addresses, and so forth. It seems to be unavailable, though [2]. That is
 the only systematic collection of library hours data that I know about.


 Peter

 [1] https://www.oclc.org/worldcat-registry.en.html
 [2] https://www.worldcat.org/registry/institution/

  On May 5, 2015, at 4:16 PM, Bigwood, 

Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Sergio Letuche
I felt I was missing something, since I could not find a general, widely
used approach, or perhaps some code on GitHub that implements these
quality measures...

2015-05-06 15:08 GMT+03:00 James Morley james.mor...@europeana.eu:

 I think a key thing is to determine to what extent any definition of
 'completeness' is actually a representation of 'quality'. As Peter says,
 making sure not just that metadata is present but also that it conforms
 to rules is a big step towards this. I would also extend this to
 assessing the level of precision at which things have been recorded, for
 example dates (a rough range vs a precise day) and geotags (coordinates
 representing the centre of Paris vs the exact position from which a
 photograph was taken). These sorts of things can make a big difference
 to both the discoverability and practical reusability of records by end users.

 Best, James



 



Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
Charlie, I don't know of any libraries that have used schema.org for 
their web site - perhaps others do. If it is used, it should be picked 
up the next time the search engines index the site. What the search 
engines do with schema.org is not guaranteed, but can be observed. It is 
not guaranteed because none of the search engines will say what they do, 
as that is considered a trade secret (especially from each other).


However, as locations and hours are important for their commercial 
customers (stores, restaurants, etc.) I would expect that to be picked 
up as a matter of course. Note that already locations and hours for some 
businesses do show in the search engines, and that is for sites that are 
not yet using schema.org, so the engines have some way of picking that 
up from the HTML. The Google side-bar knowledge graph for my local 
libraries shows "Hours: Open today · 9:00 am – 8:00 pm" 
(https://www.google.com/search?q=san+francisco+public+library+larkin+street+hours), 
but I have no idea where that comes from.

kc

On 5/6/15 5:22 AM, Charlie Morris wrote:

I'm curious, Karen, Ethan or anyone else, do you know of any examples of
libraries that have implemented schema.org or RDFa for hours data and have
noticed that Google or some other search engine has picked it up (i.e.,
correctly displaying that data as part of the search results)?  And if so,
how quickly will Google or the like pick up on changes to hours (i.e.,
shifting between semesters or unplanned changes)?

Re: [CODE4LIB] Library Hours

2015-05-06 Thread Tajoli Zeno

Hi

Open today · 9:00 am – 8:00 pm
 but I have no idea where that comes from.


Probably because the web page http://sfpl.org/index.php?pg=010101
inserts the library hours inside

<div id="library-hours"> </div>

Bye
Zeno Tajoli

--
Dr. Zeno Tajoli
Servizi Innovativi -- Automazione Biblioteche
z.taj...@cineca.it
fax +39 02 2135520
CINECA - Sede operativa di Segrate


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
The search engine may not pick it up quickly enough, but the emergency 
services in the area could get it from the RDFa as soon as it hits the web.


kc
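As Karen notes, anyone (emergency services included) could read the hours straight from the RDFa once the page is published, without waiting for a search engine. A rough Python sketch using only the standard library's `html.parser`: it collects the `content` attribute of any element whose `property` list includes openingHours, and it ignores vocab/typeof scoping, which a real RDFa processor would honour. The page snippet and library name are hypothetical:

```python
from html.parser import HTMLParser


class OpeningHoursExtractor(HTMLParser):
    """Collect content="..." from elements with property="openingHours"."""

    def __init__(self):
        super().__init__()
        self.hours = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here, via the
        # default handle_startendtag implementation.
        a = dict(attrs)
        properties = (a.get("property") or "").split()
        if "openingHours" in properties and a.get("content"):
            self.hours.append(a["content"])


def extract_opening_hours(html: str) -> list:
    parser = OpeningHoursExtractor()
    parser.feed(html)
    parser.close()
    return parser.hours


page = """
<div vocab="http://schema.org/" typeof="Library">
  <span property="name">Example Branch Library</span>
  <meta property="openingHours" content="Mo-Fr 09:00-20:00">Mon-Fri 9am - 8pm
  <meta property="openingHours" content="Sa 10:00-18:00" />Sat 10am - 6pm
</div>
"""
print(extract_opening_hours(page))  # ['Mo-Fr 09:00-20:00', 'Sa 10:00-18:00']
```

Hours given only as element text (no `content` attribute) are skipped by this sketch.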

On 5/6/15 6:45 AM, nitin arora wrote:

I think both approaches, creating a one-off list and using schema.org, pose
problems within the context of the original fundraising campaign's pitch.
I don't think every library can necessarily implement the latter, for a
variety of reasons, not always technical.

 From the point of view that a library can be a community center in a time
of crisis, I'm wondering not only how quickly a search engine would pick that
up but also, in such moments, how high a priority updating that data would be
in the first place.


Re: [CODE4LIB] Library Hours

2015-05-06 Thread nitin arora
I think both approaches, creating a one-off list and using schema.org, pose
problems within the context of the original fundraising campaign's pitch.
I don't think every library can necessarily implement the latter, for a
variety of reasons, not always technical.

From the point of view that a library can be a community center in a time
of crisis, I'm wondering not only how quickly a search engine would pick that
up but also, in such moments, how high a priority updating that data would be
in the first place.

On Wed, May 6, 2015 at 8:22 AM, Charlie Morris cdmorri...@gmail.com wrote:

 I'm curious, Karen, Ethan or anyone else, do you know of any examples of
 libraries that have implemented schema.org or RDFa for hours data and have
 noticed that Google or some other search engine has picked it up (i.e.,
 correctly displaying that data as part of the search results)?  And if so,
 how quickly will Google or the like pick up on changes to hours (i.e.,
 shifting between semesters or unplanned changes)?


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Tom Keays
I'd like to find out how and why Google is parsing this information. If you
go to the SFPL hours page (first link in the Google results) and look at
the source code, this is all you find.
http://sfpl.org/index.php?pg=010101
Is the ID in the DIV sufficient? It would be nice to have a set of use
cases to work from.

Currently, I'm generating a weekly hours box by pulling JSONP from the
hours API of LibCal. I could easily output this in schema.org format (and
probably will now), but can Google pick up the information from the DOM if
it is delivered as JSON and transformed into HTML?

<div id="library-hours">
  <h2>Hours</h2>
  <table class="hours" cellpadding="0" cellspacing="0">
    <tr>
      <th>Sun</th>
      <th>Mon</th>
      <th>Tue</th>
      <th class="today">Wed</th>
      <th>Thu</th>
      <th>Fri</th>
      <th>Sat</th>
    </tr>
    <tr>
      <td>12-5</td>
      <td>10-6</td>
      <td>9-8</td>
      <td class="today">9-8</td>
      <td>9-8</td>
      <td>12-6</td>
      <td>10-6</td>
    </tr>
  </table>
</div>
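Since Tom mentions he could emit his LibCal-driven hours box in schema.org format, here is a minimal Python sketch of one way to do that: collapse a week of hours into `openingHours` strings of the kind used in the schema.org restaurant example (`Mo-Sa 11:00-14:30`). The input dict and the `opening_hours`/`meta_tags` names are hypothetical. The sketch assumes the hours have already been converted to 24-hour `HH:MM` (so "12-5" becomes 12:00-17:00), and it does not reflect LibCal's actual JSON format.

```python
from html import escape

# schema.org two-letter day codes, in the Sunday-first order of the table above
DAYS = ["Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"]


def opening_hours(week):
    """Collapse {day: (opens, closes)} into strings like 'Tu-Th 09:00-20:00'.

    Days missing from `week` (or mapped to None) are treated as closed.
    """
    specs = []
    i = 0
    while i < len(DAYS):
        hours = week.get(DAYS[i])
        j = i
        # Extend the run while the following day has identical hours.
        while j + 1 < len(DAYS) and week.get(DAYS[j + 1]) == hours:
            j += 1
        if hours is not None:
            days = DAYS[i] if i == j else f"{DAYS[i]}-{DAYS[j]}"
            specs.append(f"{days} {hours[0]}-{hours[1]}")
        i = j + 1
    return specs


def meta_tags(specs):
    """Render each spec as an RDFa <meta property="openingHours"> element."""
    return "\n".join(
        f'<meta property="openingHours" content="{escape(s)}">' for s in specs
    )


# The table above, hand-converted to 24-hour times (an assumption about "12-5" etc.)
week = {
    "Su": ("12:00", "17:00"),
    "Mo": ("10:00", "18:00"),
    "Tu": ("09:00", "20:00"),
    "We": ("09:00", "20:00"),
    "Th": ("09:00", "20:00"),
    "Fr": ("12:00", "18:00"),
    "Sa": ("10:00", "18:00"),
}

if __name__ == "__main__":
    print(meta_tags(opening_hours(week)))
```

Runs of identical days are merged (Tue through Thu become `Tu-Th`), but a run is not carried from Saturday around to Sunday, and closed days are simply left out. The generated tags would still need to sit inside an element carrying `vocab` and `typeof`, as in the restaurant example.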


On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote:

 Charlie, I don't know of any libraries that have used schema.org for
 their web site - perhaps others do. If it is used, it should be picked up
 the next time the search engines index the site. What the search engines do
 with schema.org is not guaranteed, but can be observed. It is not
 guaranteed because none of the search engines will say what they do, as
 that is considered a trade secret (especially from each other).

 However, as locations and hours are important for their commercial
 customers (stores, restaurants, etc.) I would expect that to be picked up
 as a matter of course. Note that already locations and hours for some
 businesses do show in the search engines, and that is for sites that are
 not yet using schema.org, so the engines have some way of picking that up
 from the HTML. The Google side-bar knowledge graph for my local libraries
 shows  Hours 
 https://www.google.com/search?sa=Xbiw=1299bih=561q=san+francisco+public+library+larkin+street+hoursstick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSAei=qhlKVcKWJ8b7oQS65oCQCAved=0CJgBEOgTMBA:

 Open today · 9:00 am – 8:00 pm
  but I have no idea where that comes from.

 kc



Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
Right, but I don't think that meets any particular standard, which means 
that Google is doing a lot of text analysis when it indexes pages, 
looking for a pattern that looks like opening hours. That takes more 
cycles than having it all neatly wrapped in some known RDFa.


kc

On 5/6/15 6:54 AM, Tajoli Zeno wrote:

Hi

Open today · 9:00 am – 8:00 pm
 but I have no idea where that comes from.


Probably because the web page http://sfpl.org/index.php?pg=010101
inserts the library hours inside

<div id="library-hours"></div>

Bye
Zeno Tajoli



--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600
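To illustrate Karen's point about processing cycles: when hours sit in a known attribute, a few lines of Python's standard `html.parser` can harvest them without guessing, whereas a cell like `9-8` in a plain table has to be interpreted. This is a minimal sketch, not a conforming RDFa processor (it ignores `vocab`/`typeof` scoping and reads only the `content` attribute), and it says nothing about how any search engine actually parses pages. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OpeningHoursExtractor(HTMLParser):
    """Collect @content from every element whose @property includes openingHours."""

    def __init__(self):
        super().__init__()
        self.hours = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # @property may hold several space-separated terms; valueless attrs are None.
        terms = (attributes.get("property") or "").split()
        content = attributes.get("content")
        if "openingHours" in terms and content is not None:
            self.hours.append(content)


def extract_opening_hours(markup):
    """Return the openingHours values found in an HTML string, in document order."""
    parser = OpeningHoursExtractor()
    parser.feed(markup)
    parser.close()
    return parser.hours


sample = """<div vocab="http://schema.org/" typeof="Restaurant">
  <span property="name">GreatFood</span>
  Hours:
  <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
  <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
</div>"""

if __name__ == "__main__":
    print(extract_opening_hours(sample))
```

A self-closing `<meta ... />` is handled too, because `HTMLParser` routes it through `handle_starttag` by default.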


[CODE4LIB] Learn to Teach Coding - [free] webinar and ALA pre conference

2015-05-06 Thread Goben, Abigail
Please note the webinar will be free to the first 100 log-ins. If you're 
interested in teaching code/mentoring in technology, this may be of 
interest!

Cheers

-- Forwarded message --
From: Mark Beatty mbea...@ala.org
Date: Tue, May 5, 2015 at 2:43 PM
Subject: [lita-l] Learn to Teach Coding - webinar and ALA pre conference
To: lit...@lists.ala.org



Learn to Teach Coding and Mentor Technology Newbies – in Your Library 
or Anywhere!


Attend a free one-hour webinar
(http://ala.adobeconnect.com/teachcoding/) to discover what learning to
teach coding is all about, and then register
(http://alaac15.ala.org/register-now) for and attend the LITA
preconference at ALA Annual
(http://www.ala.org/lita/conferences/annual/2015). This opportunity
follows up on the 2014 LITA President’s Program at ALA Annual, where
then-LITA President Cindi Trainor Blyberg welcomed Kimberly Bryant,
founder of Black Girls Code.


The informational webinar is free and open to the first 100 log-ins:
Tuesday, May 26, 2015, at 1:00 pm Central Time
http://ala.adobeconnect.com/teachcoding/ 
http://www.ala.org/lita/conferences/annual/2015
Enter as guest. The webinar will be recorded and the link to the 
recording will be posted to these same resource spaces.


Register online for the ALA Annual Conference and add a LITA 
Preconference http://alaac15.ala.org/register-now


Black Girls CODE (BGC) http://www.blackgirlscode.com/ is devoted to 
showing the world that black girls can code, and to growing the number of 
women of color working in technology. LITA is devoted to putting on 
programs that promote, develop, and aid in the implementation of library 
and information technology. Together, BGC and LITA offer this full-day 
pre-conference workshop, designed to turn reasonably tech-savvy 
librarians into master technology teachers. The workshop will help 
attendees develop effective lesson plans and design projects their 
students can complete successfully in their own coding workshops. The 
schedule will feature presentations in the morning followed by 
afternoon breakout workgroups, in which attendees can experiment with 
programming languages such as Scratch, Ruby on Rails, and more.


Presenters:

Kimberly Bryant, Founder and Executive Director, Black Girls CODE
http://www.blackgirlscode.com/about-bgc.html


Lake Raymond, Program Coordinator, Black Girls CODE

Mikala Streeter, Curriculum Consultant, Black Girls CODE


The Black Girls Code Vision: To increase the number of women of color in 
the digital space by empowering girls of color ages 7 to 17 to 
become innovators in STEM fields, leaders in their communities, and 
builders of their own futures through exposure to computer science and 
technology.


Kimberly Bryant:
That, really, is the Black Girls Code mission: to introduce programming 
and technology to a new generation of coders, coders who will become 
builders of technological innovation and of their own futures. Imagine 
the impact that these curious, creative minds could have on the world 
with the guidance and encouragement others take for granted.


REGISTRATION:

Cost

• LITA Member $235 (coupon code: LITA2015)
• ALA Member $350
• Non-Member $380
How-to

To register for any of these events, you can include them with your 
initial conference registration or add them later using the unique link 
in your email confirmation. If you don’t have your registration 
confirmation handy, you can request a copy by emailing 
alaann...@compusystems.com. You also have the option of registering 
for a preconference only. To receive LITA member pricing, enter the 
discount promotional code LITA2015 on the Personal Information page 
during the registration process.


Register online for the ALA Annual Conference and add a LITA 
Preconference http://alaac15.ala.org/register-now

Call ALA Registration at 1-800-974-3084.
Onsite registration will also be accepted in San Francisco.

Questions or Comments?

For all other questions or comments related to the course, contact LITA 
at (312) 280-4269 or Mark Beatty, 
mbea...@ala.org


_/_/_/_/_/

Mark Beatty
Programs and Marketing Specialist
ALA/LITA
50 East Huron
Chicago, IL 60611
312.280.4268
mbea...@ala.org
www.lita.org




--
Abigail Goben, MLS
abigailgo...@gmail.com
http://HedgehogLibrarian.com


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Richard Wallis
I believe the objective of the search engines is to provide useful
functionality to users, in both their Knowledge Graphs and on mobile
devices, for all local businesses. I note that now, when I search for the
local branch of Best Buy or similar on my iPhone, I get the 'Open Now' or
'Closed Now' message as part of the result.

Karen is right about anyone, including emergency services, being able to
harvest this data from your site, which is made easier by using a
consistent format such as Schema.org.

As an aside, the Schema.org community is currently discussing the
formatting of opening hours, and consistency with other similar event-based
timings. It looks like they are going to keep it simple for the moment,
returning in the near future to issues such as exceptions like "open every
day 9-5 except Wednesdays in January of a leap year".

Richard.

On 6 May 2015 at 15:02, Karen Coyle li...@kcoyle.net wrote:

 The search engine may not pick it up quickly enough, but the emergency
 services in the area could get it from the RDFa as soon as it hits the web.

 kc


 On 5/6/15 6:45 AM, nitin arora wrote:

 I think both creating a one-off list and schema.org approaches pose
 problems within the context of the original fund raising campaign's pitch.
 I don't think every library can necessarily implement the latter for a
 variety of reasons, not always technical.

  From the POV that a library can be a community center in a time of crisis,
 I'm wondering not only how quickly a search engine would pick that up but
 also, in such moments, how prioritized updating that data would be in the
 first place.


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Megan O'Neill Kudzia
Hi all,

I've been experimenting with schema.org OpeningHoursSpecification, and
currently Bing is scraping our hours, but Google isn't. I am using
RDFa-lite and I've validated it using a linter (thanks Jason Ronallo!), so
I'm scratching my head as to why our hours *still* don't show up on a
Google search.

I suspect part of it for us might be that we're re-branding away from
Stockwell-Mudd Libraries to Albion College Library, as it's much more
explanatory, but neither search through Google yields a nice box with hours
in it like the SFPL.

If and when I figure out the problem I'd be happy to send you an update of
what we did and what caused it to finally work properly.
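For anyone following Megan's approach, here is a hedged Python sketch that renders hours as RDFa Lite using schema.org's `Library` type and `OpeningHoursSpecification` (`dayOfWeek`, `opens`, `closes`). The optional `validFrom`/`validThrough` properties are one way to scope semester hours, which bears on Charlie's question about changes between semesters; whether any search engine honors them is not something this thread establishes. The function names, library name and dates are hypothetical, and the output should be checked with a linter as Megan did.

```python
from html import escape

DAY_NAMES = {"Mo": "Monday", "Tu": "Tuesday", "We": "Wednesday", "Th": "Thursday",
             "Fr": "Friday", "Sa": "Saturday", "Su": "Sunday"}


def hours_spec_rdfa(day, opens, closes, valid_from=None, valid_through=None):
    """Render one OpeningHoursSpecification block in RDFa Lite.

    `day` is a two-letter code; `opens`/`closes` are time strings such as
    '08:00:00'; `valid_from`/`valid_through` are optional ISO dates.
    """
    name = DAY_NAMES[day]
    lines = [
        '<div property="openingHoursSpecification" typeof="OpeningHoursSpecification">',
        f'  <link property="dayOfWeek" href="http://schema.org/{name}">{name}',
        f'  <meta property="opens" content="{escape(opens)}">{escape(opens)} -',
        f'  <meta property="closes" content="{escape(closes)}">{escape(closes)}',
    ]
    if valid_from:
        lines.append(f'  <meta property="validFrom" content="{escape(valid_from)}">')
    if valid_through:
        lines.append(f'  <meta property="validThrough" content="{escape(valid_through)}">')
    lines.append("</div>")
    return "\n".join(lines)


def library_hours_rdfa(name, week, **validity):
    """Wrap one specification per day in a schema.org Library element."""
    blocks = [hours_spec_rdfa(d, o, c, **validity) for d, (o, c) in week.items()]
    return "\n".join(['<div vocab="http://schema.org/" typeof="Library">',
                      f'<span property="name">{escape(name)}</span>',
                      *blocks,
                      "</div>"])


if __name__ == "__main__":
    print(library_hours_rdfa("Example College Library",
                             {"Mo": ("08:00:00", "23:00:00")},
                             valid_from="2015-08-31", valid_through="2015-12-18"))
```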

On Wed, May 6, 2015 at 10:21 AM, Karen Coyle li...@kcoyle.net wrote:

 Tom, Google will not tell you. The entirety of how Google search works is
 a trade secret. We don't know the algorithm for ranking, and we don't know
 what information they glean from web pages -- and they are unlikely to
 tell. It is a constant on the schema.org discussion list that developers
 want to know what Google/Bing/Yahoo/Yandex will do with specific
 information in the web pages, and it is a constant that the reps there
 reply: we cannot tell you that. The only way to find out is to code and
 observe.

 kc



Re: [CODE4LIB] Help with Auto Hot Key

2015-05-06 Thread Andrew Weidner
Hi Eddie,

AutoHotkey can probably do what you want to do. I am not familiar with the
Sierra interface, although I have successfully used AHK to automate
workflows in a variety of applications.

Here's an example of a subroutine with key commands that copy the contents
of a CONTENTdm text input box:
https://github.com/metaweidner/UHDL_SubjectTopical_CDM/blob/master/UHDL_SubjectTopical_CDM.ahk#L295-303

And here's a check to see whether there was actually any text on the
clipboard as a result:
https://github.com/metaweidner/UHDL_SubjectTopical_CDM/blob/master/UHDL_SubjectTopical_CDM.ahk#L152-158

I'd be happy to pass along more examples.

Best,

Andrew Weidner
ajweid...@uh.edu



On Tue, May 5, 2015 at 5:00 PM, Karl Holten khol...@switchinc.org wrote:

 This doesn't involve AutoHotkey, but maybe it would be easier to use SQL
 to pull that data from the Sierra database rather than screen scraping from
 the Sierra application. You wouldn't need to worry about where stuff
 displays in the interface, just where it's stored on the back end. This
 solution would probably be cleaner to maintain as well.

 Excel has ways to pull in data from external sources like SQL databases;
 it looks like Microsoft Publisher does too. I can't speak to how easy it
 would be to set that up, but hopefully it would give you a start:

 https://support.office.com/en-ie/article/Import-data-into-Office-Publisher--Visio-or-Word-by-using-the-Data-Connection-Wizard-65295a62-8da3-49bc-8dd8-1f77d0a05127

 Anyway, that's my 2 cents on an alternative tack you might want to try.

 Hope that helps,
 Karl Holten
 Systems Integration Specialist
 SWITCH Inc
 414-382-6711

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Eddie Clem
 Sent: Monday, May 4, 2015 1:50 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Help with Auto Hot Key

 Hi there! I'm hoping someone here is a guru at AutoHotKey! :)



 We have a clerk that pays our invoices in Sierra. She will write the bib
 number on a sticky note, as well as the list price and the locations (that
 each copy will go to). I want to have Sierra copy the bib number, list
 price, locations, and order record notes onto a receipt and then this clerk
 would put this receipt with the first copy of the material, rather than
 hand-write on sticky notes all day! Since I had looked and couldn't find a
 way to do this easily from Sierra, I had another brilliant idea: we
 could have AutoHotkey copy the fields I want into a template (say, in
 Publisher), turn the bib number into a barcode, and list the other
 fields that we want on a slip that travels around the tech services department.
 This barcoded bib number would be used by catalogers to enter the bib
 number in the 949 for overlay in Connexion, and then again by our barcoding
 clerk to search by bib number in Sierra. At this point, I'm thinking that
 AutoHotkey is my best bet.



 Here is my prototype of what the routing slip would look like when it's
 done. The Thickety 2 is a note in the order record put in by our
 selectors for our catalogers to add that series to the bib record. The
 978... is just a placeholder for where the list price will go once we get
 that field added to our order records:



 [image: prototype routing slip]



 Here is the corresponding order record. Part of my problem for AutoHotkey
 is that not all order records will contain a note (in field z) and the
 locations may be different (fewer or more) on the LOCATIONS line. I have to
 include the multi line, because if it's just our Main Library that's
 receiving the item, then the LOCATIONS at the bottom don't show up at
 all...just the LOCATION fixed field (under ACQ TYPE).



 [image: corresponding order record]



 Any thoughts would be greatly appreciated!



 Thanks!

 Eddie


 Eddie Clem, MLS
 Cataloging Librarian
 ec...@khcpl.org | www.KHCPL.org

 Kokomo-Howard County Public Library
 Collection Management Department
 305 East Mulberry Street
 Kokomo, IN 46901
 765.626.0853|765.450.6290 (fax)
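A rough sketch of Karl's SQL route applied to Eddie's problem: a `LEFT JOIN` with `GROUP_CONCAT` tolerates orders with no note and any number of locations, falling back to the single fixed-field location when there are no extra location rows. It uses Python's built-in `sqlite3` with hypothetical `orders`/`order_locations` tables so it runs anywhere. These are not Sierra's real table or view names; an actual query would have to be written against whatever tables Sierra exposes, and the sample bib numbers and prices are made up.

```python
import sqlite3

# Hypothetical, simplified tables standing in for order data (NOT Sierra's schema).
SCHEMA = """
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    bib_number TEXT NOT NULL,
    list_price TEXT,               -- may be NULL until the field is added
    location   TEXT NOT NULL,      -- single fixed-field location
    note       TEXT                -- optional order note
);
CREATE TABLE order_locations (     -- extra rows only for multi-copy orders
    order_id      INTEGER REFERENCES orders(order_id),
    location_code TEXT NOT NULL
);
"""

SLIP_QUERY = """
SELECT o.bib_number,
       o.list_price,
       COALESCE(o.note, '') AS note,
       -- fall back to the fixed-field location when there are no extra rows
       COALESCE(GROUP_CONCAT(l.location_code, ', '), o.location) AS locations
FROM orders AS o
LEFT JOIN order_locations AS l ON l.order_id = o.order_id
GROUP BY o.order_id
ORDER BY o.order_id
"""


def routing_slips(conn):
    """Return one text slip per order, tolerating missing notes and locations."""
    slips = []
    for bib, price, note, locations in conn.execute(SLIP_QUERY):
        lines = [f"Bib: {bib}", f"List price: {price or ''}", f"Locations: {locations}"]
        if note:  # only print a note line when the order record has one
            lines.append(f"Note: {note}")
        slips.append("\n".join(lines))
    return slips


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO orders VALUES (1, 'b1234567', NULL, 'main', NULL)")
    for slip in routing_slips(conn):
        print(slip, end="\n\n")
```

Pushing the optional-field logic into the query keeps the slip code free of screen-position assumptions, which is the maintainability point Karl raises.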



Re: [CODE4LIB] Library Hours

2015-05-06 Thread Jason Bengtson
When I was at the Robert M Bird Library I put some basic schema.org on the
old site, but I didn't mark up the hours. That'll be a project for here as
well, once I get out from under some of what I'm working on now.

Best regards,
*Jason Bengtson, MLIS, MA*
Innovation Architect


*Houston Academy of Medicine-The Texas Medical Center Library*
1133 John Freeman Blvd
Houston, TX   77030
http://library.tmc.edu/
www.jasonbengtson.com

On Wed, May 6, 2015 at 9:02 AM, Karen Coyle li...@kcoyle.net wrote:

 The search engine may not pick it up quickly enough, but the emergency
 services in the area could get it from the RDFa as soon as it hits the web.

 kc


 On 5/6/15 6:45 AM, nitin arora wrote:

 I think both creating a one-off list and schema.org approaches pose
 problems within the context of the original fund raising campaign's pitch.
 I don't think every library can necessarily implement the latter for a
 variety of reasons, not always technical.

  From the POV that a library can be a community center in a time of crisis,
 I'm wondering not only how quickly a search engine would pick that up but
 also, in such moments, how prioritized updating that data would be in the
 first place.


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
Tom, Google will not tell you. The entirety of how Google search works 
is a trade secret. We don't know the algorithm for ranking, and we don't 
know what information they glean from web pages -- and they are unlikely 
to tell. It is a constant on the schema.org discussion list that 
developers want to know what Google/Bing/Yahoo/Yandex will do with 
specific information in the web pages, and it is a constant that the 
reps there reply: we cannot tell you that. The only way to find out is 
to code and observe.


kc

On 5/6/15 7:00 AM, Tom Keays wrote:

I'd like to find out how and why Google is parsing this information. If you
go to the SFPL hours page (first link in the Google results) and look
at the source code, this is all you find.
http://sfpl.org/index.php?pg=010101
Is the ID in the DIV sufficient?  It would be nice to have a set of use
cases to work from.

Currently, I'm generating a weekly hours box by pulling JSONP from the
hours API of LibCal. I could easily output this in schema.org format (and
probably will now), but can Google pick up the information from the DOM if
it is delivered as JSON and transformed into HTML?

<div id="library-hours">
  <h2>Hours</h2>
  <table class="hours" cellpadding="0" cellspacing="0">
    <tr>
      <th>Sun</th>
      <th>Mon</th>
      <th>Tue</th>
      <th class="today">Wed</th>
      <th>Thu</th>
      <th>Fri</th>
      <th>Sat</th>
    </tr>
    <tr>
      <td>12-5</td>
      <td>10-6</td>
      <td>9-8</td>
      <td class="today">9-8</td>
      <td>9-8</td>
      <td>12-6</td>
      <td>10-6</td>
    </tr>
  </table>
</div>


On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote:


Charlie, I don't know of any libraries that have used schema.org for
their web site - perhaps others do. If it is used, it should be picked up
the next time the search engines index the site. What the search engines do
with schema.org is not guaranteed, but can be observed. It is not
guaranteed because none of the search engines will say what they do, as
that is considered a trade secret (especially from each other).

However, as locations and hours are important for their commercial
customers (stores, restaurants, etc.) I would expect that to be picked up
as a matter of course. Note that already locations and hours for some
businesses do show in the search engines, and that is for sites that are
not yet using schema.org, so the engines have some way of picking that up
from the HTML. The Google side-bar knowledge graph for my local libraries
shows  Hours 
https://www.google.com/search?sa=Xbiw=1299bih=561q=san+francisco+public+library+larkin+street+hoursstick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSAei=qhlKVcKWJ8b7oQS65oCQCAved=0CJgBEOgTMBA:

Open today · 9:00 am – 8:00 pm javascript:void(0)
 but I have no idea where that comes from.

kc


On 5/6/15 5:22 AM, Charlie Morris wrote:


I'm curious, Karen, Ethan or anyone else, do you know of any examples of
libraries that have implemented schema.org or RDFa for hours data and have
noticed that Google or some other search engine has picked it up (i.e.,
correctly displaying that data as part of the search results)?  And if so,
how quickly will Google or the like pick up on changes to hours (i.e.,
shifting between semesters or unplanned changes)?

On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:

+1 on the RDFa and schema.org. For those that don't know the library URL
off-hand, it is much easier to find a library website by Googling than it
is to go through the central university portal, and the hours will show up
at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

Note that library hours is one of the possible bits of information that
could be encoded as RDFa in the library web site, thus making it possible
to derive library hours directly from the listing of hours on the web site
rather than keeping a separate list. Schema.org does have the elements such
that hours can be encoded. This would mean that hours could show in the
display of the library's catalog entry on Google, Yahoo and Bing. Being
available directly through the search engines might be sufficient, not
necessitating creating yet-another-database for that data.

Schema.org uses a restaurant as its opening hours example, but much of the
data would be the same for a library:

<div vocab="http://schema.org/" typeof="Restaurant">
  <span property="name">GreatFood</span>
  <div property="aggregateRating" typeof="AggregateRating">
    <span property="ratingValue">4</span> stars -
    based on <span property="reviewCount">250</span> reviews
  </div>
  <div property="address" typeof="PostalAddress">
    <span property="streetAddress">1901 Lemur Ave</span>
    <span property="addressLocality">Sunnyvale</span>,
    <span property="addressRegion">CA</span> <span property="postalCode">94086</span>
  </div>
  <span property="telephone">(408) 714-1489</span>
  <a property="url"

[CODE4LIB] DLF Preconference for Liberal Arts Colleges - Call For Proposals

2015-05-06 Thread Kelcy Shepherd
The Digital Library Federation is hosting our inaugural DLF Liberal Arts 
Colleges Preconference on October 25th in Vancouver, BC, preceding this year's 
DLF Forum.

The one-day preconference will be an opportunity for those working with digital 
libraries/digital scholarship in liberal arts colleges to work closely 
together, in the spirit of the liberal arts seminar, to consider the issues and 
opportunities unique to us. We invite proposals for panels, presentations, or 
working sessions that foster conversation, connections, and provocation at the 
intersection of digital libraries and the liberal arts. How does your project 
or approach take advantage of the liberal arts environment, or respond to its 
limitations? How is your work informed by the values of a liberal arts college? 
What is the role of liberal arts college institutions in the digital 
library/digital scholarship world?

Session Types

  *   Full Panel: Multiple presenters centered on a theme, in the format of 
your choice. (60 minutes)
  *   Presentation: Single or multiple presenters, covering specific topics or 
case studies. (20 minutes)
  *   Working session: An interactive session involving hands-on learning and 
collaboration. Single or multiple presenters. (30 minutes or 60 minutes)

Complete proposals should be submitted using the online submission form[1] by 
5:00 PM EST on June 22, 2015. Proposals must include a title, session type, 
information for each presenter (name, institution, and email), proposal 
description (maximum 300 words), and proposal abstract (maximum 100 words). You 
will hear about your proposal status by mid-August.

The 2015 DLF Liberal Arts Colleges Preconference[2] will be held October 25 in 
Vancouver, BC, at the Pinnacle Vancouver Harbourfront Hotel. The 2015 DLF Forum 
will be held October 26-28, and the Forum call for proposals[3] is also open 
until June 22.
[1] Proposal submission form: 
https://docs.google.com/forms/d/1rn3OuC38aZ4hplvkMMJsQvsLqPd2Dv3wtU1TzMGp4xQ/viewform?c=0w=1
[2] Preconference description: 
http://www.diglib.org/forums/2015forum/affiliated-events/dlflac
[3] DLF Forum CfP: http://www.diglib.org/forums/2015forum/cfp/


Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Phillips, Mark
I'll second Bob's recommendation on that paper.

I've found the following paper to be an interesting read on the topic of 
metadata quality and some of the ways that we could approach measuring it with 
automation. 

Automatic Evaluation of Metadata Quality in Digital Repositories by Xavier 
Ochoa and Erik Duval
https://lirias.kuleuven.be/bitstream/123456789/255807/2/xavuxavier-pre.pdf

Mark
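The paper's idea of scoring completeness automatically can be sketched as the weighted share of expected fields that are filled. This is a minimal Python sketch in that spirit, not the paper's exact formulation. The field names, the weights, and the definition of "empty" are all hypothetical choices you would set locally:

```python
def is_filled(value):
    """Treat None, blank strings and empty lists as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple)):
        return any(is_filled(v) for v in value)
    return True


def weighted_completeness(record, weights):
    """Weighted fraction (0.0 to 1.0) of expected fields that are filled.

    With every weight set to 1 this is simply the share of fields present.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = sum(w for field, w in weights.items() if is_filled(record.get(field)))
    return present / total


# Hypothetical weights for a Dublin Core-style record.
WEIGHTS = {"title": 3, "creator": 2, "date": 2, "subject": 1, "description": 1, "rights": 1}

record = {"title": "Main Street, 1912", "date": "1912", "subject": [], "description": "  "}
print(weighted_completeness(record, WEIGHTS))  # 0.5: only title (3) and date (2) of 10
```

A score like this only measures presence. It says nothing about whether the values are correct, which is the caveat Kyle raises.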



From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert 
Sandusky sandu...@uic.edu
Sent: Wednesday, May 6, 2015 4:42 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] How to measure quality of a record

I recommend this article as an entry point into a research program on
information quality:

Stvilia, B., Gasser, L., Twidale, M. B. and Smith, L. C. (2007), A
framework for information quality assessment. J. Am. Soc. Inf. Sci., 58:
1720–1733. doi:10.1002/asi.20652 Available at:
http://stvilia.cci.fsu.edu/wp-content/uploads/2011/03/IQAssessmentFramework.pdf

One cannot manage information quality (IQ) without first being able to
measure it meaningfully and establishing a causal connection between the
source of IQ change, the IQ problem types, the types of activities
affected, and their implications. In this article we propose a general
IQ assessment framework. In contrast to context-specific IQ assessment
models, which usually focus on a few variables determined by local
needs, our framework consists of comprehensive typologies of IQ
problems, related activities, and a taxonomy of IQ dimensions organized
in a systematic way based on sound theories and practices. The framework
can be used as a knowledge resource and as a guide for developing IQ
measurement models for many different settings. The framework was
validated and refined by developing specific IQ measurement models for
two large-scale collections of two large classes of information objects:
Simple Dublin Core records and online encyclopedia articles.

Bob

On 5/6/2015 4:32 PM, Diane Hillmann wrote:
 You might try this blog post, by Thomas Bruce, who was my co-author on an
 earlier article (referred to in the post):
 https://blog.law.cornell.edu/voxpop/2013/01/24/metadata-quality-in-a-linked-data-context/

 Diane

 On Wed, May 6, 2015 at 5:24 PM, Kyle Banerjee kyle.baner...@gmail.com
 wrote:

 On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu
 wrote:

 I think a key thing is to determine to what extent any definition of
 'completeness' is actually a representation of 'quality'.  As Peter says,
 making sure not just that metadata is present but then checking it conforms
 with rules is a big step towards this.

 This.

 Basing quality measures too much on the presence of certain data points or
 the volume of data is fraught with peril. In experiments in the distant
 past, my experience was that looking for structure and syntax patterns that
 indicate good/bad quality as well as considering record sources was useful.
 Also keep in mind that any scoring system is to some extent arbitrary, so
 you don't want to read more into what it generates than appropriate.

 Kyle




Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Stuart A. Yeates
Here in .nz the national library runs a local aggregation service
http://digitalnz.org/ which has quite good penetration into schools
and so forth. It provides some metadata quality reports such as
http://metadata.digitalnz.org/nzresearch/127 for sources it aggregates
(that report is actually quite a bit dated).

My experience of these reports is they're useful in inverse proportion
to the diversity of the collection being reported on. The narrower
your collection, the more real issues are going to be caught.
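A per-source field coverage report in the general style of those aggregator reports could look like the sketch below. The record layout and the "source" key are assumptions for illustration, not DigitalNZ's data model. Breaking coverage down by contributing collection keeps one heterogeneous source from hiding problems in another:

```python
from collections import defaultdict


def coverage_by_source(records, fields):
    """Percentage of records from each source that have each field filled.

    `records` is an iterable of dicts; the "source" key naming the
    contributing collection is a hypothetical convention.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for rec in records:
        src = rec.get("source", "unknown")
        totals[src] += 1
        for field in fields:
            if rec.get(field) not in (None, "", []):
                counts[src][field] += 1
    return {
        src: {f: round(100.0 * counts[src][f] / n, 1) for f in fields}
        for src, n in totals.items()
    }


# Hypothetical sample records.
records = [
    {"source": "A", "title": "Map of Wellington", "date": "1890"},
    {"source": "A", "title": "Harbour view", "date": ""},
    {"source": "B", "title": "Letter", "date": "1915-03-02", "subject": ["War"]},
]
report = coverage_by_source(records, ["title", "date", "subject"])
for src, row in sorted(report.items()):
    print(src, row)
```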

cheers
stuart
--
...let us be heard from red core to black sky


Re: [CODE4LIB] Help with Auto Hot Key

2015-05-06 Thread Dawn Romano
Hi Eddie,
I'm not an AutoHotkey guru, but I just wanted to mention that when you are 
invoicing in Sierra, you do have the option to print the bib/order record for 
the item you are invoicing.  I believe this would provide all of the 
information you are looking for.  Of course, it will also provide the entire 
bib, which may not be what you are looking for, but it is not unusual to 
include this printout inside the book upon receipt/invoicing.  

Good luck, 
Dawn

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eddie 
Clem
Sent: Monday, May 04, 2015 2:50 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Help with Auto Hot Key

Hi there! I'm hoping someone here is a guru at AutoHotKey! :)



We have a clerk that pays our invoices in Sierra. She will write the bib number 
on a sticky note, as well as the list price and the locations (that each copy 
will go to). I want to have Sierra copy the bib number, list price, locations, 
and order record notes onto a receipt and then this clerk would put this 
receipt with the first copy of the material, rather than hand write on sticky 
notes all day! Since I had looked, and couldn't find a way to do this easily 
from Sierra, I had another brilliant idea that we could have Autohotkey copy 
the fields I want into a template (say, in Publisher) and have the bib number 
turned into a barcode, and list the other fields that we want that travel 
around the tech services department. This barcoded bib number would be used by 
catalogers to enter the bib number in the 949 for overlay in Connexion, and 
then again by our barcoding clerk to search by bib number in Sierra. At this 
point, I'm thinking that Autohotkey is my best bet.



Here is my prototype of what the routing slip would look like when it's done. 
The Thickety 2 is a note in the order record put in by our selectors for our 
catalogers to add that series to the bib record. The 978... is just a 
placeholder for where the list price will go once we get that field added to 
our order records:



[Image: prototype routing slip]



Here is the corresponding order record. Part of my problem for Autohotkey is 
that not all order records will contain a note (in field z) and the locations 
may be different (fewer or more) on the LOCATIONS line. I have to include the 
multi line, because if it's just our Main Library that's receiving the item, 
then the LOCATIONS at the bottom don't show up at all...just the LOCATION fixed 
field (under ACQ TYPE).



[Image: corresponding order record]



Any thoughts would be greatly appreciated!



Thanks!

Eddie


Eddie Clem, MLS
Cataloging Librarian
ec...@khcpl.org | www.KHCPL.org

Kokomo-Howard County Public Library
Collection Management Department
305 East Mulberry Street
Kokomo, IN 46901
765.626.0853|765.450.6290 (fax)


Re: [CODE4LIB] Help with Auto Hot Key

2015-05-06 Thread Eddie Clem
This afternoon, I tried several different methods to print the order record 
(and order bib) onto receipt paper. That works well, except that it cuts off 
part of the order record note toward the bottom. (We'd prefer to use 
receipt paper rather than regular computer paper: it's much faster to print and 
auto-cuts!) Otherwise, I think it would work for my project. When I tried to 
make the text smaller (from 8 to 6 or 7), it made the font too light and it 
wasn't readable. Bummer! We're so close!! 

Eddie

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dawn 
Romano
Sent: Wednesday, May 6, 2015 4:08 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Help with Auto Hot Key

Hi Eddie,
I'm not an AutoHotkey guru, but I just wanted to mention that when you are 
invoicing in Sierra, you do have the option to print the bib/order record for 
the item you are invoicing.  I believe this would provide all of the 
information you are looking for.  Of course, it will also provide the entire 
bib, which may not be what you are looking for, but it is not unusual to 
include this printout inside the book upon receipt/invoicing.  

Good luck, 
Dawn

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eddie 
Clem
Sent: Monday, May 04, 2015 2:50 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Help with Auto Hot Key

Hi there! I'm hoping someone here is a guru at AutoHotKey! :)



We have a clerk that pays our invoices in Sierra. She will write the bib number 
on a sticky note, as well as the list price and the locations (that each copy 
will go to). I want to have Sierra copy the bib number, list price, locations, 
and order record notes onto a receipt and then this clerk would put this 
receipt with the first copy of the material, rather than hand write on sticky 
notes all day! Since I had looked, and couldn't find a way to do this easily 
from Sierra, I had another brilliant idea that we could have Autohotkey copy 
the fields I want into a template (say, in Publisher) and have the bib number 
turned into a barcode, and list the other fields that we want that travel 
around the tech services department. This barcoded bib number would be used by 
catalogers to enter the bib number in the 949 for overlay in Connexion, and 
then again by our barcoding clerk to search by bib number in Sierra. At this 
point, I'm thinking that Autohotkey is my best bet.



Here is my prototype of what the routing slip would look like when it's done. 
The Thickety 2 is a note in the order record put in by our selectors for our 
catalogers to add that series to the bib record. The 978... is just a 
placeholder for where the list price will go once we get that field added to 
our order records:



[Image: prototype routing slip]



Here is the corresponding order record. Part of my problem for Autohotkey is 
that not all order records will contain a note (in field z) and the locations 
may be different (fewer or more) on the LOCATIONS line. I have to include the 
multi line, because if it's just our Main Library that's receiving the item, 
then the LOCATIONS at the bottom don't show up at all...just the LOCATION fixed 
field (under ACQ TYPE).



[Image: corresponding order record]



Any thoughts would be greatly appreciated!



Thanks!

Eddie


Eddie Clem, MLS
Cataloging Librarian
ec...@khcpl.org | www.KHCPL.org

Kokomo-Howard County Public Library
Collection Management Department
305 East Mulberry Street
Kokomo, IN 46901
765.626.0853|765.450.6290 (fax)


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Dan Scott
On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:

 +1 on the RDFa and schema.org. For those that don't know the library URL
 off-hand, it is much easier to find a library website by Googling than it
 is to go through the central university portal, and the hours will show up
 at the top of the page after having been harvested by search engines.


Hi, so this is an area in which I've done, and am doing, a fair bit of work.
See http://stuff.coffeecode.net/2015/ola_white_hat_seo/#/1/10 for some fun
slides from a presentation I gave in January at the Ontario Library
Association SuperConference that show some ways data gets into
Google/Yahoo/Bing and conclude that the OCLC Registry "manually maintain
yet another copy of your data elsewhere" approach isn't working. (Hit "s"
to get speaker notes.)

The rest of the presentation goes into depth on how to use RDFa to mark up
a real library web page with location, contact info, opening hours, and
event info. And I've posited that crawling library sites to pull
single-sourced data (e.g. you update your website to provide updated hours
to humans, and the machines automatically benefit) would be a much more
effective, accurate, and usable approach than maintaining copies of the
data in Google+, OCLC Registry, etc. We could produce results like
http://cwrc.ca/rsc-src/ that stay accurate, rather than being one-off
efforts that decay over time. (It would be great if the OCLC Registry had a
"crawl this URL" option so that it could keep all of its data up-to-date
and incentivize libraries to publish the data in a machine-readable format
such as RDFa + schema.org.)
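The crawling side of that single-sourcing idea needs something that pulls property/value pairs out of a page's RDFa. As a rough illustration only, here is a deliberately simplified Python extractor built on the standard library's html.parser. It ignores vocab, prefix and typeof scoping, and it assumes well-formed markup with explicit end tags. The class name and sample markup are hypothetical. A real harvester should use a full RDFa processor instead:

```python
from html.parser import HTMLParser


class RDFaLitePropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from simple RDFa Lite markup.

    Simplifications: vocab/prefix/typeof scoping is ignored, elements that
    open a nested item (typeof) are not recorded as text values, and every
    non-void element is assumed to be closed explicitly.
    """

    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one [property-or-None, text parts] per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            # An explicit content attribute wins over the element's text.
            self.pairs.append((prop, attrs["content"]))
            prop = None
        if "typeof" in attrs:
            prop = None
        if tag not in self.VOID:
            self._stack.append([prop, []])

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._stack:
            return
        prop, parts = self._stack.pop()
        if prop:
            # Collapse runs of whitespace in the element's text.
            self.pairs.append((prop, " ".join("".join(parts).split())))
        if self._stack:
            # A child's text also belongs to its enclosing element.
            self._stack[-1][1].extend(parts)

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][1].append(data)


# Hypothetical page fragment for illustration.
SAMPLE = """
<div vocab="http://schema.org/" typeof="Library">
  <span property="name">Example Library</span>
  <meta property="openingHours" content="Mo-Fr 09:00-20:00">
  <p>Phone: <span property="telephone">555-0100</span></p>
</div>
"""

parser = RDFaLitePropertyExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.pairs)
```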

On the "but that's technically challenging" front, I tried pursuing some
grant funding to produce templates for publishing that structured info in
Drupal, Joomla, and other commonly used CMSs. Sadly, my application was
recently denied, but that will only slow me down; I'm not going to give up
on the goal. I have a paper in the works that will expand on the content of
the presentation for those sites that have the ability (technical and
administrative) to modify their own web pages.

Sites running the Evergreen library system already generate a page for each
of their libraries that contains this structured data (e.g.
https://laurentian.concat.ca/eg/opac/library/OSUL), which is single sourced
from the data that has to be maintained in the library system anyway.

I'll happily acknowledge that getting search engines to harvest the right
data is not easy, though: right now, for example, if you search for "J.N.
Desmarais Library", the results currently show that the library is open 24
hours a day, which is completely false (and probably maliciously submitted)
information. *sigh* I've edited that info in the Google+ page at
https://plus.google.com/+JNDesmaraisLibraryGreaterSudbury but even though
it is a verified place and I am a manager of the G+ page, the edits still
go through approval by Googlers. There appears to be no good way to tell
Google "Hey, *this* is the URL you are looking for!" Somewhat amusingly,
the entire reason I started working with schema.org dates back to a
presentation I attended about Google Places years ago, where I whined about
having to maintain yet another copy of data in yet another place, and the
response implied that schema.org might be the solution to that problem.

Also, due to the structure of university web property ownership, we
currently don't have the ability to modify our actual library home page to
include any RDFa, which is a *wee* bit frustrating given my work in the
field. Heh.

Dan Scott
Laurentian University


Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Robert Sandusky
I recommend this article as an entry point into a research program on 
information quality:


Stvilia, B., Gasser, L., Twidale, M. B. and Smith, L. C. (2007), A 
framework for information quality assessment. J. Am. Soc. Inf. Sci., 58: 
1720–1733. doi:10.1002/asi.20652 Available at: 
http://stvilia.cci.fsu.edu/wp-content/uploads/2011/03/IQAssessmentFramework.pdf


One cannot manage information quality (IQ) without first being able to 
measure it meaningfully and establishing a causal connection between the 
source of IQ change, the IQ problem types, the types of activities 
affected, and their implications. In this article we propose a general 
IQ assessment framework. In contrast to context-specific IQ assessment 
models, which usually focus on a few variables determined by local 
needs, our framework consists of comprehensive typologies of IQ 
problems, related activities, and a taxonomy of IQ dimensions organized 
in a systematic way based on sound theories and practices. The framework 
can be used as a knowledge resource and as a guide for developing IQ 
measurement models for many different settings. The framework was 
validated and refined by developing specific IQ measurement models for 
two large-scale collections of two large classes of information objects: 
Simple Dublin Core records and online encyclopedia articles.


Bob

On 5/6/2015 4:32 PM, Diane Hillmann wrote:

You might try this blog post, by Thomas Bruce, who was my co-author on an
earlier article (referred to in the post):
https://blog.law.cornell.edu/voxpop/2013/01/24/metadata-quality-in-a-linked-data-context/

Diane

On Wed, May 6, 2015 at 5:24 PM, Kyle Banerjee kyle.baner...@gmail.com
wrote:


On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu
wrote:

I think a key thing is to determine to what extent any definition of
'completeness' is actually a representation of 'quality'.  As Peter says,
making sure not just that metadata is present but then checking it conforms
with rules is a big step towards this.

This.

Basing quality measures too much on the presence of certain data points or
the volume of data is fraught with peril. In experiments in the distant
past, my experience was that looking for structure and syntax patterns that
indicate good/bad quality as well as considering record sources was useful.
Also keep in mind that any scoring system is to some extent arbitrary, so
you don't want to read more into what it generates than appropriate.

Kyle





Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Kyle Banerjee
 On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu wrote:
 
 I think a key thing is to determine to what extent any definition of 
 'completeness' is actually a representation of 'quality'.  As Peter says, 
 making sure not just that metadata is present but then checking it conforms 
 with rules is a big step towards this. 

This. 

Basing quality measures too much on the presence of certain data points or the 
volume of data is fraught with peril. In experiments in the distant past, my 
experience was that looking for structure and syntax patterns that indicate 
good/bad quality as well as considering record sources was useful. Also keep in 
mind that any scoring system is to some extent arbitrary, so you don't want to 
read more into what it generates than appropriate.
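As a rough illustration of checking syntax patterns rather than mere presence, here is a small Python rule checker. The rules (a date pattern, a three-letter language code, and a list of placeholder values) are hypothetical examples, not a standard. Each institution would write its own:

```python
import re

# Hypothetical per-field syntax rules.
RULES = {
    "date": re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$"),  # YYYY, YYYY-MM or YYYY-MM-DD
    "language": re.compile(r"^[a-z]{3}$"),             # e.g. "eng"
}

# Hypothetical values suggesting a template default was never replaced.
PLACEHOLDERS = {"unknown", "n/a", "tbd", "untitled", "xxx"}


def check_record(record):
    """Return a list of (field, problem) tuples for one record."""
    problems = []
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            text = str(v).strip()
            if text.lower() in PLACEHOLDERS:
                problems.append((field, "placeholder value"))
            elif field in RULES and not RULES[field].match(text):
                problems.append((field, "does not match expected pattern"))
    return problems


record = {"title": "Untitled", "date": "ca. 1900", "language": "eng"}
print(check_record(record))
# [('title', 'placeholder value'), ('date', 'does not match expected pattern')]
```

Counting problems per record is more useful for finding records to review than as an absolute grade, since any such scoring is somewhat arbitrary.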

Kyle


Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Diane Hillmann
You might try this blog post, by Thomas Bruce, who was my co-author on an
earlier article (referred to in the post):
https://blog.law.cornell.edu/voxpop/2013/01/24/metadata-quality-in-a-linked-data-context/

Diane

On Wed, May 6, 2015 at 5:24 PM, Kyle Banerjee kyle.baner...@gmail.com
wrote:

  On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu
 wrote:
 
  I think a key thing is to determine to what extent any definition of
 'completeness' is actually a representation of 'quality'.  As Peter says,
 making sure not just that metadata is present but then checking it conforms
 with rules is a big step towards this.

 This.

 Basing quality measures too much on the presence of certain data points or
 the volume of data is fraught with peril. In experiments in the distant
 past, my experience was that looking for structure and syntax patterns that
 indicate good/bad quality as well as considering record sources was useful.
 Also keep in mind that any scoring system is to some extent arbitrary, so
 you don't want to read more into what it generates than appropriate.

 Kyle



Re: [CODE4LIB] Library Hours

2015-05-06 Thread BWS Johnson
Salvete!

Google often draws data from OpenStreetMap. If one wanted to, one could 
simply edit the Library information there and watch it get picked up rather 
quickly.


http://wiki.openstreetmap.org/wiki/Tag:amenity%3Dlibrary
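The OpenStreetMap opening_hours tag uses a compact day-range syntax, such as "Mo-Fr 09:00-17:00", with rules separated by "; ". Below is a minimal Python sketch that turns a week of hours into that form and merges consecutive days that share the same hours. The input dictionary is an assumption. The sketch ignores the tag's other features, such as holidays, "off", and seasonal rules:

```python
# Day order and abbreviations used by the OpenStreetMap opening_hours tag.
OSM_DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def osm_opening_hours(hours):
    """Format {day: "HH:MM-HH:MM"} as an OSM opening_hours value.

    Consecutive days with identical hours are merged into ranges such as
    "Tu-Th 09:00-20:00"; days missing from `hours` are left out.
    """
    rules = []
    i = 0
    while i < len(OSM_DAYS):
        day = OSM_DAYS[i]
        if day not in hours:
            i += 1
            continue
        j = i
        # Extend the run while the next day has the same hours.
        while j + 1 < len(OSM_DAYS) and hours.get(OSM_DAYS[j + 1]) == hours[day]:
            j += 1
        span = day if i == j else day + "-" + OSM_DAYS[j]
        rules.append(span + " " + hours[day])
        i = j + 1
    return "; ".join(rules)


# Hours from the table quoted earlier in this thread, in 24-hour form.
week = {
    "Mo": "10:00-18:00", "Tu": "09:00-20:00", "We": "09:00-20:00",
    "Th": "09:00-20:00", "Fr": "12:00-18:00", "Sa": "10:00-18:00",
    "Su": "12:00-17:00",
}
print(osm_opening_hours(week))
# Mo 10:00-18:00; Tu-Th 09:00-20:00; Fr 12:00-18:00; Sa 10:00-18:00; Su 12:00-17:00
```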


#justsayin
Brooke


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Steven Pryor
I don't know if this could give it a nudge (because as discussed, 
nobody knows how they work), but you can go into Google Maps (or 
https://www.google.com/business/ ) and find your place, and claim it 
with a Google account (you will have to be verified somehow, IIRC 
usually they will call the contact phone number with a code or 
something). This lets you put in lots of information that definitely 
*does* influence the Google results display, often with a card showing 
location, photo(s), hours, phone number(s), etc.  I put ours in some 
time ago by hand, and it looks like it has updated to our latest regular 
hours (which have changed since I would have put them in back then).


If you enter your hours this way, they will show up in a day or two. You 
will get a nice looking card in search results. You get Insights and 
other reports telling you how many times people searched your site, 
asked for directions, clicked the phone number, etc. And maybe, just 
maybe, their algorithm will compare that data with data from your site 
to match it up to automatically update in the future. They're definitely 
doing some kind of heuristic or guesswork parsing, since when it finds 
an update (as it did with our hours data), it does ask you to review
and verify.


Steven
--
Steven Pryor
Director of Digital Initiatives and Technologies
Assistant Professor
Library and Information Services
Southern Illinois University Edwardsville
(618) 650-3080
stpr...@siue.edu

On 5/6/2015 9:33 AM, Megan O'Neill Kudzia wrote:

Hi all,

I've been experimenting with schema.org OpeningHoursSpecification, and
currently Bing is scraping our hours, but Google isn't. I am using
RDFa-lite and I've validated it using a linter (thanks Jason Ronallo!), so
I'm scratching my head as to why our hours *still* don't show up on a
Google search.

I suspect part of it for us might be that we're re-branding away from
Stockwell-Mudd Libraries to Albion College Library, as it's much more
explanatory, but neither search through Google yields a nice box with hours
in it like the SFPL.

If and when I figure out the problem I'd be happy to send you an update of
what we did and what caused it to finally work properly.

On Wed, May 6, 2015 at 10:21 AM, Karen Coyle li...@kcoyle.net wrote:


Tom, Google will not tell you. The entirety of how Google search works is
a trade secret. We don't know the algorithm for ranking, and we don't know
what information they glean from web pages -- and they are unlikely to
tell. It is a constant on the schema.org discussion list that developers
want to know what Google/Bing/Yahoo/Yandex will do with specific
information in the web pages, and it is a constant that the reps there
reply: we cannot tell you that. The only way to find out is to code and
observe.

kc


On 5/6/15 7:00 AM, Tom Keays wrote:


I'd like to find out how and why Google is parsing this information. If
you go to the SFPL hours page (first link in the Google results), and look
at the source code, this is all you find.
http://sfpl.org/index.php?pg=010101
Is the ID in the DIV sufficient?  It would be nice to have a set of use
cases to work from.

Currently, I'm generating a weekly hours box by pulling JSONP from the
hours API of LibCal. I could easily output this in schema.org format (and
probably will now), but can Google pick up the information from the DOM if
it is delivered as JSON and transformed into HTML?

<div id="library-hours">
  <h2>Hours</h2>
  <table class="hours" cellpadding="0" cellspacing="0">
    <tr>
      <th>Sun</th>
      <th>Mon</th>
      <th>Tue</th>
      <th class="today">Wed</th>
      <th>Thu</th>
      <th>Fri</th>
      <th>Sat</th>
    </tr>
    <tr>
      <td>12-5</td>
      <td>10-6</td>
      <td>9-8</td>
      <td class="today">9-8</td>
      <td>9-8</td>
      <td>12-6</td>
      <td>10-6</td>
    </tr>
  </table>
</div>


On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote:

Charlie, I don't know of any libraries that have used schema.org for
their web site - perhaps others do. If it is used, it should be picked up
the next time the search engines index the site. What the search engines do
with schema.org is not guaranteed, but can be observed. It is not
guaranteed because none of the search engines will say what they do, as
that is considered a trade secret (especially from each other).

However, as locations and hours are important for their commercial
customers (stores, restaurants, etc.) I would expect that to be picked up
as a matter of course. Note that already locations and hours for some
businesses do show in the search engines, and that is for sites that are
not yet using schema.org, so the engines have some way of picking that up
from the HTML. The Google side-bar knowledge graph for my local libraries
shows Hours


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
Yes, it definitely does. Which actually is a problem for Wikipedia 
because it encourages people/companies to try to get entries into WP for 
SEO purposes and so that the sidebox will show up. I spend a lot of time 
on the articles for deletion pages of WP trying to get these 
promotional pages out of the encyclopedia. A big success is when I see 
them disappear from search results. (BTW, the various ways that 
self-published authors of written crap game the system is truly 
astonishing. A+ for effort, and their skill in PR is way beyond their 
literary skills.)


kc

On 5/6/15 8:33 AM, Bigwood, David wrote:

I have heard that at least part of the sidebox is constructed using data from 
Wikipedia, especially the structured info in the infobox there.

Dave

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
Coyle
Sent: Wednesday, May 06, 2015 9:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Library Hours

Tom, Google will not tell you. The entirety of how Google search works is a 
trade secret. We don't know the algorithm for ranking, and we don't know what 
information they glean from web pages -- and they are unlikely to tell. It is a 
constant on the schema.org discussion list that developers want to know what 
Google/Bing/Yahoo/Yandex will do with specific information in the web pages, 
and it is a constant that the reps there reply: we cannot tell you that. The 
only way to find out is to code and observe.

kc


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600


Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Phillips, Mark
Sergio, 

I'm hoping the conversations and interest around the #metadataquality hashtag
(https://twitter.com/hashtag/metadataquality) help to move some of these
conversations forward, from well-constructed research projects and academic
papers to something that more of us can implement locally in our systems.

There are many different ways that we could look at some of these problems and 
I think having more of us sharing our ideas and possibly code will be great. 

Mark


From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Sergio Letuche 
code4libus...@gmail.com
Sent: Wednesday, May 6, 2015 7:20 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] How to measure quality of a record

I felt I was missing something, since I could not find a general, widely
used approach, and perhaps some code on GitHub that implements these
quality measures...

2015-05-06 15:08 GMT+03:00 James Morley james.mor...@europeana.eu:

 I think a key thing is to determine to what extent any definition of
 'completeness' is actually a representation of 'quality'.  As Peter says,
 making sure not just that metadata is present but then checking it conforms
 with rules is a big step towards this. I would also extend this to
 assessing at what level of accuracy things have been set, for example dates
 (a rough range vs a precise day) and geotags (coordinates representing the
 centre of Paris vs the exact position that a photograph was taken from).
 These sorts of things can make a big difference to both the discoverability
 and practical reusability of records by end users.

 Best, James
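James's point about precision can be checked mechanically. Below is a minimal Python sketch. It assumes dates are stored as plain strings and coordinates as decimal degrees. The precision labels and patterns are hypothetical choices. For coordinates, each extra decimal place is about ten times finer, so four places is roughly 11 m of latitude:

```python
import re

# Hypothetical precision levels, checked from most to least precise.
DATE_PATTERNS = [
    ("day", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("month", re.compile(r"^\d{4}-\d{2}$")),
    ("year", re.compile(r"^\d{4}$")),
    ("range", re.compile(r"^\d{4}\s*-\s*\d{4}$")),  # e.g. "1900-1910"
]


def date_precision(value):
    """Label a date string by how precisely it is specified."""
    text = value.strip()
    for label, pattern in DATE_PATTERNS:
        if pattern.match(text):
            return label
    return "unparsed"


def coordinate_decimals(value):
    """Count decimal places in a decimal-degree coordinate string."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)


for d in ["1912-03-02", "1912", "1900-1910", "ca. 1900"]:
    print(d, date_precision(d))
print(coordinate_decimals("48.8566"))
```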



 
 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Esmé
 Cowles [escow...@ticklefish.org]
 Sent: 06 May 2015 13:51
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] How to measure quality of a record

 Sergio-

 Mark Phillips has a related blog post that I think is an excellent place
 to start, which outlines a system for scoring how complete a record is:

 http://vphill.com/journal/post/4075

 There was some discussion on twitter recently about this, which you can
 look up on the #metadataquality hashtag:
 https://twitter.com/hashtag/metadataquality

 I think there was a move to set up a mailing list for this topic or
 something like that, but I'm not sure where that stands now.

 -Esme

  On 05/06/15, at 7:21 AM, Sergio Letuche code4libus...@gmail.com wrote:
 
  Hello community,
 
  is there a way, any statistical approach, that you are aware of that
  let's say, allows one to have an idea of how complete a record is, or
  what are the actions you take in order to have an idea of the quality of
  a record, and eventually a database?
 
  Thank you in advance



Re: [CODE4LIB] Library Hours

2015-05-06 Thread Karen Coyle
I generally find that Bing makes better use of RDFa/schema.org than 
Google does.


kc

On 5/6/15 7:33 AM, Megan O'Neill Kudzia wrote:

Hi all,

I've been experimenting with schema.org OpeningHoursSpecification, and
currently Bing is scraping our hours, but Google isn't. I am using
RDFa-lite and I've validated it using a linter (thanks Jason Ronallo!), so
I'm scratching my head as to why our hours *still* don't show up on a
Google search.
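A linter is the right tool for validating the markup itself. As a quicker local sanity check that the RDFa Lite attributes actually appear in the HTML a crawler receives, a sketch like the following, using only Python's standard `html.parser`, lists the `vocab`, `typeof` and `property` attributes it finds. It is not an RDFa processor: it does not resolve vocabularies, prefixes or nesting, and it reports `None` for values carried in element text. The sample markup is illustrative, not Albion's actual page, and the check says nothing about what any search engine does with the markup.

```python
from html.parser import HTMLParser

# Illustrative RDFa Lite markup for one day's hours (not a real library page).
SAMPLE = """
<div vocab="http://schema.org/" typeof="Library">
  <div property="openingHoursSpecification" typeof="OpeningHoursSpecification">
    <link property="dayOfWeek" href="http://schema.org/Monday">
    <meta property="opens" content="10:00:00">
    <meta property="closes" content="18:00:00">
  </div>
</div>
"""


class RDFaLiteScanner(HTMLParser):
    """Collect RDFa Lite attributes from HTML as a rough presence check."""

    def __init__(self):
        super().__init__()
        self.vocabs = []
        self.types = []
        self.properties = []  # (property, value) pairs

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocabs.append(a["vocab"])
        if "typeof" in a:
            self.types.append(a["typeof"])
        if "property" in a:
            # Only content/href values are captured; element text is not.
            self.properties.append((a["property"], a.get("content") or a.get("href")))


if __name__ == "__main__":
    scanner = RDFaLiteScanner()
    scanner.feed(SAMPLE)
    print(scanner.types)  # ['Library', 'OpeningHoursSpecification']
    print(scanner.properties)
```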

I suspect part of it for us might be that we're re-branding away from
Stockwell-Mudd Libraries to Albion College Library, as it's much more
explanatory, but a Google search for either name fails to yield a box with
hours in it like the one for SFPL.

If and when I figure out the problem I'd be happy to send you an update of
what we did and what caused it to finally work properly.

On Wed, May 6, 2015 at 10:21 AM, Karen Coyle li...@kcoyle.net wrote:


Tom, Google will not tell you. The entirety of how Google search works is
a trade secret. We don't know the algorithm for ranking, and we don't know
what information they glean from web pages -- and they are unlikely to
tell. It is a constant on the schema.org discussion list that developers
want to know what Google/Bing/Yahoo/Yandex will do with specific
information in the web pages, and it is a constant that the reps there
reply: we cannot tell you that. The only way to find out is to code and
observe.

kc


On 5/6/15 7:00 AM, Tom Keays wrote:


I'd like to find out how and why Google is parsing this information. If you
go to the SFPL hours page (first link in the Google results) and look at
the source code, this is all you find:
http://sfpl.org/index.php?pg=010101
Is the ID in the DIV sufficient? It would be nice to have a set of use
cases to work from.

Currently, I'm generating a weekly hours box by pulling JSONP from the
hours API of LibCal. I could easily output this in schema.org format (and
probably will now), but can Google pick up the information from the DOM if
it is delivered as JSON and transformed into HTML?

<div id="library-hours">
<h2>Hours</h2>
<table class="hours" cellpadding="0" cellspacing="0">
  <tr>
    <th>Sun</th>
    <th>Mon</th>
    <th>Tue</th>
    <th class="today">Wed</th>
    <th>Thu</th>
    <th>Fri</th>
    <th>Sat</th>
  </tr>
  <tr>
    <td>12-5</td>
    <td>10-6</td>
    <td>9-8</td>
    <td class="today">9-8</td>
    <td>9-8</td>
    <td>12-6</td>
    <td>10-6</td>
  </tr>
</table>
</div>
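The step of outputting hours in schema.org format could look roughly like this minimal Python sketch, which works from compact strings like the ones in the SFPL table and builds a JSON-LD `OpeningHoursSpecification` description. The a.m./p.m. rule in `parse_range` is a hypothetical heuristic, the library name is a placeholder, and it does not read LibCal's API. Nothing in this thread establishes whether a search engine reads JSON-LD, or markup injected by script from JSONP; as Karen says, that can only be observed.

```python
import json

# Abbreviations as used in the hours table, mapped to schema.org day names.
DAY_NAMES = {
    "Sun": "Sunday", "Mon": "Monday", "Tue": "Tuesday", "Wed": "Wednesday",
    "Thu": "Thursday", "Fri": "Friday", "Sat": "Saturday",
}


def parse_range(span: str) -> tuple:
    """Turn a compact range like '9-8' into ('09:00', '20:00').

    Hypothetical heuristic: an opening hour below 7 is read as p.m.,
    and any closing hour below 12 is read as p.m.
    """
    open_s, close_s = span.split("-")
    open_h, close_h = int(open_s), int(close_s)
    if open_h < 7:
        open_h += 12
    if close_h < 12:
        close_h += 12
    return f"{open_h:02d}:00", f"{close_h:02d}:00"


def opening_hours_jsonld(name: str, hours: dict) -> dict:
    """Build a schema.org Library description from {'Mon': '10-6', ...}."""
    grouped = {}  # (opens, closes) -> day names sharing those hours
    for abbr, span in hours.items():
        grouped.setdefault(parse_range(span), []).append(DAY_NAMES[abbr])
    return {
        "@context": "http://schema.org",
        "@type": "Library",
        "name": name,
        "openingHoursSpecification": [
            {
                "@type": "OpeningHoursSpecification",
                "dayOfWeek": days,
                "opens": opens,
                "closes": closes,
            }
            for (opens, closes), days in grouped.items()
        ],
    }


if __name__ == "__main__":
    # Values copied from the hours table above; the name is a placeholder.
    table = {"Sun": "12-5", "Mon": "10-6", "Tue": "9-8", "Wed": "9-8",
             "Thu": "9-8", "Fri": "12-6", "Sat": "10-6"}
    doc = opening_hours_jsonld("Example Library", table)
    print('<script type="application/ld+json">')
    print(json.dumps(doc, indent=2))
    print("</script>")
```

Days with identical hours are grouped into one specification to keep the markup short. For semester schedules, schema.org's `OpeningHoursSpecification` also defines `validFrom` and `validThrough`, which could be added to each dictionary.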


On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote:

Charlie, I don't know of any libraries that have used schema.org for
their web site; perhaps others do. If it is used, it should be picked up
the next time the search engines index the site. What the search engines do
with schema.org is not guaranteed, but can be observed. It is not
guaranteed because none of the search engines will say what they do, as
that is considered a trade secret (especially from each other).

However, as locations and hours are important for their commercial
customers (stores, restaurants, etc.), I would expect that to be picked up
as a matter of course. Note that locations and hours for some businesses
already show in the search engines, and that is for sites that are not yet
using schema.org, so the engines have some way of picking that up from the
HTML. The Google side-bar knowledge graph for my local libraries shows
"Hours: Open today · 9:00 am – 8:00 pm"
(https://www.google.com/search?sa=Xbiw=1299bih=561q=san+francisco+public+library+larkin+street+hoursstick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSAei=qhlKVcKWJ8b7oQS65oCQCAved=0CJgBEOgTMBA),
but I have no idea where that comes from.

kc


On 5/6/15 5:22 AM, Charlie Morris wrote:

I'm curious, Karen, Ethan or anyone else, do you know of any examples of
libraries that have implemented schema.org or RDFa for hours data and have
noticed that Google or some other search engine has picked it up (i.e.,
correctly displaying that data as part of the search results)?  And if so,
how quickly will Google or the like pick up on changes to hours (i.e.,
shifting between semesters or unplanned changes)?

On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:

+1 on the RDFa and schema.org. For those that don't know the library URL
off-hand, it is much easier to find a library website by Googling than it
is to go through the central university portal, and the hours will show up
at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

Note that library hours is one of the possible bits of information that
could be encoded as RDFa in the library web site, thus making it possible
to derive library hours directly from the listing of hours on the web site
rather than keeping a separate list. Schema.org does have the elements such
that hours can be encoded. This