[CODE4LIB] How to measure quality of a record
Hello community, are you aware of any way, such as a statistical approach, to get an idea of how complete a record is? Or what actions do you take in order to get an idea of the quality of a record, and eventually of a database? Thank you in advance
Re: [CODE4LIB] Library Hours
I'm curious, Karen, Ethan or anyone else, do you know of any examples of libraries that have implemented schema.org or RDFa for hours data and have noticed that Google or some other search engine has picked it up (i.e., correctly displaying that data as part of the search results)? And if so, how quickly will Google or the like pick up on changes to hours (i.e., shifting between semesters or unplanned changes)?

On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:

+1 on the RDFa and schema.org. For those that don't know the library URL off-hand, it is much easier to find a library website by Googling than it is to go through the central university portal, and the hours will show up at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote:

Note that library hours is one of the possible bits of information that could be encoded as RDFa in the library web site, thus making it possible to derive library hours directly from the listing of hours on the web site rather than keeping a separate list. Schema.org does have the elements such that hours can be encoded. This would mean that hours could show in the display of the library's catalog entry on Google, Yahoo and Bing. Being available directly through the search engines might be sufficient, not necessitating creating yet-another-database for that data.

Schema.org uses a restaurant as its opening hours example, but much of the data would be the same for a library:

    <div vocab="http://schema.org/" typeof="Restaurant">
      <span property="name">GreatFood</span>
      <div property="aggregateRating" typeof="AggregateRating">
        <span property="ratingValue">4</span> stars -
        based on <span property="reviewCount">250</span> reviews
      </div>
      <div property="address" typeof="PostalAddress">
        <span property="streetAddress">1901 Lemur Ave</span>
        <span property="addressLocality">Sunnyvale</span>,
        <span property="addressRegion">CA</span>
        <span property="postalCode">94086</span>
      </div>
      <span property="telephone">(408) 714-1489</span>
      <a property="url" href="http://www.dishdash.com">www.greatfood.com</a>
      Hours:
      <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
      <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
      <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
      Categories:
      <span property="servesCuisine">Middle Eastern</span>,
      <span property="servesCuisine">Mediterranean</span>
      Price Range: <span property="priceRange">$$</span>
      Takes Reservations: Yes
    </div>

It seems to me that using schema.org would get more bang for the buck -- it would get into the search engines and could also be aggregated into whatever database is needed. As we've seen with OCLC, having a separate listing is likely to mean that the data will be out of date.

kc

On 5/5/15 2:19 PM, nitin arora wrote:

I can't see they distinguished between public libraries and other types on their campaign page. They say all libraries as far as I can see. So I suppose then that this is true for all libraries: Libraries offer a space anyone can enter, where money isn't exchanged, and documentation doesn't have to be shown. Who knew fines and library/student-IDs were a thing of the past?

The only data sets I can find where they got the 17,000 number is for public libraries: http://www.imls.gov/research/pls_data_files.aspx Maybe I missed something. There is an hours field on one of the CSVs I downloaded, etc for 2012 data (the most recent I could find).

Asking 10k for something targeted for completion in June and without a grasp on what types of libraries there are and how volatile the hours information is (especially in crisis) ... Sounds naive at best, sketchy at worst. The flexible funding button says this campaign will receive all funds raised even if it does not reach its goals. The value of these places for youth cannot be underestimated. So is the value of a quick buck ...

On Tue, May 5, 2015 at 4:53 PM, McCanna, Terran tmcca...@georgialibraries.org wrote:

I'm not at all surprised that this doesn't already exist, and even if OCLC's was available, I'd be willing to bet it was out of date. Public library hours, especially in underfunded areas, may fluctuate depending on funding cycles, seasons (whether school is in or out), etc., not to mention closing/reopening/moving because of old buildings that need to be updated. We have around 280 locations in our consortium and we have to rely on self-reporting to find out if their hours change. We certainly don't have staff time to check every one of their web sites on a regular basis, I can't imagine keeping track of 17,000!

Terran McCanna
PINES Program Manager
Georgia Public Library Service
1800 Century Place, Suite 150
Atlanta, GA 30345
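For anyone who wants to aggregate hours from such markup into a database, the openingHours values in the schema.org example quoted above (such as "Mo-Sa 11:00-14:30") are simple enough to parse. The following is only a minimal sketch: the function name parse_opening_hours is made up for illustration, and it assumes the single "day or day range plus one time range" form used in the example, not every form schema.org permits.

```python
import re

# Two-letter day codes as used in the schema.org example above.
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]

# Assumption: one day or day range, whitespace, then one HH:MM-HH:MM range.
PATTERN = re.compile(
    r"^(?P<first>[A-Z][a-z])(?:-(?P<last>[A-Z][a-z]))?\s+"
    r"(?P<opens>\d{2}:\d{2})-(?P<closes>\d{2}:\d{2})$"
)


def parse_opening_hours(value):
    """Expand a value like 'Mo-Sa 11:00-14:30' into {day: (opens, closes)}."""
    match = PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported openingHours value: {value!r}")
    first = match["first"]
    last = match["last"] or first
    if first not in DAYS or last not in DAYS:
        raise ValueError(f"unknown day code in: {value!r}")
    start, end = DAYS.index(first), DAYS.index(last)
    if start <= end:
        days = DAYS[start:end + 1]
    else:
        # A range such as 'Sa-Mo' wraps around the end of the week.
        days = DAYS[start:] + DAYS[:end + 1]
    return {day: (match["opens"], match["closes"]) for day in days}
```

Calling parse_opening_hours("Fr-Sa 17:00-22:00") gives one entry each for Fr and Sa. Values in other forms raise ValueError rather than being guessed at, which seems the safer default when the data feeds a shared listing.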
Re: [CODE4LIB] How to measure quality of a record
Hi,

I have thought a lot about this question in the past, and my answer is: yes, you can apply statistical formulas. But you should know each field of your record well: what kind of information it can contain, and whether you can set rules about it that can be applied to individual records.

Some factors which are important:

- the completeness of the record: the ratio of filled to unfilled fields
- whether the value of an individual field matches the rules (say you expect a number in the range of 1 to 5, but you get 6)
- the probability that a given field value could be unique
- the probability that a record is not a duplicate of another record

Some concrete examples from my Europeana past:

- there are mandatory fields, and if they are empty, the quality goes down
- there are fields which should match a known standard, for example ISO language codes; you can apply rules to decide whether the value fits or not
- the data provider field is free text with no formal rule, but no individual record should contain a unique value, and when you import several thousand new records, they should not contain more than a couple of new values
- there are fields which should contain URLs, emails or dates; we can check whether they fit the formal rules and whether their content is in a reasonable range (we should not have a record created in the future, for example)
- you can measure whether the optional fields are filled in, and in what ratio

At the end you will have a couple of measurements, and you can apply weighting to calculate a final classification number. You can do a lot to set up rules with faceted search, and of course you can use statistical tools, such as R or Julia, which help to get a picture of the distribution of the values.

Hope it helps.

Regards,
Péter

--
Péter Király
software developer
Göttingen Society for Scientific Data Processing - http://gwdg.de
eXtensible Catalog - http://eXtensibleCatalog.org
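The measurements described above (completeness of mandatory and optional fields, rule checks, and a weighted final number) can be sketched roughly as follows. This is an illustration only, not Europeana's implementation: the field names, the list of language codes and the weights are all hypothetical and would need to be replaced with values from your own schema.

```python
import re
from datetime import date

# Hypothetical field lists and weights; adjust them to your own schema.
MANDATORY = ["title", "language", "data_provider"]
OPTIONAL = ["description", "creator", "created"]
KNOWN_LANGUAGES = {"en", "de", "fr", "hu"}  # stand-in for a full ISO code list
WEIGHTS = {"mandatory": 0.5, "optional": 0.2, "validity": 0.3}


def filled(record, field):
    """A field counts as filled if it is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def ratio_filled(record, fields):
    """Completeness: share of the given fields that are filled."""
    if not fields:
        return 1.0
    return sum(filled(record, f) for f in fields) / len(fields)


def validity(record, today=None):
    """Share of rule checks passed, counting only fields that are filled."""
    today = today or date.today()
    checks = []
    if filled(record, "language"):
        checks.append(record["language"] in KNOWN_LANGUAGES)
    if filled(record, "created"):
        try:
            # A record should not have been created in the future.
            checks.append(date.fromisoformat(record["created"]) <= today)
        except ValueError:
            checks.append(False)
    if filled(record, "url"):
        checks.append(re.match(r"^https?://\S+$", record["url"]) is not None)
    return sum(checks) / len(checks) if checks else 1.0


def quality_score(record, today=None):
    """Weighted combination of the individual measurements (0.0 to 1.0)."""
    parts = {
        "mandatory": ratio_filled(record, MANDATORY),
        "optional": ratio_filled(record, OPTIONAL),
        "validity": validity(record, today),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

Running quality_score over every record and looking at the distribution of the scores gives a first picture of the database as a whole. The uniqueness and duplication checks are left out here because they need the whole data set rather than a single record.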
Re: [CODE4LIB] How to measure quality of a record
I think a key thing is to determine to what extent any definition of 'completeness' is actually a representation of 'quality'. As Peter says, making sure not just that metadata is present but then checking it conforms with rules is a big step towards this.

I would also extend this to assessing at what level of accuracy things have been set, for example dates (a rough range vs a precise day) and geotags (coordinates representing the centre of Paris vs the exact position that a photograph was taken from). These sorts of things can make a big difference to both the discoverability and practical reusability of records by end users.

Best,
James

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Esmé Cowles [escow...@ticklefish.org]
Sent: 06 May 2015 13:51
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] How to measure quality of a record

Sergio-

Mark Phillips has a related blog post that I think is an excellent place to start, which outlines a system for scoring how complete a record is: http://vphill.com/journal/post/4075

There was some discussion on twitter recently about this, which you can look up on the #metadataquality hashtag: https://twitter.com/hashtag/metadataquality

I think there was a move to set up a mailing list for this topic or something like that, but I'm not sure where that stands now.

-Esme
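James's point about the level of accuracy could be approximated by classifying how precise a value looks. The sketch below is an assumption-laden illustration: the precision levels, the accepted date patterns and the function names are invented for this example, and counting decimal places is only a rough proxy, since a city-centre coordinate can still be written with many decimals.

```python
import re

# Hypothetical precision levels; the first matching pattern wins.
DATE_PATTERNS = [
    ("day", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("month", re.compile(r"^\d{4}-\d{2}$")),
    ("year", re.compile(r"^\d{4}$")),
    ("range", re.compile(r"^\d{4}\s*[-/]\s*\d{4}$")),
]


def date_precision(value):
    """Classify a date string as 'day', 'month', 'year', 'range' or 'unknown'."""
    text = value.strip()
    for level, pattern in DATE_PATTERNS:
        if pattern.match(text):
            return level
    return "unknown"


def coordinate_precision(value):
    """Count the decimal places of a decimal-degree coordinate string."""
    _, _, decimals = value.strip().partition(".")
    return len(decimals)
```

Counting how many records fall into each level shows whether a collection mostly carries rough ranges or precise days, which can then feed into a weighted score alongside completeness.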
Re: [CODE4LIB] Library Hours
+1 on the RDFa and schema.org. For those that don't know the library URL off-hand, it is much easier to find a library website by Googling than it is to go through the central university portal, and the hours will show up at the top of the page after having been harvested by search engines.

- Original Message -
From: Peter Murray jes...@dltj.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Tuesday, May 5, 2015 4:36:56 PM
Subject: Re: [CODE4LIB] Library Hours

OCLC has an institutional registry [1], which had (in part) library hours, addresses, and so forth. It seems to be unavailable, though [2]. That is the only systematic collection of library hours data that I know about.

Peter

[1] https://www.oclc.org/worldcat-registry.en.html
[2] https://www.worldcat.org/registry/institution/

On May 5, 2015, at 4:16 PM, Bigwood,
Re: [CODE4LIB] How to measure quality of a record
I felt I was missing something, since I could not find some general, most-used approach, and perhaps some code on GitHub that implements these quality measures...
Re: [CODE4LIB] Library Hours
Charlie, I don't know of any libraries that have used schema.org for their web site - perhaps others do. If it is used, it should be picked up the next time the search engines index the site.

What the search engines do with schema.org is not guaranteed, but can be observed. It is not guaranteed because none of the search engines will say what they do, as that is considered a trade secret (especially from each other). However, as locations and hours are important for their commercial customers (stores, restaurants, etc.) I would expect that to be picked up as a matter of course. Note that already locations and hours for some businesses do show in the search engines, and that is for sites that are not yet using schema.org, so the engines have some way of picking that up from the HTML.

The Google side-bar knowledge graph for my local libraries shows Hours https://www.google.com/search?sa=Xbiw=1299bih=561q=san+francisco+public+library+larkin+street+hoursstick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSAei=qhlKVcKWJ8b7oQS65oCQCAved=0CJgBEOgTMBA: Open today · 9:00 am – 8:00 pm but I have no idea where that comes from.

kc
Re: [CODE4LIB] Library Hours
Hi,

> Open today · 9:00 am – 8:00 pm
> but I have no idea where that comes from.

Probably because the web page http://sfpl.org/index.php?pg=010101 inserts the library hours inside <div id="library-hours"> </div>.

Bye,
Zeno Tajoli

--
Dr. Zeno Tajoli
Servizi Innovativi -- Automazione Biblioteche
z.taj...@cineca.it
fax +39 02 2135520
CINECA - Sede operativa di Segrate
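To illustrate how a harvester could read hours out of plain, unannotated HTML like this, here is a small sketch using Python's standard html.parser. Nothing here is known about what Google actually does; the class and function names are made up, and SAMPLE is a hand-restored copy of the SFPL table markup quoted elsewhere in this thread.

```python
from html.parser import HTMLParser

# Hand-restored copy of the SFPL markup quoted in this thread (assumption:
# the stripped angle brackets and quotes are put back in the obvious way).
SAMPLE = """
<div id="library-hours"><h2>Hours</h2>
<table class="hours" cellpadding="0" cellspacing="0">
<tr><th>Sun</th><th>Mon</th><th>Tue</th><th class="today">Wed</th>
<th>Thu</th><th>Fri</th><th>Sat</th></tr>
<tr><td>12-5</td><td>10-6</td><td>9-8</td><td class="today">9-8</td>
<td>9-8</td><td>12-6</td><td>10-6</td></tr>
</table></div>
"""


class HoursTableParser(HTMLParser):
    """Collect th/td text found inside <div id="library-hours">."""

    def __init__(self):
        super().__init__()
        self.in_hours_div = False
        self.depth = 0      # nesting depth of divs inside the hours div
        self.cell = None    # 'th' or 'td' while inside a table cell
        self.days = []
        self.hours = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.in_hours_div:
                self.depth += 1
            elif attrs.get("id") == "library-hours":
                self.in_hours_div = True
                self.depth = 1
        elif self.in_hours_div and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if tag == "div" and self.in_hours_div:
            self.depth -= 1
            if self.depth == 0:
                self.in_hours_div = False
        elif tag in ("th", "td"):
            self.cell = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.cell is None:
            return
        (self.days if self.cell == "th" else self.hours).append(text)


def extract_hours(html):
    """Return {day: hours} from the first header row and first data row."""
    parser = HoursTableParser()
    parser.feed(html)
    return dict(zip(parser.days, parser.hours))
```

This only works because the page happens to use a recognisable id and a regular table, which is exactly the fragility that explicit schema.org markup would avoid.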
Re: [CODE4LIB] Library Hours
The search engine may not pick it up quickly enough, but the emergency services in the area could get it from the RDFa as soon as it hits the web.

kc

On 5/6/15 6:45 AM, nitin arora wrote:

I think both creating a one-off list and schema.org approaches pose problems within the context of the original fund raising campaign's pitch. I don't think every library can necessarily implement the latter for a variety of reasons, not always technical. From the pov that a library can be a community center in a time of crisis, I'm wondering not only how quickly a search engine would pick that up but also, in such moments, how prioritized updating that data would be in the first place.
Re: [CODE4LIB] Library Hours
I think both creating a one-off list and schema.org approaches pose problems within the context of the original fund raising campaign's pitch. I don't think every library can necessarily implement the latter for a variety of reasons, not always technical. From the pov that a library can be a community center in a time of crisis, I'm wondering not only how quickly a search engine would pick that up but also, in such moments, how prioritized updating that data would be in the first place.
Re: [CODE4LIB] Library Hours
I'd like to find out how and why Google is parsing this information. If you go to the the SFPL hours page (first link in the Google results), and look at the source code, this is all you find. http://sfpl.org/index.php?pg=010101 Is the ID in the DIV sufficient? It would be nice to have a set of use cases to work from. Currently, I'm generating a weekly hours box by pulling JSONP from the hours API of LibCal. I could easily output this in schema.org format (and probably will now), but can Google pick up the information from the DOM if it is delivered as JSON and transformed into HTML? div id=library-hours h2Hours/h2 table class=hours cellpadding=0 cellspacing=0 tr thSun/th thMon/th thTue/th th class=todayWed/th thThu/th thFri/th thSat/th /tr tr td12-5/td td10-6/td td9-8/td td class=today9-8/td td9-8/td td12-6/td td10-6/td /tr /table /div On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote: Charlie, I don't know of any libraries that have used schema.org for their web site - perhaps others do. If it is used, it should be picked up the next time the search engines index the site. What the search engines do with schema.org is not guaranteed, but can be observed. It is not guaranteed because none of the search engines will say what they do, as that is considered a trade secret (especially from each other). However, as locations and hours are important for their commercial customers (stores, restaurants, etc.) I would expect that to be picked up as a matter of course. Note that already locations and hours for some businesses do show in the search engines, and that is for sites that are not yet using schema.org, so the engines have some way of picking that up from the HTML. 
The Google side-bar knowledge graph for my local libraries shows Hours https://www.google.com/search?sa=Xbiw=1299bih=561q=san+francisco+public+library+larkin+street+hoursstick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSAei=qhlKVcKWJ8b7oQS65oCQCAved=0CJgBEOgTMBA: Open today · 9:00 am – 8:00 pm but I have no idea where that comes from. kc
Re: [CODE4LIB] Library Hours
Right, but I don't think that meets any particular standard, which means that Google is doing a lot of text analysis when it indexes pages, looking for a pattern that looks like opening hours. That takes more cycles than having it all neatly wrapped in some known RDFa. kc
On 5/6/15 6:54 AM, Tajoli Zeno wrote: Hi, Open today · 9:00 am – 8:00 pm but I have no idea where that comes from. Probably because the web page http://sfpl.org/index.php?pg=010101 inserts the library hours inside <div id="library-hours"></div>. Bye, Zeno Tajoli
-- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: +1-510-435-8234 skype: kcoylenet/+1-510-984-3600
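For anyone who wants to try the "neatly wrapped in some known RDFa" route, here is a minimal sketch in Python of how a hours listing could be emitted as RDFa. It follows the pattern of the schema.org restaurant example quoted earlier in the thread, swapping in the schema.org Library type. The function name `hours_to_rdfa`, the library name, and the hours values are made up for illustration, and nothing here says how (or whether) any search engine will use the output.

```python
from html import escape


def hours_to_rdfa(name, hours):
    """Return an RDFa snippet for a library's opening hours.

    `hours` is a list of (machine_value, human_label) pairs, where
    machine_value uses the schema.org openingHours text format,
    e.g. "Mo-Fr 09:00-20:00". Both names are hypothetical helpers,
    not part of any library.
    """
    lines = [
        '<div vocab="http://schema.org/" typeof="Library">',
        f'  <span property="name">{escape(name)}</span>',
    ]
    for value, label in hours:
        # The machine-readable value goes in the content attribute,
        # the human-readable label stays visible on the page.
        lines.append(
            f'  <meta property="openingHours" content="{escape(value)}">'
            f"{escape(label)}"
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hours_to_rdfa(
        "Example Public Library",
        [("Mo-Fr 09:00-20:00", "Mon-Fri 9am - 8pm"),
         ("Sa 10:00-18:00", "Sat 10am - 6pm")],
    ))
```

Generating the markup from the same data that drives the visible hours box keeps the two from drifting apart, which is the point made above about separate listings going out of date.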
[CODE4LIB] Learn to Teach Coding - [free] webinar and ALA pre conference
Please note the webinar will be free to the first 100 log-ins. If you're interested in teaching code/mentoring in technology, this may be of interest! Cheers -- Forwarded message -- From: Mark Beatty mbea...@ala.org Date: Tue, May 5, 2015 at 2:43 PM Subject: [lita-l] Learn to Teach Coding - webinar and ALA pre conference To: lit...@lists.ala.org Learn to Teach Coding and Mentor Technology Newbies – in Your Library or Anywhere! Attend a free one hour webinar http://ala.adobeconnect.com/teachcoding/ to discover what learning to teach coding is all about, and then register http://alaac15.ala.org/register-now for and attend the LITA preconference at ALA Annual http://www.ala.org/lita/conferences/annual/2015. This opportunity is following up on the 2014 LITA President’s Program at ALA Annual where then LITA President Cindi Trainor Blyberg welcomed Kimberly Bryant, founder of Black Girls Code. The informational webinar is free and open to the first 100 log-ins: Tuesday May 26, 2015 at 1:00 pm Central Time http://ala.adobeconnect.com/teachcoding/ http://www.ala.org/lita/conferences/annual/2015 Enter as guest. The webinar will be recorded and the link to the recording will be posted to these same resource spaces. Register online for the ALA Annual Conference and add a LITA Preconference http://alaac15.ala.org/register-now Black Girls CODE (BGC) http://www.blackgirlscode.com/ is devoted to showing the world that black girls can code, and to growing the number of women of color working in technology. LITA is devoted to putting on programs that promote, develop, and aid in the implementation of library and information technology. Together, BGC and LITA offer this full day pre-conference workshop, designed to turn reasonably tech savvy librarians into master technology teachers.
The workshop will help attendees develop effective lesson plans and design projects their students can complete successfully in their own coding workshops. The schedule will feature presentations in the morning followed by afternoon breakout workgroups, in which attendees can experiment with programming languages such as Scratch, Ruby on Rails, and more. Presenters: Kimberly Bryant, Founder and Executive Director Black Girls CODE http://www.blackgirlscode.com/about-bgc.html Lake Raymond, Program Coordinator Black Girls CODE Mikala Streeter, Curriculum Consultant Black Girls CODE The Black Girls Code Vision: To increase the number of women of color in the digital space by empowering girls of color ages 7 to 17 to become innovators in STEM fields, leaders in their communities, and builders of their own futures through exposure to computer science and technology. Kimberly Bryant: That, really, is the Black Girls Code mission: to introduce programming and technology to a new generation of coders, coders who will become builders of technological innovation and of their own futures. Imagine the impact that these curious, creative minds could have on the world with the guidance and encouragement others take for granted.
REGISTRATION:
Cost
• LITA Member $235 (coupon code: LITA2015)
• ALA Member $350
• Non-Member $380
How-to
To register for any of these events, you can include them with your initial conference registration or add them later using the unique link in your email confirmation. If you don’t have your registration confirmation handy, you can request a copy by emailing alaann...@compusystems.com. You also have the option of registering for a preconference only.
To receive the LITA member pricing during the registration process, on the Personal Information page enter the discount promotional code: LITA2015 Register online for the ALA Annual Conference and add a LITA Preconference http://alaac15.ala.org/register-now Call ALA Registration at 1-800-974-3084 Onsite registration will also be accepted in San Francisco. Questions or Comments? For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty, mbea...@ala.org _/_/_/_/_/ Mark Beatty Programs and Marketing Specialist ALA/LITA 50 East Huron Chicago, IL 60611 312.280.4268 mbea...@ala.org www.lita.org -- Abigail Goben, MLS abigailgo...@gmail.com http://HedgehogLibrarian.com
Re: [CODE4LIB] Library Hours
I believe the objective of the search engines is to be able to provide useful functionality to users, both in their Knowledge Graphs and on mobile devices, for all local businesses. I note now when I search for the local branch of Best Buy or similar on my iPhone I get the 'Open Now' or 'Closed Now' message as part of the result. Karen is right about anyone, including emergency services, being able to harvest this data from your site - made easier by using a consistent format such as Schema.org. As an aside, the Schema.org community is currently discussing the formatting of opening hours, and consistency with other similar event-based timings. It looks like they are going to keep it simple for the moment, returning in the near future to issues such as exceptions like "open every day 9-5 except Wednesdays in January of a leap year". Richard. On 6 May 2015 at 15:02, Karen Coyle li...@kcoyle.net wrote: The search engine may not pick it up quickly enough, but the emergency services in the area could get it from the RDFa as soon as it hits the web. kc On 5/6/15 6:45 AM, nitin arora wrote: I think both creating a one-off list and schema.org approaches pose problems within the context of the original fund raising campaign's pitch. I don't think every library can necessarily implement the latter for a variety of reasons, not always technical. From the pov that a library can be a community center in a time of crisis, I'm wondering not only how quickly a search engine would pick that up but also, in such moments, how prioritized updating that data would be in the first place. On Wed, May 6, 2015 at 8:22 AM, Charlie Morris cdmorri...@gmail.com wrote: I'm curious, Karen, Ethan or anyone else, do you know of any examples of libraries that have implemented schema.org or RDFa for hours data and have noticed that Google or some other search engine has picked it up (i.e., correctly displaying that data as part of the search results)?
Re: [CODE4LIB] Library Hours
Hi all, I've been experimenting with schema.org OpeningHoursSpecification, and currently Bing is scraping our hours, but Google isn't. I am using RDFa-lite and I've validated it using a linter (thanks Jason Ronallo!), so I'm scratching my head as to why our hours *still* don't show up on a google search. I suspect part of it for us might be that we're re-branding away from Stockwell-Mudd Libraries to Albion College Library, as it's much more explanatory, but neither search through Google yields a nice box with hours in it like the SFPL. If and when I figure out the problem I'd be happy to send you an update of what we did and what caused it to finally work properly.
Re: [CODE4LIB] Help with Auto Hot Key
Hi Eddie, AutoHotkey can probably do what you want to do. I am not familiar with the Sierra interface, although I have successfully used AHK to automate workflows in a variety of applications. Here's an example of a subroutine with key commands that copy the contents of a CONTENTdm text input box: https://github.com/metaweidner/UHDL_SubjectTopical_CDM/blob/master/UHDL_SubjectTopical_CDM.ahk#L295-303 And check to see if there was actually any text on the clipboard as a result: https://github.com/metaweidner/UHDL_SubjectTopical_CDM/blob/master/UHDL_SubjectTopical_CDM.ahk#L152-158 I'd be happy to pass along more examples. Best, Andrew Weidner ajweid...@uh.edu On Tue, May 5, 2015 at 5:00 PM, Karl Holten khol...@switchinc.org wrote: This doesn't involve AutoHotkey, but maybe it would be easier to use SQL to pull that data from the Sierra database rather than screen scraping from the Sierra application. You wouldn't need to worry about where stuff displays in the interface, just where it's stored on the backend. This solution would probably be cleaner to maintain as well. Excel has ways to pull in data from external sources like SQL databases; it looks like Microsoft Publisher does too. I can't speak to how easy it would be to set that up, but hopefully it would give you a start: https://support.office.com/en-ie/article/Import-data-into-Office-Publisher--Visio-or-Word-by-using-the-Data-Connection-Wizard-65295a62-8da3-49bc-8dd8-1f77d0a05127 Anyway, that's my 2 cents on an alternative tack you might want to try. Hope that helps, Karl Holten Systems Integration Specialist SWITCH Inc 414-382-6711 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eddie Clem Sent: Monday, May 4, 2015 1:50 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Help with Auto Hot Key Hi there! I'm hoping someone here is a guru at AutoHotKey! :) We have a clerk that pays our invoices in Sierra.
She will write the bib number on a sticky note, as well as the list price and the locations (that each copy will go to). I want to have Sierra copy the bib number, list price, locations, and order record notes onto a receipt, and then this clerk would put this receipt with the first copy of the material, rather than hand write on sticky notes all day! Since I had looked, and couldn't find a way to do this easily from Sierra, I had another brilliant idea that we could have AutoHotkey copy the fields I want into a template (say, in Publisher), have the bib number turned into a barcode, and list the other fields that we want to travel around the tech services department. This barcoded bib number would be used by catalogers to enter the bib number in the 949 for overlay in Connexion, and then again by our barcoding clerk to search by bib number in Sierra. At this point, I'm thinking that AutoHotkey is my best bet. Here is my prototype of what the routing slip would look like when it's done. The "Thickety 2" is a note in the order record put in by our selectors for our catalogers to add that series to the bib record. The 978... is just a placeholder for where the list price will go once we get that field added to our order records: [attached image: routing slip prototype] Here is the corresponding order record. Part of my problem for AutoHotkey is that not all order records will contain a note (in field z) and the locations may be different (fewer or more) on the LOCATIONS line. I have to include the multi line, because if it's just our Main Library that's receiving the item, then the LOCATIONS at the bottom don't show up at all...just the LOCATION fixed field (under ACQ TYPE). [attached image: order record] Any thoughts would be greatly appreciated! Thanks!
Eddie
Eddie Clem, MLS Cataloging Librarian ec...@khcpl.org | www.KHCPL.org http://www.khcpl.org/ Kokomo-Howard County Public Library Collection Management Department 305 East Mulberry Street Kokomo, IN 46901 765.626.0853|765.450.6290 (fax)
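Whichever way the fields get out of Sierra (AutoHotkey or SQL), the two wrinkles described above, the optional note and the LOCATIONS line that only appears for multi-location orders, can be handled when the slip is assembled. The following is only a sketch in Python under assumed names: the dictionary keys (`bib_number`, `list_price`, `location`, `locations`, `note`) and the sample values are invented for illustration and are not Sierra field names.

```python
def format_routing_slip(order):
    """Build a plain-text routing slip from one order record (a dict).

    All keys used here are hypothetical stand-ins for whatever the
    export step actually produces.
    """
    lines = [f"BIB NUMBER: {order['bib_number']}"]

    # The list price may not be recorded yet.
    price = order.get("list_price")
    lines.append(f"LIST PRICE: {price if price is not None else '(none recorded)'}")

    # Multi-location orders carry a list; single-location orders only
    # have the fixed LOCATION field, so fall back to that.
    locations = order.get("locations") or [order["location"]]
    lines.append("LOCATIONS: " + ", ".join(locations))

    # Not every order record has a note.
    note = (order.get("note") or "").strip()
    if note:
        lines.append(f"NOTE: {note}")

    return "\n".join(lines)


if __name__ == "__main__":
    print(format_routing_slip({
        "bib_number": "b1234567",
        "list_price": "16.99",
        "locations": ["main", "south"],
        "note": "Thickety 2",
    }))
```

The resulting text could then be dropped into whatever template prints the slip; turning the bib number into a barcode is left out here.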
Re: [CODE4LIB] Library Hours
When I was at the Robert M Bird Library I put some basic schema.org on the old site, but I didn't mark up the hours. That'll be a project for here as well, once I get out from under some of what I'm working on now. Best regards, *Jason Bengtson, MLIS, MA* Innovation Architect *Houston Academy of MedicineThe Texas Medical Center Library* 1133 John Freeman Blvd Houston, TX 77030 http://library.tmc.edu/ www.jasonbengtson.com
Re: [CODE4LIB] Library Hours
Tom, Google will not tell you. The entirety of how Google search works is a trade secret. We don't know the algorithm for ranking, and we don't know what information they glean from web pages -- and they are unlikely to tell. It is a constant on the schema.org discussion list that developers want to know what Google/Bing/Yahoo/Yandex will do with specific information in the web pages, and it is a constant that the reps there reply: we cannot tell you that. The only way to find out is to code and observe. kc
[CODE4LIB] DLF Preconference for Liberal Arts Colleges - Call For Proposals
The Digital Library Federation is hosting our inaugural DLF Liberal Arts Colleges Preconference on October 25th in Vancouver, BC, preceding this year's DLF Forum. The one-day preconference will be an opportunity for those working with digital libraries/digital scholarship in liberal arts colleges to work closely together, in the spirit of the liberal arts seminar, to consider the issues and opportunities unique to us. We invite proposals for panels, presentations, or working sessions that foster conversation, connections, and provocation at the intersection of digital libraries and the liberal arts. How does your project or approach take advantage of the liberal arts environment, or respond to its limitations? How is your work informed by the values of a liberal arts college? What is the role of liberal arts college institutions in the digital library/digital scholarship world? Session Types * Full Panel: Multiple presenters centered on a theme, in the format of your choice. (60 minutes) * Presentation: Single or multiple presenters, covering specific topics or case studies. (20 minutes) * Working session: An interactive session involving hands-on learning and collaboration. Single or multiple presenters. (30 minutes or 60 minutes) Complete proposals should be submitted using the online submission form[1] by 5:00 PM EST on June 22, 2015. Proposals must include a title, session type, information for each presenter (name, institution, and email), proposal description (maximum 300 words), and proposal abstract (maximum 100 words). You will hear about your proposal status by mid-August. The 2015 DLF Liberal Arts Colleges Preconference[2] will be held October 25 in Vancouver, BC, at the Pinnacle Vancouver Harbourfront Hotel. The 2015 DLF Forum will be held October 26-28, and the Forum call for proposals[3] is also open until June 22. 
[1] Proposal submission form: https://docs.google.com/forms/d/1rn3OuC38aZ4hplvkMMJsQvsLqPd2Dv3wtU1TzMGp4xQ/viewform?c=0&w=1
[2] Preconference description: http://www.diglib.org/forums/2015forum/affiliated-events/dlflac
[3] DLF Forum CfP: http://www.diglib.org/forums/2015forum/cfp/
Re: [CODE4LIB] How to measure quality of a record
I'll second Bob's recommendation on that paper. I've found the following paper to be an interesting read on the topic of metadata quality and some of the ways that we could approach measuring it with automation. Automatic Evaluation of Metadata Quality in Digital Repositories by Xavier Ochoa and Erik Duval https://lirias.kuleuven.be/bitstream/123456789/255807/2/xavuxavier-pre.pdf Mark From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert Sandusky sandu...@uic.edu Sent: Wednesday, May 6, 2015 4:42 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to measure quality of a record I recommend this article as an entry point into a research program on information quality: Stvilia, B., Gasser, L., Twidale, M. B. and Smith, L. C. (2007), A framework for information quality assessment. J. Am. Soc. Inf. Sci., 58: 1720–1733. doi:10.1002/asi.20652 Available at: http://stvilia.cci.fsu.edu/wp-content/uploads/2011/03/IQAssessmentFramework.pdf One cannot manage information quality (IQ) without first being able to measure it meaningfully and establishing a causal connection between the source of IQ change, the IQ problem types, the types of activities affected, and their implications. In this article we propose a general IQ assessment framework. In contrast to context-specific IQ assessment models, which usually focus on a few variables determined by local needs, our framework consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices. The framework can be used as a knowledge resource and as a guide for developing IQ measurement models for many different settings. The framework was validated and refined by developing specific IQ measurement models for two large-scale collections of two large classes of information objects: Simple Dublin Core records and online encyclopedia articles. 
Bob On 5/6/2015 4:32 PM, Diane Hillmann wrote: You might try this blog post, by Thomas Bruce, who was my co-author on an earlier article (referred to in the post): https://blog.law.cornell.edu/voxpop/2013/01/24/metadata-quality-in-a-linked-data-context/ Diane On Wed, May 6, 2015 at 5:24 PM, Kyle Banerjee kyle.baner...@gmail.com wrote: On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu wrote: I think a key thing is to determine to what extent any definition of 'completeness' is actually a representation of 'quality'. As Peter says, making sure not just that metadata is present but then checking it conforms with rules is a big step towards this. This. Basing quality measures too much on the presence of certain data points or the volume of data is fraught with peril. In experiments in the distant past, my experience was that looking for structure and syntax patterns that indicate good/bad quality as well as considering record sources was useful. Also keep in mind that any scoring system is to some extent arbitrary, so you don't want to read more into what it generates than appropriate. Kyle
Re: [CODE4LIB] How to measure quality of a record
Here in .nz the national library runs a local aggregation service http://digitalnz.org/ which has quite good penetration into schools and so forth. It provides some metadata quality reports such as http://metadata.digitalnz.org/nzresearch/127 for sources it aggregates (that report is actually quite a bit dated). My experience of these reports is they're useful in inverse proportion to the diversity of the collection being reported on. The narrower your collection, the more real issues are going to be caught. cheers stuart -- ...let us be heard from red core to black sky
Re: [CODE4LIB] Help with Auto Hot Key
Hi Eddie, I'm not an autohotkey guru, but I just wanted to mention that when you are invoicing in Sierra, you do have the option to print the bib/order record for the item you are invoicing. I believe this would provide all of the information you are looking for. Of course, it will also provide the entire bib, which may not be what you are looking for, but it is not unusual to include this printout inside the book upon receipt/invoicing. Good luck, Dawn -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eddie Clem Sent: Monday, May 04, 2015 2:50 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Help with Auto Hot Key Hi there! I'm hoping someone here is a guru at AutoHotKey! :) We have a clerk that pays our invoices in Sierra. She will write the bib number on a sticky note, as well as the list price and the locations (that each copy will go to). I want to have Sierra copy the bib number, list price, locations, and order record notes onto a receipt and then this clerk would put this receipt with the first copy of the material, rather than hand write on sticky notes all day! Since I had looked, and couldn't find a way to do this easily from Sierra, I had another brilliant idea that we could have Autohotkey copy the fields I want into a template (say, in Publisher) and have the bib number turned into a barcode, and list the other fields that we want that travel around the tech services department. This barcoded bib number would be used by catalogers to enter the bib number in the 949 for overlay in Connexion, and then again by our barcoding clerk to search by bib number in Sierra. At this point, I'm thinking that Autohotkey is my best bet. Here is my prototype of what the routing slip would look like when it's done. The Thickety 2 is a note in the order record put in by our selectors for our catalogers to add that series to the bib record. The 978... 
is just a placeholder for where the list price will go once we get that field added to our order records: [cid:image001.png@01D08679.A5CC5160] Here is the corresponding order record. Part of my problem for Autohotkey is that not all order records will contain a note (in field z) and the locations may be different (fewer or more) on the LOCATIONS line. I have to include the multi line, because if it's just our Main Library that's receiving the item, then the LOCATIONS at the bottom don't show up at all...just the LOCATION fixed field (under ACQ TYPE). [cid:image002.png@01D08679.A5CC5160] Any thoughts would be greatly appreciated! Thanks! Eddie Eddie Clem, MLS Cataloging Librarian ec...@khcpl.org | www.KHCPL.org Kokomo-Howard County Public Library Collection Management Department 305 East Mulberry Street Kokomo, IN 46901 765.626.0853|765.450.6290 (fax)
Re: [CODE4LIB] Help with Auto Hot Key
This afternoon, I tried several different methods to print the order record (and order bib) onto receipt paper. That works well--except that it cuts off part of the order record note toward the bottom. (we'd prefer to use receipt paper rather than regular computer paper--it's much faster to print and auto-cuts!) Otherwise, I think it would work for my project. When I tried to make the text smaller (from 8 to 6 or 7), it made the font too light and it wasn't readable. Bummer! We're so close!! Eddie -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dawn Romano Sent: Wednesday, May 6, 2015 4:08 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Help with Auto Hot Key Hi Eddie, I'm not an autohotkey guru, but I just wanted to mention that when you are invoicing in Sierra, you do have the option to print the bib/order record for the item you are invoicing. I believe this would provide all of the information you are looking for. Of course, it will also provide the entire bib, which may not be what you are looking for, but it is not unusual to include this printout inside the book upon receipt/invoicing. Good luck, Dawn
Re: [CODE4LIB] Library Hours
On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote: +1 on the RDFa and schema.org. For those that don't know the library URL off-hand, it is much easier to find a library website by Googling than it is to go through the central university portal, and the hours will show up at the top of the page after having been harvested by search engines. Hi, so this is an area in which I've done, and am doing, a fair bit of work. See http://stuff.coffeecode.net/2015/ola_white_hat_seo/#/1/10 for some fun slides from a presentation I gave in January at the Ontario Library Association SuperConference that show some ways data gets into Google/Yahoo/Bing and conclude that the OCLC Registry "manually maintain yet another copy of your data elsewhere" approach isn't working. (Hit "s" to get speaker notes.) The rest of the presentation goes into depth on how to use RDFa to mark up a real library web page with location, contact info, opening hours, and event info. And I've posited that crawling library sites to pull single-sourced data (e.g. you update your website to provide updated hours to humans, and the machines automatically benefit) would be a much more effective, accurate, and usable approach than maintaining copies of the data in Google+, OCLC Registry, etc. We could produce results like http://cwrc.ca/rsc-src/ that stay accurate, rather than being one-off efforts that decay over time. (It would be great if the OCLC Registry had a "crawl this URL" option so that it could keep all of its data up-to-date and incentivize libraries to publish the data in a machine-readable format such as RDFa + schema.org.) On the "but that's technically challenging" front, I tried pursuing some grant funding to produce templates for publishing that structured info in Drupal, Joomla, and other commonly used CMSs. Sadly, my application was recently denied, but that will only slow me down; I'm not going to give up on the goal.
I have a paper in the works that will expand on the content of the presentation for those sites that have the ability (technical and administrative) to modify their own web pages. Sites running the Evergreen library system already generate a page for each of their libraries that contains this structured data (e.g. https://laurentian.concat.ca/eg/opac/library/OSUL), which is single sourced from the data that has to be maintained in the library system anyway. I'll happily acknowledge that getting search engines to harvest the right data is not easy, though: right now, for example, if you search for "J.N. Desmarais Library" it shows that the library is open 24 hours a day, which is completely false--probably maliciously submitted--information. *sigh* I've edited that info in the Google+ page at https://plus.google.com/+JNDesmaraisLibraryGreaterSudbury but even though it is a verified place and I am a manager of the G+ page, the edits still go through approval by Googlers. There appears to be no good way to tell Google "Hey, *this* is the URL you are looking for!" Somewhat amusingly, the entire reason I started working with schema.org dates back to a presentation I attended about Google Places years ago, where I whined about having to maintain yet another copy of data in yet another place, and the response implied that schema.org might be the solution to that problem. Also, due to the structure of university web property ownership, we currently don't have the ability to modify our actual library home page to include any RDFa, which is a *wee* bit frustrating given my work in the field. Heh. Dan Scott Laurentian University
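A minimal sketch of the single-sourcing idea in this message, assuming the hours are kept in one place (here a plain Python list) and the RDFa fragment is generated from it. The `openingHours` value format follows the schema.org restaurant example quoted earlier in the thread; the `Library` type, the library name, the hours, and both function names are assumptions made for illustration, and nothing here predicts what any search engine will do with the result.

```python
from html import escape

# Hypothetical single source of truth: (day range, opens, closes), 24-hour times.
HOURS = [
    ("Mo-Th", "08:00", "22:00"),
    ("Fr", "08:00", "18:00"),
    ("Sa-Su", "12:00", "17:00"),
]


def opening_hours_values(hours):
    """Build schema.org openingHours strings such as 'Mo-Th 08:00-22:00'."""
    return [f"{days} {opens}-{closes}" for days, opens, closes in hours]


def render_rdfa(name, hours):
    """Render a small RDFa fragment describing a library and its hours."""
    lines = ['<div vocab="http://schema.org/" typeof="Library">']
    lines.append(f'  <span property="name">{escape(name)}</span>')
    for value in opening_hours_values(hours):
        # The content attribute carries the machine-readable value;
        # the element text is what human visitors read.
        lines.append(
            f'  <span property="openingHours" content="{escape(value)}">'
            f"{escape(value)}</span>"
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_rdfa("Example Library", HOURS))
```

Because the page and the markup come from the same list, changing the hours for a new semester means editing one value rather than updating several external listings by hand.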
Re: [CODE4LIB] How to measure quality of a record
I recommend this article as an entry point into a research program on information quality: Stvilia, B., Gasser, L., Twidale, M. B. and Smith, L. C. (2007), A framework for information quality assessment. J. Am. Soc. Inf. Sci., 58: 1720–1733. doi:10.1002/asi.20652 Available at: http://stvilia.cci.fsu.edu/wp-content/uploads/2011/03/IQAssessmentFramework.pdf One cannot manage information quality (IQ) without first being able to measure it meaningfully and establishing a causal connection between the source of IQ change, the IQ problem types, the types of activities affected, and their implications. In this article we propose a general IQ assessment framework. In contrast to context-specific IQ assessment models, which usually focus on a few variables determined by local needs, our framework consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices. The framework can be used as a knowledge resource and as a guide for developing IQ measurement models for many different settings. The framework was validated and refined by developing specific IQ measurement models for two large-scale collections of two large classes of information objects: Simple Dublin Core records and online encyclopedia articles. Bob
Re: [CODE4LIB] How to measure quality of a record
On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu wrote: I think a key thing is to determine to what extent any definition of 'completeness' is actually a representation of 'quality'. As Peter says, making sure not just that metadata is present but then checking it conforms with rules is a big step towards this. This. Basing quality measures too much on the presence of certain data points or the volume of data is fraught with peril. In experiments in the distant past, my experience was that looking for structure and syntax patterns that indicate good/bad quality as well as considering record sources was useful. Also keep in mind that any scoring system is to some extent arbitrary, so you don't want to read more into what it generates than appropriate. Kyle
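As a rough illustration of looking for structure and syntax patterns rather than only field presence, the sketch below labels how precisely a date is expressed and collects a few flags per record. The patterns, field names, and flag wording are all hypothetical, and a real collection would need rules of its own; as noted above, whatever such a check produces should not be read as more than a hint.

```python
import re

# Hypothetical patterns; a real collection would need its own rules.
DATE_PATTERNS = [
    ("day", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("month", re.compile(r"^\d{4}-\d{2}$")),
    ("year", re.compile(r"^\d{4}$")),
    ("range", re.compile(r"^\d{4}\s*[-/]\s*\d{4}$")),
]


def date_precision(value):
    """Return the label of the first pattern the value matches, else 'unrecognized'."""
    text = value.strip()
    for label, pattern in DATE_PATTERNS:
        if pattern.match(text):
            return label
    return "unrecognized"


def flag_record(record):
    """Collect human-readable flags for one record (a dict of field -> string)."""
    flags = []
    date = record.get("date", "")
    if not date:
        flags.append("date missing")
    elif date_precision(date) == "unrecognized":
        flags.append(f"date not in an expected pattern: {date!r}")
    title = record.get("title", "")
    # str.isupper() is False for strings with no cased characters, e.g. "1984".
    if title.isupper():
        flags.append("title is all upper case")
    return flags
```

Flags like these could be tallied per record source, which is one way to act on the suggestion that where a record came from is itself a useful signal.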
Re: [CODE4LIB] How to measure quality of a record
You might try this blog post, by Thomas Bruce, who was my co-author on an earlier article (referred to in the post): https://blog.law.cornell.edu/voxpop/2013/01/24/metadata-quality-in-a-linked-data-context/ Diane
Re: [CODE4LIB] Library Hours
Salvete! Google often draws data from OpenStreetMap. If one wanted to, one could simply edit the Library information there and watch it get picked up rather quickly. http://wiki.openstreetmap.org/wiki/Tag:amenity%3Dlibrary #justsayin Brooke
Re: [CODE4LIB] Library Hours
I don't know if this could give it a nudge (because as discussed, nobody knows how they work), but you can go into Google Maps (or https://www.google.com/business/ ) and find your place, and claim it with a Google account (you will have to be verified somehow, IIRC usually they will call the contact phone number with a code or something). This lets you put in lots of information that definitely *does* influence the Google results display, often with a card showing location, photo(s), hours, phone number(s), etc. I put ours in some time ago by hand, and it looks like it has updated to our latest regular hours (which have changed since I would have put them in back then). If you enter your hours this way, they will show up in a day or two. You will get a nice looking card in search results. You get Insights and other reports telling you how many times people searched your site, asked for directions, clicked the phone number, etc. And maybe, just maybe, their algorithm will compare that data with data from your site to match it up to automatically update in the future. They're definitely doing some kind of heuristic or guesswork parsing, since when it finds an update (as it did with our hours data), it does ask you to review and verify. Steven -- /Steven Pryor Director of Digital Initiatives and Technologies Assistant Professor Library and Information Services Southern Illinois University Edwardsville (618) 650-3080 stpr...@siue.edu /
Re: [CODE4LIB] Library Hours
Yes, it definitely does. Which actually is a problem for Wikipedia because it encourages people/companies to try to get entries into WP for SEO purposes and so that the sidebox will show up. I spend a lot of time on the articles for deletion pages of WP trying to get these promotional pages out of the encyclopedia. A big success is when I see them disappear from search results. (BTW, the various ways that self-published authors of written crap game the system is truly astonishing. A+ for effort, and their skill in PR is way beyond their literary skills.) kc On 5/6/15 8:33 AM, Bigwood, David wrote: I have heard that at least part of the sidebox is constructed using data from Wikipedia, especially the structured info in the infobox there. Dave -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coyle Sent: Wednesday, May 06, 2015 9:21 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Library Hours Tom, Google will not tell you. The entirety of how Google search works is a trade secret. We don't know the algorithm for ranking, and we don't know what information they glean from web pages -- and they are unlikely to tell. It is a constant on the schema.org discussion list that developers want to know what Google/Bing/Yahoo/Yandex will do with specific information in the web pages, and it is a constant that the reps there reply: we cannot tell you that. The only way to find out is to code and observe. kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: +1-510-435-8234 skype: kcoylenet/+1-510-984-3600
Re: [CODE4LIB] How to measure quality of a record
Sergio, I'm hoping the conversations and interest around #metadataquality hashtag: https://twitter.com/hashtag/metadataquality help to move forward some of these conversations from well constructed research projects and academic papers to something that more of us can implement locally in our systems. There are many different ways that we could look at some of these problems and I think having more of us sharing our ideas and possibly code will be great. Mark From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Sergio Letuche code4libus...@gmail.com Sent: Wednesday, May 6, 2015 7:20 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to measure quality of a record i felt i was missing something, since i could not find some general, most used approach, and perhaps some code on github that implements these quality measures... 2015-05-06 15:08 GMT+03:00 James Morley james.mor...@europeana.eu: I think a key thing is to determine to what extent any definition of 'completeness' is actually a representation of 'quality'. As Peter says, making sure not just that metadata is present but then checking it conforms with rules is a big step towards this. I would also extend this to assessing at what level of accuracy things have been set, for example dates (a rough range vs a precise day) and geotags (coordinates presenting the centre of Paris vs the exact position that a photograph was taken from). These sorts of things can make a big difference to both the discoverability and practical reusability of records by end users. 
Best, James From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Esmé Cowles [escow...@ticklefish.org] Sent: 06 May 2015 13:51 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to measure quality of a record Sergio- Mark Phillips has a related blog post that I think is an excellent place to start, which outlines a system for scoring how complete a record is: http://vphill.com/journal/post/4075 There was some discussion on twitter recently about this, which you can look up on the #metadataquality hashtag: https://twitter.com/hashtag/metadataquality I think there was a move to setup a mailing list for this topic or something like that, but I'm not sure where that stands now. -Esme On 05/06/15, at 7:21 AM, Sergio Letuche code4libus...@gmail.com wrote: Hello community, is there a way, any statistical approach, that you are aware of that let's say, allows one to have an idea of how complete a record is, or what are the actions you take in order to have an idea of the quality of a record, and eventually a database? Thank you in advance
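One simple way to put a number on "how complete is a record" is a weighted count of the fields that are present, averaged over the database. The sketch below is only that: the field names and weights are invented for illustration, it is not the scoring system from the blog post mentioned above, and, as others in this thread point out, completeness alone says little about accuracy or conformance.

```python
# Hypothetical weights: which fields matter, and how much, is a local decision.
FIELD_WEIGHTS = {
    "title": 3,
    "creator": 2,
    "date": 2,
    "subject": 1,
    "description": 1,
    "rights": 1,
}


def is_present(value):
    """Treat None, blank strings, and empty lists as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple)):
        return any(is_present(item) for item in value)
    return True


def completeness(record, weights=FIELD_WEIGHTS):
    """Return the weighted share of expected fields present, from 0.0 to 1.0."""
    total = sum(weights.values())
    found = sum(w for field, w in weights.items() if is_present(record.get(field)))
    return found / total


def collection_summary(records, weights=FIELD_WEIGHTS):
    """Average the per-record scores and count how often each field is missing."""
    scores = [completeness(r, weights) for r in records]
    missing = {f: sum(1 for r in records if not is_present(r.get(f))) for f in weights}
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, missing
```

The per-field missing counts are often more actionable than the average score, since they point at what to fix across the whole database rather than ranking individual records.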
Re: [CODE4LIB] Library Hours
I generally find that Bing makes better use of RDFa/schema.org than Google does. kc On 5/6/15 7:33 AM, Megan O'Neill Kudzia wrote: Hi all, I've been experimenting with schema.org OpeningHoursSpecification, and currently Bing is scraping our hours, but Google isn't. I am using RDFa-lite and I've validated it using a linter (thanks Jason Ronallo!), so I'm scratching my head as to why our hours *still* don't show up on a Google search. I suspect part of it for us might be that we're re-branding away from Stockwell-Mudd Libraries to Albion College Library, as it's much more explanatory, but neither search through Google yields a nice box with hours in it like the SFPL. If and when I figure out the problem I'd be happy to send you an update of what we did and what caused it to finally work properly. On Wed, May 6, 2015 at 10:21 AM, Karen Coyle li...@kcoyle.net wrote: Tom, Google will not tell you. The entirety of how Google search works is a trade secret. We don't know the algorithm for ranking, and we don't know what information they glean from web pages -- and they are unlikely to tell. It is a constant on the schema.org discussion list that developers want to know what Google/Bing/Yahoo/Yandex will do with specific information in the web pages, and it is a constant that the reps there reply: we cannot tell you that. The only way to find out is to code and observe. kc On 5/6/15 7:00 AM, Tom Keays wrote: I'd like to find out how and why Google is parsing this information. If you go to the SFPL hours page (first link in the Google results), and look at the source code, this is all you find. http://sfpl.org/index.php?pg=010101 Is the ID in the DIV sufficient? It would be nice to have a set of use cases to work from. Currently, I'm generating a weekly hours box by pulling JSONP from the hours API of LibCal. I could easily output this in schema.org format (and probably will now), but can Google pick up the information from the DOM if it is delivered as JSON and transformed into HTML?
<div id="library-hours">
  <h2>Hours</h2>
  <table class="hours" cellpadding="0" cellspacing="0">
    <tr> <th>Sun</th> <th>Mon</th> <th>Tue</th> <th class="today">Wed</th> <th>Thu</th> <th>Fri</th> <th>Sat</th> </tr>
    <tr> <td>12-5</td> <td>10-6</td> <td>9-8</td> <td class="today">9-8</td> <td>9-8</td> <td>12-6</td> <td>10-6</td> </tr>
  </table>
</div>
On Wed, May 6, 2015 at 9:47 AM, Karen Coyle li...@kcoyle.net wrote: Charlie, I don't know of any libraries that have used schema.org for their web site - perhaps others do. If it is used, it should be picked up the next time the search engines index the site. What the search engines do with schema.org is not guaranteed, but can be observed. It is not guaranteed because none of the search engines will say what they do, as that is considered a trade secret (especially from each other). However, as locations and hours are important for their commercial customers (stores, restaurants, etc.) I would expect that to be picked up as a matter of course. Note that already locations and hours for some businesses do show in the search engines, and that is for sites that are not yet using schema.org, so the engines have some way of picking that up from the HTML. The Google side-bar knowledge graph for my local libraries shows Hours https://www.google.com/search?sa=X&biw=1299&bih=561&q=san+francisco+public+library+larkin+street+hours&stick=H4sIAGOovnz8BQMDgzYHnxCXfq6-gVlZhbF5sZZ0drKVfk5-cmJJZn4enGGVkV9aVBzLKeznIsHxlTMy2S10V0iJwvZlMgBPWBDOSA&ei=qhlKVcKWJ8b7oQS65oCQCA&ved=0CJgBEOgTMBA : Open today · 9:00 am – 8:00 pm but I have no idea where that comes from.
kc On 5/6/15 5:22 AM, Charlie Morris wrote: I'm curious, Karen, Ethan or anyone else, do you know of any examples of libraries that have implemented schema.org or RDFa for hours data and have noticed that Google or some other search engine has picked it up (i.e., correctly displaying that data as part of the search results)? And if so, how quickly will Google or the like pickup on changes to hours (i.e., shifting between semesters or unplanned changes)? On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote: +1 on the RDFa and schema.org. For those that don't know the library URL off-hand, it is much easier to find a library website by Googling than it is to go through the central university portal, and the hours will show up at the top of the page after having been harvested by search engines. On Tue, May 5, 2015 at 6:54 PM, Karen Coyle li...@kcoyle.net wrote: Note that library hours is one of the possible bits of information that could be encoded as RDFa in the library web site, thus making it possible to derive library hours directly from the listing of hours on the web site rather than keeping a separate list. Schema.org does have the elements such that hours can be encoded. This