Thanks for the tips. I got it working with an XSLT stylesheet, which I have attached for those who are interested.
You can generate a test XML file from the command line:

java -jar /path/to/saxon9.jar -s http://id.loc.gov/authorities/feed/page/1/ -xsl:/path/to/update-lcsh.xsl date=2010-04-15 > test.xml

Here date=2010-04-15 is a parameter you can change. It is passed to the stylesheet, and Saxon steps through the pages of the feed, extracting the entries updated on or after that date (2010-04-15 in this example). I found it to be pretty fast, and I can easily integrate it into an Orbeon pipeline to keep my Solr index of LCSH terms up to date.

Ethan

On Fri, May 14, 2010 at 10:45 AM, Kevin Ford <k...@loc.gov> wrote:
> Hard-coded. There's currently no way to pass a type of "count" parameter.
>
> Cordially,
> Kevin
>
> >>> Ethan Gruber <ewg4x...@gmail.com> 05/14/10 9:58 AM >>>
> Thanks for the help. It should be doable. Do you know if it's possible to
> control the number of entries per page, or is that locked?
>
> Ethan
>
> On Thu, May 13, 2010 at 6:11 PM, Ed Summers <e...@pobox.com> wrote:
>
> > As Kevin said, I think you can use the Atom feed to page backwards
> > through time. Basically this amounts to programmatically following the
> > <link rel="next"> links in the feed, applying creates, updates and
> > deletes as you go until you make it to Feb. 15, 2010.
> >
> > Currently this would involve walking from:
> >
> > http://id.loc.gov/authorities/feed/
> >
> > to:
> >
> > http://id.loc.gov/authorities/feed/page/2/
> >
> > all the way to:
> >
> > http://id.loc.gov/authorities/feed/page/96/
> >
> > Then in a month's time or whatever you can run the same process again.
> > I think you can either walk through the feed pages until a known last
> > harvest date, or until you see a record with an atom:id and
> > atom:updated you already know about. I think the latter could be a bit
> > simpler, assuming you are keeping track of what you have.
> >
> > Ever since reading the OAI-ORE specs on Atom [1] I've become a bit
> > taken with the idea of using Atom syndication as a drop-in replacement
> > for OAI-PMH, which is the spec that most people in the library
> > community reach for when they want to do metadata synchronization. The
> > advantage of Atom is that it fits so nicely into the syndication world
> > and its ecosystem of tools and services.
> >
> > //Ed
> >
> > [1] http://www.openarchives.org/ore/1.0/atom
> >
> > On Thu, May 13, 2010 at 4:53 PM, Kevin Ford <k...@loc.gov> wrote:
> > > The short answer to your question is "no," there's no way to query terms
> > > based on last modification date. However (and this feature needs
> > > publication on the website), there is an Atom feed that exposes the change
> > > activities for the subject headings:
> > >
> > > http://id.loc.gov/authorities/feed/
> > >
> > > You can page through it (feed/page/1, feed/page/2).
> > >
> > > There is also a page that shows when each load was performed:
> > >
> > > http://id.loc.gov/authorities/loads/
> > >
> > > It too has an Atom feed (http://id.loc.gov/authorities/loads/feed).
> > >
> > > HTH,
> > > Kevin
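Ed's second suggestion in the thread can be sketched in Python. The idea is to walk backwards through the feed by following rel="next" links and stop at the first entry whose atom:id/atom:updated pair was already harvested. This is an illustration only and not part of the original exchange. The function name walk_feed and the injected fetch callable are hypothetical, and the sketch assumes the feed uses the standard Atom namespace and lists changes newest-first.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set, Tuple
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def walk_feed(
    start_url: str,
    fetch: Callable[[str], bytes],
    known: Set[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, updated, title) for each entry not seen in a previous harvest.

    Pages are followed via the feed-level <link rel="next">. The walk stops at
    the first entry whose (atom:id, atom:updated) pair is already in `known`,
    assuming (hypothetically) that the feed lists changes newest-first.
    """
    url: Optional[str] = start_url
    while url is not None:
        root = ET.fromstring(fetch(url))
        for entry in root.findall(ATOM + "entry"):
            key = (
                entry.findtext(ATOM + "id", default=""),
                entry.findtext(ATOM + "updated", default=""),
            )
            if key in known:
                return  # everything after this point was already harvested
            yield key[0], key[1], entry.findtext(ATOM + "title", default="")
        nxt = root.find(ATOM + "link[@rel='next']")
        # Resolve the href against the current page in case it is relative.
        url = urljoin(url, nxt.get("href")) if nxt is not None else None
```

Against the live feed, fetch could be lambda u: urllib.request.urlopen(u).read(). In that case known would be loaded from wherever the previous harvest was recorded. Keying on the (id, updated) pair means a record that changed since the last run is picked up again.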
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:at="http://purl.org/atompub/tombstones/1.0"
    exclude-result-prefixes="xs dcterms at atom" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Cutoff date (YYYY-MM-DD), passed on the command line as date=... -->
    <xsl:param name="date"/>

    <xsl:template match="/">
        <add>
            <xsl:apply-templates select="//atom:entry"/>
        </add>
    </xsl:template>

    <xsl:template match="atom:entry">
        <!-- Compare whole dates: testing year, month and day separately
             wrongly rejects e.g. 2010-05-01 against a 2010-04-15 cutoff -->
        <xsl:if test="xs:date(substring-before(atom:updated, 'T')) ge xs:date($date)">
            <doc>
                <field name="id">
                    <xsl:value-of select="substring-after(atom:id, 'authorities/')"/>
                </field>
                <field name="subject">
                    <xsl:value-of select="atom:title"/>
                </field>
                <field name="created">
                    <xsl:value-of select="dcterms:created"/>
                </field>
                <field name="modified">
                    <xsl:value-of select="atom:updated"/>
                </field>
            </doc>
            <!-- If the last entry on this page is still recent enough,
                 continue with the next (older) page of the feed -->
            <xsl:if test="position() = last()">
                <xsl:variable name="next" select="/atom:feed/atom:link[@rel='next']/@href"/>
                <xsl:apply-templates select="document($next)//atom:entry"/>
            </xsl:if>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
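For readers not using Saxon, roughly the same per-page logic as the stylesheet can be sketched with Python's standard library. This is an illustration, not the attached stylesheet. The name page_to_solr is hypothetical, and the Solr field names simply mirror the stylesheet's. The sketch assumes entries arrive newest-first and that atom:updated values are ISO 8601 timestamps. It compares whole calendar dates, so an entry from 2010-05-01 passes a 2010-04-15 cutoff even though its day (01) is smaller.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"
DCTERMS = "{http://purl.org/dc/terms/}"


def page_to_solr(feed_xml: bytes, since: dt.date) -> Tuple[ET.Element, Optional[str], bool]:
    """Convert one Atom feed page into a Solr <add> element.

    Returns (add_element, next_href, keep_going). keep_going becomes False once
    an entry older than `since` is seen (the feed is assumed newest-first).
    """
    root = ET.fromstring(feed_xml)
    add = ET.Element("add")
    keep_going = True
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated", "")
        # Compare full calendar dates rather than year/month/day fields one by one.
        if dt.date.fromisoformat(updated[:10]) < since:
            keep_going = False
            continue
        doc = ET.SubElement(add, "doc")
        fields = {
            # partition() yields "" if the marker is absent, like XPath substring-after().
            "id": entry.findtext(ATOM + "id", "").partition("authorities/")[2],
            "subject": entry.findtext(ATOM + "title", ""),
            "created": entry.findtext(DCTERMS + "created", ""),
            "modified": updated,
        }
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = value
    nxt = root.find(ATOM + "link[@rel='next']")
    return add, (nxt.get("href") if nxt is not None else None), keep_going
```

A driver would call page_to_solr on page 1, post ET.tostring(add) to Solr, and repeat with next_href for as long as keep_going is true and next_href is not None.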