Thanks for the tips.  I got it working with an XSLT stylesheet, which I have
attached for those who are interested.

You can generate a test XML file with the following command:

java -jar /path/to/saxon9.jar -s:http://id.loc.gov/authorities/feed/page/1/ -xsl:/path/to/update-lcsh.xsl date=2010-04-15 > test.xml

where date=2010-04-15 is a parameter that you can change.  It is passed to
the stylesheet, and Saxon steps through the pages of the feed by following
their rel="next" links, extracting the entries whose atom:updated date is on
or after the given date.  I found it to be pretty fast.  I can easily
integrate this into an Orbeon pipeline to keep my Solr index of LCSH terms
up to date.
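The date test is the part that is easy to get wrong: comparing year, month and day separately drops, for example, an entry updated on 2010-05-01 when the cutoff is 2010-04-15, because its day (01) is less than 15. A minimal Python sketch of the intended whole-date comparison, assuming atom:updated holds an RFC 3339 timestamp such as 2010-04-15T09:30:00-04:00; the function name updated_on_or_after is hypothetical:

```python
from datetime import date


def updated_on_or_after(updated: str, cutoff: str) -> bool:
    """Return True if an Atom <updated> timestamp falls on or after cutoff.

    `updated` is assumed to be an RFC 3339 timestamp such as
    "2010-04-15T09:30:00-04:00"; only the part before "T" is compared,
    the same substring-before(..., 'T') step the stylesheet uses.
    `cutoff` is an ISO date such as "2010-04-15".
    """
    entry_day = date.fromisoformat(updated.split("T", 1)[0])
    return entry_day >= date.fromisoformat(cutoff)


# Whole-date comparison keeps 2010-05-01 even though its day (01) < 15.
print(updated_on_or_after("2010-05-01T08:00:00-04:00", "2010-04-15"))  # True
print(updated_on_or_after("2010-04-14T23:59:59-04:00", "2010-04-15"))  # False
```

Zero-padded ISO dates also sort correctly as plain strings, but parsing them makes the intent explicit and rejects malformed input.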

Ethan

On Fri, May 14, 2010 at 10:45 AM, Kevin Ford <k...@loc.gov> wrote:

> Hard-coded.  There's currently no way to pass any kind of "count" parameter.
>
> Cordially,
> Kevin
>
>
> >>> Ethan Gruber <ewg4x...@gmail.com> 05/14/10 9:58 AM >>>
> Thanks for the help.  It should be doable.  Do you know if it's possible to
> control the number of entries per page, or is that locked?
>
> Ethan
>
> On Thu, May 13, 2010 at 6:11 PM, Ed Summers <e...@pobox.com> wrote:
>
> > As Kevin said, I think you can use the Atom feed to page backwards
> > through time. Basically this amounts to programmatically following the
> > <link rel="next"> links in the feed, applying creates, updates and
> > deletes as you go until you make it to Feb. 15, 2010.
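The loop described above comes down to two steps per page: collect the entries and deletions, and find the rel="next" link. A minimal Python sketch using only the standard library, assuming the feed lists changes newest-first and reports deletions as RFC 6721 at:deleted-entry elements (the tombstones namespace the stylesheet in this thread declares); parse_page and apply_changes are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Atom (RFC 4287) and Atom tombstones (RFC 6721) namespaces, in
# ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"
TOMB = "{http://purl.org/atompub/tombstones/1.0}"


def parse_page(xml_text):
    """Return (changes, next_href) for one Atom feed page.

    changes is a list of (kind, id, timestamp) tuples in document order:
    ("upsert", atom:id, atom:updated) for each atom:entry and
    ("delete", ref, when) for each at:deleted-entry (an assumption about
    how deletions appear). next_href is None on the last page.
    """
    root = ET.fromstring(xml_text)
    changes = []
    for child in root:
        if child.tag == ATOM + "entry":
            changes.append(("upsert", child.findtext(ATOM + "id"),
                            child.findtext(ATOM + "updated")))
        elif child.tag == TOMB + "deleted-entry":
            changes.append(("delete", child.get("ref"), child.get("when")))
    next_href = None
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "next":
            next_href = link.get("href")
    return changes, next_href


def apply_changes(store, changes):
    """Apply changes to a local {id: updated} dict, oldest first.

    The feed runs newest-first, so iterating in reverse lets a later
    update or delete win over an earlier create of the same record.
    """
    for kind, record_id, stamp in reversed(changes):
        if kind == "delete":
            store.pop(record_id, None)
        else:
            store[record_id] = stamp  # create if new, otherwise update
```

Because every page is newest-first, the changes from all pages would be concatenated (page 1 first) and passed to apply_changes once, rather than applied page by page.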
> >
> > Currently this would involve walking from:
> >
> >  http://id.loc.gov/authorities/feed/
> >
> > to:
> >
> >  http://id.loc.gov/authorities/feed/page/2/
> >
> > all the way to:
> >
> >  http://id.loc.gov/authorities/feed/page/96/
> >
> > Then in a month's time or whatever you can run the same process again.
> > I think you can either walk through the feed pages until a known last
> > harvest date, or until you see a record with an atom:id and
> > atom:updated you already know about. I think the latter could be a bit
> > simpler, assuming you are keeping track of what you have.
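The second stopping rule above (stop at an atom:id/atom:updated pair you already have) could look like the following minimal Python sketch. harvest, fetch_page and the (id, updated) pair representation are hypothetical; fetch_page stands in for whatever downloads and parses one feed page and returns its changes plus the rel="next" URL:

```python
def harvest(fetch_page, start_url, seen):
    """Walk feed pages newest-first until an already-known change appears.

    fetch_page(url) -> (changes, next_url) is a hypothetical callable;
    changes is a list of (id, updated) pairs, newest first, and next_url
    is None on the last page. seen is the set of (id, updated) pairs
    recorded by the previous harvest. Returns the new pairs, newest first.
    """
    new_changes = []
    url = start_url
    while url is not None:
        changes, url = fetch_page(url)
        for change in changes:
            if change in seen:
                # Everything older than this was picked up last time.
                return new_changes
            new_changes.append(change)
    return new_changes
```

Matching on the pair rather than on atom:id alone means a record that was updated again since the last run is still treated as new.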
> >
> > Ever since reading the OAI-ORE specs on Atom [1] I've become a bit
> > taken with the idea of using Atom syndication as a drop in replacement
> > for OAI-PMH--which is the spec that most people in the library
> > community reach for when they want to do metadata synchronization. The
> > advantage of Atom is that it fits so nicely into the syndication world
> > and its ecosystem of tools and services.
> >
> > //Ed
> >
> > [1] http://www.openarchives.org/ore/1.0/atom
> >
> >
> > On Thu, May 13, 2010 at 4:53 PM, Kevin Ford <k...@loc.gov> wrote:
> > > The short answer to your question is "no," there's no way to query terms
> > > based on last modification date.  However, and this feature still needs
> > > to be documented on the website, there is an Atom feed that exposes the
> > > change activity for the subject headings:
> > >
> > > http://id.loc.gov/authorities/feed/
> > >
> > > You can page through it (feed/page/1, feed/page/2).
> > >
> > > There is also a page that shows when each load was performed:
> > >
> > > http://id.loc.gov/authorities/loads/
> > >
> > > It too has an Atom feed (http://id.loc.gov/authorities/loads/feed).
> > >
> > > HTH,
> > > Kevin
> >
>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:dcterms="http://purl.org/dc/terms/" xmlns:at="http://purl.org/atompub/tombstones/1.0"
	exclude-result-prefixes="xs dcterms at atom" version="2.0">

	<xsl:output method="xml" encoding="UTF-8"/>

	<!-- Cutoff date as YYYY-MM-DD, e.g. date=2010-04-15 on the Saxon command line -->
	<xsl:param name="date" required="yes"/>

	<xsl:template match="/">
		<add>
			<xsl:apply-templates select="//atom:entry"/>
		</add>
	</xsl:template>

	<xsl:template match="atom:entry">
		<!-- Date part of atom:updated, e.g. 2010-04-15 from 2010-04-15T09:30:00-04:00 -->
		<xsl:variable name="local-date" select="substring-before(atom:updated, 'T')"/>
		<!-- Compare whole dates; testing year, month and day separately would
		     wrongly drop an entry from 2010-05-01 when the cutoff is 2010-04-15 -->
		<xsl:if test="xs:date($local-date) ge xs:date($date)">
			<doc>
				<field name="id">
					<xsl:value-of select="substring-after(atom:id, 'authorities/')"/>
				</field>
				<field name="subject">
					<xsl:value-of select="atom:title"/>
				</field>
				<field name="created">
					<xsl:value-of select="dcterms:created"/>
				</field>
				<field name="modified">
					<xsl:value-of select="atom:updated"/>
				</field>
			</doc>
			<!-- The feed is newest-first: if the last entry on this page is still
			     in range, continue with the page linked by rel="next" -->
			<xsl:if test="position() = last()">
				<xsl:variable name="next" select="/atom:feed/atom:link[@rel = 'next']/@href"/>
				<xsl:apply-templates select="document($next)//atom:entry"/>
			</xsl:if>
		</xsl:if>
	</xsl:template>

</xsl:stylesheet>
