On Oct 29, 2012, at 8:19 AM, Henri-Damien LAURENT <[email protected]> 
wrote:

> I am about to write a tool which would help index EPUB files into ILSes.
> My first guess is to produce ISO2709 or MARCXML records from EPUB files,
> but since MARCXML or ISO2709 is not really what I would call the most 
> portable (UNIMARC and MARC21 may both be handled in the same file 
> format), I am rather considering producing OAI-DC or HTML5 + schema.org 
> <http://schema.org/> + Dublin Core, but that would rely on EPUB3.
> 
> Any comments, anyone?
> Has anyone considered such a tool?
> Is there any hidden corpse lurking around I should be aware of ?


A couple of years ago I wrote a TEI to ePub creation tool. I learned a lot from 
that process. Most importantly, I learned the inner HTML must validate. (Whew!) 
If you wanted to include full-text indexing of epub files into your catalog -- 
which I would personally endorse -- then you could do the following things:

  * use the metadata coming from the ancillary epub files to create MARC records
  * use the full text data coming from the HTML to support full text indexing
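
A minimal Python sketch of the two steps above, using only the standard 
library. The names read_epub and html_to_text are hypothetical, and the sketch 
assumes an unencrypted EPUB whose content documents are UTF-8 XHTML with hrefs 
that are not percent-encoded. It stops at a dictionary of Dublin Core fields 
and a string of text; mapping the fields to MARC and feeding an indexer are 
left out.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML namespaces used by the EPUB container and package (OPF) files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


class _TextOnly(HTMLParser):
    """Collect the text nodes of an (X)HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    """Return the plain text of one content document (hypothetical helper)."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def read_epub(source):
    """Return (metadata, full_text) for an EPUB path or file-like object."""
    with zipfile.ZipFile(source) as z:
        # META-INF/container.xml points at the package (OPF) file.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))
        base = posixpath.dirname(opf_path)

        # Step 1: gather the Dublin Core elements (title, creator, ...).
        metadata = {}
        for el in package.find("opf:metadata", NS):
            if el.tag.startswith("{" + NS["dc"] + "}") and el.text:
                name = el.tag.split("}", 1)[1]
                metadata.setdefault(name, []).append(el.text.strip())

        # Step 2: follow the spine, in reading order, and strip the markup.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        texts = []
        for ref in package.findall("opf:spine/opf:itemref", NS):
            href = posixpath.join(base, manifest[ref.get("idref")])
            texts.append(html_to_text(z.read(href).decode("utf-8")))

    return metadata, " ".join(texts)
```

Calling read_epub("book.epub") would give back something like a dictionary 
with "title" and "creator" lists plus one long string of text. The text 
includes anything inside script or style elements, so a real tool would want 
to filter those out.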

My creation tool is located on Github -- 
https://github.com/ericleasemorgan/epub/  The most important file is buried in 
the bin directory and called alex2epub.pl or build.pl. I know, these files 
create epub files rather than index them, but in order to index them a 
person needs to know how they are built.

Good luck. I really like the idea of full text indexing in library "catalogs".

--
Eric Lease Morgan
University of Notre Dame

574/631-8604
