On Thu, Jul 10, 2008 at 11:49 AM, Seth Woodworth <[EMAIL PROTECTED]> wrote:
> If the material currently exists in a PDF that's fine. But if at all
> possible an ebook should be in html, and a pdf should be converted
> into html.

On Thu, 10 Jul 2008 14:17:50 -0700, "Edward Cherlin" <[EMAIL PROTECTED]> wrote:
> Is that a documented decision or an opinion? I know of a number of
> ebooks in PDF format in Sugar distributions, and none in HTML.

I'm curious about the rationale for HTML. Granted, it's re-flowable, and it can be read in a web browser, which also supports the broadest range of media types. But there are better tools for reading ebooks than web browsers, with features like searching (both within an individual document and across a collection of documents), bookmarking, document annotation, etc. (See FBReader or Adobe Digital Editions as examples.)

The ANSI-recognized National Information Standards Organization (NISO) sets standards in this area, and their "A Framework of Guidance for Building Good Digital Collections" (available at http://www.niso.org/publications/rp/ ) provides guidance for digital libraries. This framework is widely observed (see the Digital Library Federation, http://www.diglib.org/standards/imlsframe.htm).

The framework distinguishes between "born digital" materials and non-digital source materials such as printed matter, and it includes recommended formats for a broad range of media types. For born digital textual materials, the recommendations are basically either PDF/A or XML-based standards like the Open epub format (see http://www.openebook.org/). For non-digital source printed materials, they recommend the TIFF or JPEG2000 formats. See the discussion beginning on page 26 for recommended formats.

One advantage of following the NISO standards is the interoperability it provides with other libraries. For example, the Digital Library of India (http://dli.iiit.ac.in/) has embarked on building a million-book collection. By using compatible formats, OLPC would have access to a large store of materials without the up-front trouble of digitizing them. The Digital Library of India uses the TIFF format with an OCR'd text version to provide searchability. A TIFF-rendering browser plugin is used for display.

I'm not sure why the Indian project gave such priority to TIFF, but I suspect that most of the material that is either out of copyright, or out of print and freely licensable, was available only as printed pages that had to be scanned. Conversely, I suspect that most of the born digital material is so recent that it is protected by copyright.

Another set of considerations related to file formats might be the choice of library software, e.g., Greenstone (an open source project funded by the United Nations) or DSpace (MIT/Hewlett-Packard), to support functions such as a searchable catalog, subject classification, rights management, etc.

Peter Hollings

_______________________________________________
Library mailing list
[email protected]
http://lists.laptop.org/listinfo/library
