On Wed, Nov 28, 2012 at 5:14 AM, LIBRIS Reference (LIBRIS) <[email protected]> wrote:
> 1. We need help in harvesting data from bitstreams such as htm/html/pdf
> files.
>
> 2. Our web-scale discovery service needs to know whether a DSpace record has
> full text. A record has full text when there is a pdf bitstream
> (attachment). Can harvesting capture such data?

Yes, the METS document contains the information. You can access it in two ways.

If the target repository is using XMLUI, it's available at this location (example):

http://demo.dspace.org/xmlui/metadata/handle/10673/127/mets.xml

If you also need the access rights details (some bitstreams may be inaccessible):

http://demo.dspace.org/xmlui/metadata/handle/10673/127/mets.xml?rightsMDTypes=METSRIGHTS

Alternatively, if it's not using XMLUI, you can access it via OAI-PMH:

http://demo.dspace.org/oai/request?verb=GetRecord&metadataPrefix=mets&identifier=oai:demo.dspace.org:10673/127

(If you're viewing it in a web browser, you have to use the "View source" function to see the XML, because this is a newer version.)

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
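
For the full-text check itself, here is a minimal sketch in Python of one way a harvester might test a METS document for a PDF bitstream. It assumes that each bitstream appears as a mets:file element carrying a MIMETYPE attribute, which is how the METS schema describes files, and that "full text" means at least one file of type application/pdf, as in the question. The names has_pdf_bitstream, fetch_mets and SAMPLE_METS are made up for illustration and are not part of DSpace.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard METS namespace; the "mets" prefix is only a local alias.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def has_pdf_bitstream(mets_xml):
    """Return True if any mets:file in the document has a PDF MIME type.

    Accepts str or bytes. Also works on an OAI-PMH GetRecord response,
    because the search descends through the whole tree.
    """
    root = ET.fromstring(mets_xml)
    for file_el in root.iterfind(".//mets:file", METS_NS):
        if file_el.get("MIMETYPE", "").strip().lower() == "application/pdf":
            return True
    return False


def fetch_mets(url):
    """Download a METS document (hypothetical helper, no retries or auth)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


# Tiny hand-written sample, not real DSpace output.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="file_1" MIMETYPE="application/pdf"/>
      <mets:file ID="file_2" MIMETYPE="text/html"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

if __name__ == "__main__":
    print(has_pdf_bitstream(SAMPLE_METS))
```

To run it against a live repository, pass the result of fetch_mets() on one of the URLs above to has_pdf_bitstream(). Note that this only tells you a PDF is listed; whether it is publicly readable would still need the access rights information mentioned earlier.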

