Hi helix, I tried your suggestion to open the corrupt PDF in a text editor. Now I am wondering why the harvested PDF contains this HTML response with an error message:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://repository.seafdec.org.ph/bitstream/10862/1483/1/aep01.pdf">here</a>.</p>
</body></html>

Could this be because I set up Apache to redirect http to https? What should I do to resolve this issue? It seems my hunch was correct that using https is causing this issue.

Thanks and regards,
euler

On Thursday, April 6, 2017 at 3:46:55 PM UTC+8, euler wrote:
>
> Hi helix,
>
> Thanks for the response. Yes, the pdfs are normal if downloaded directly.
> My issue is when I harvest that collection with full replication in the
> harvesting options, the pdfs are corrupt. This is also happening in other
> collections.
>
> Thanks again.
> Sincerely,
> euler
>
> On Thursday, April 6, 2017 at 3:37:55 PM UTC+8, helix84 wrote:
>>
>> I tried to download one of the PDFs from your col_10862_1482, but it
>> looks normal (~4 MB):
>>
>> http://repository.seafdec.org.ph/bitstream/10862/1483/1/aep01.pdf
>>
>> Look at the small PDF with a text editor. My guess is that you'll find a
>> HTML response there with an error message.
>>
>>
>> Regards,
>> ~~helix84
>>
>> Compulsory reading: DSpace Mailing List Etiquette
>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>
>>
>> On Thu, Apr 6, 2017 at 9:28 AM, euler <[email protected]> wrote:
>>
>>> Dear All,
>>>
>>> I would like to know why the pdfs that were harvested from our
>>> repository are corrupt, mostly the file size is 274~bytes. I am using
>>> apache in front of tomcat and enabled https. I am not sure where to look
>>> why the pdfs harvested are corrupt. I did not find any entry from dspace
>>> log files that could be related to this issue. I tried harvesting our
>>> repository in the dspace demo and in my local test instance but the results
>>> are the same, corrupt pdfs. Please help me locate what could be the cause
>>> of this.
>>> You can try harvesting a small collection (with only 3 items) from
>>> our repository (set: col_10862_1482). The oai source is
>>> https://repository.seafdec.org.ph/oai/request. I would also like to ask
>>> from anybody if they have a special setup in their oai if using https
>>> because I have a hunch that this could be a reason also.
>>>
>>> Thanks in advance.
>>> euler
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "DSpace Technical Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
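
The symptom described in this thread (a tiny "PDF" of a few hundred bytes that is really a 301 response body) can be checked with a short script. The following is only a minimal Python sketch using the standard library, not part of DSpace or its harvester. The function names `looks_like_pdf`, `diagnose_payload` and `fetch_without_redirects` are made up for illustration. It assumes the harvested files are available as bytes on disk, and that the http:// bitstream URL quoted above is the one the harvester requested, which the thread does not confirm.

```python
import http.client
from urllib.parse import urlsplit

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF signature appears near the start of the data."""
    # PDFs normally begin with "%PDF-"; scanning the first 1024 bytes
    # tolerates files that carry a little leading junk.
    return PDF_MAGIC in data[:1024]


def diagnose_payload(data: bytes) -> str:
    """Classify file contents as 'pdf', 'html' or 'unknown'."""
    if looks_like_pdf(data):
        return "pdf"
    head = data[:1024].lower()
    if b"<html" in head or b"<!doctype html" in head:
        # Typical of a saved error or redirect page such as "301 Moved Permanently".
        return "html"
    return "unknown"


def fetch_without_redirects(url: str, timeout: float = 10.0):
    """Send one GET request and return (status, Location header).

    http.client never follows redirects, so a 301 from an
    http-to-https rule is reported as-is instead of being hidden.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Running `diagnose_payload(open(path, "rb").read())` over the harvested files would flag the ones that are saved HTML pages. Calling `fetch_without_redirects` on the http:// bitstream URL would show whether Apache answers with a 301 and an https:// `Location`, which is what the response body above suggests.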
