Thanks to helix and Claudia, I have now resolved the issue. Changing http
to https in bitstream.baseUrl in oai.cfg did the trick!
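
For anyone who hits the same symptom (harvested pdfs of only a few hundred
bytes), here is a minimal sketch of the check helix suggested, done in code
instead of a text editor. It only looks at the first bytes of a file to tell
a real PDF from an HTML page such as the "301 Moved Permanently" response
quoted below. The function name classify_payload and the 1024-byte window
are my own choices for illustration, not anything from DSpace.

```python
import sys


def classify_payload(data: bytes) -> str:
    """Guess whether a harvested file is a PDF or an HTML page.

    Assumption: a genuine PDF starts with the "%PDF-" header, and an
    HTML error/redirect page has a doctype or <html> tag near the top.
    """
    # Only the beginning of the file matters; ignore leading whitespace.
    head = data[:1024].lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or b"<html" in lowered:
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, classify_payload(handle.read()))
```

If a harvested file is reported as "html", the harvester most likely saved
the web server's redirect page rather than the bitstream itself, which is
what happened here.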

Regards,
euler

On Thursday, April 6, 2017 at 4:17:21 PM UTC+8, Claudia Jürgen wrote:
>
> Hi euler, 
>
> there is a config setting 
>
> https://github.com/DSpace/DSpace/blob/dspace-5_x/dspace/config/modules/oai.cfg#L20
>  
> which determines the base URL for bitstreams. 
> Most likely it is still set to http there; if so, change it to https and
> rebuild the OAI core.
>
> Hope this helps 
>
> Claudia Jürgen 
>
>
> On 06.04.2017 at 10:02, euler wrote:
> > Hi helix, 
> > 
> > I tried your suggestion to open the corrupt pdf with a text editor. Now I
> > am wondering why the harvested pdf contains this HTML response with an
> > error message:
> > 
> > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
> > <html><head> 
> > <title>301 Moved Permanently</title> 
> > </head><body> 
> > <h1>Moved Permanently</h1> 
> > <p>The document has moved <a
> > href="https://repository.seafdec.org.ph/bitstream/10862/1483/1/aep01.pdf">here</a>.</p>
> > </body></html>
> > 
> > Could this be because I set up Apache to redirect http to https? What should
> > I do to resolve this issue? It seems my hunch that using https is
> > causing this issue was correct.
> > 
> > Thanks and regards, 
> > euler 
> > 
> > 
> > On Thursday, April 6, 2017 at 3:46:55 PM UTC+8, euler wrote: 
> >> Hi helix, 
> >> 
> >> Thanks for the response. Yes, the pdfs are normal if downloaded directly.
> >> My issue is that when I harvest that collection with full replication in the
> >> harvesting options, the pdfs are corrupt. This is also happening in other
> >> collections.
> >> 
> >> Thanks again. 
> >> Sincerely, 
> >> euler 
> >> 
> >> On Thursday, April 6, 2017 at 3:37:55 PM UTC+8, helix84 wrote: 
> >>> I tried to download one of the PDFs from your col_10862_1482, but it 
> >>> looks normal (~4 MB): 
> >>> 
> >>> http://repository.seafdec.org.ph/bitstream/10862/1483/1/aep01.pdf 
> >>> 
> >>> Look at the small PDF with a text editor. My guess is that you'll find an
> >>> HTML response there with an error message.
> >>> 
> >>> 
> >>> Regards, 
> >>> ~~helix84 
> >>> 
> >>> Compulsory reading: DSpace Mailing List Etiquette 
> >>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette 
> >>> 
> >>> 
> >>> On Thu, Apr 6, 2017 at 9:28 AM, euler <[email protected]> wrote: 
> >>> 
> >>>> Dear All, 
> >>>> 
> >>>> I would like to know why the pdfs harvested from our repository are
> >>>> corrupt; most of them are only about 274 bytes. I am using Apache in
> >>>> front of Tomcat with https enabled. I am not sure where to look for the
> >>>> cause. I did not find any entry in the dspace log files that could be
> >>>> related to this issue. I tried harvesting our repository into the dspace
> >>>> demo and into my local test instance, but the results are the same:
> >>>> corrupt pdfs. Please help me locate the cause. You can try harvesting a
> >>>> small collection (with only 3 items) from our repository (set:
> >>>> col_10862_1482). The oai source is
> >>>> https://repository.seafdec.org.ph/oai/request. I would also like to ask
> >>>> whether anybody has a special setup in their oai when using https,
> >>>> because I have a hunch that this could also be a reason.
> >>>> 
> >>>> Thanks in advance. 
> >>>> euler 
> >>>> 
> >>>> -- 
> >>>> You received this message because you are subscribed to the Google 
> >>>> Groups "DSpace Technical Support" group. 
> >>>> To unsubscribe from this group and stop receiving emails from it, send
> >>>> an email to [email protected].
> >>>> To post to this group, send email to [email protected]. 
> >>>> Visit this group at https://groups.google.com/group/dspace-tech. 
> >>>> For more options, visit https://groups.google.com/d/optout. 
> >>>> 
> >>> 
>
> -- 
> Claudia Juergen 
> Eldorado 
>
> Technische Universität Dortmund 
> Universitätsbibliothek 
> Vogelpothsweg 76 
> 44227 Dortmund 
>
> Tel.: +49 231-755 40 43 
> Fax: +49 231-755 40 32 
> [email protected]
> www.ub.tu-dortmund.de 
>
> Important note: The information included in this e-mail is confidential. 
> It is solely intended for the recipient. If you are not the intended 
> recipient of this e-mail please contact the sender and delete this message. 
> Thank you. Without prejudice of e-mail correspondence, our statements are 
> only legally binding when they are made in the conventional written form 
> (with personal signature) or when such documents are sent by fax. 
>
