Hey, Aleksandar.

Actually, I just fixed this... somehow by accident.

Today we did a batch metadata cleanup via SQL and modified the Discovery
sidebar facet configs, so I had to rebuild the indexes with
`index-discovery -b`. Also, we wanted to re-generate all of our PDF
thumbnails for DSpace 4's higher-quality versions (we upgraded a few months
ago but hadn't yet re-generated thumbnails for existing items).

I'm not sure why this didn't work before when I was doing my research in
December[0], but I'm glad that it's fixed!

Thanks for following up with me! I hope this helps someone else...

Alan

[0] https://github.com/ilri/DSpace/issues/43

On Fri Feb 13 2015 at 3:56:20 PM Aleksandar Stojanov <a.stoja...@losisin.com>
wrote:

> Hi,
>
> I've visited the repository link (
> https://cgspace.cgiar.org/handle/10568/51393) from the GitHub discussion
> page and did some searching there. I've noticed that it happens in a lot
> of the PDFs there, and always in the same place: right after the page
> number. A form feed character is inserted there, which is the Unicode
> \u000c character for a new page. Although this is valid HTML, it's invalid
> XHTML, and the recommended practice would be to treat it as a zero-width
> character because it has no semantic meaning.
> http://www.w3.org/TR/unicode-xml/#White
> We had a similar problem with search results and weird characters, and
> this helped:
> http://sourceforge.net/p/dspace/mailman/message/31212700/
>
> Can you try that solution and post back the results? Also, don't forget to
> make a backup first.
>
> Cheers,
> Aleksandar Stojanov
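
As a rough illustration of the form feed point above, the character can be detected and removed from extracted text along these lines. This is only a sketch: the helper name and the sample text are hypothetical, and it is not the fix from the linked message, just one way of treating U+000C as having no meaning.

```python
FORM_FEED = "\u000c"  # the same character as "\f"


def strip_form_feeds(text: str) -> str:
    """Remove every form feed (U+000C), which XML 1.0 does not allow."""
    return text.replace(FORM_FEED, "")


# Hypothetical extract: a page number, then a form feed, then the next page.
extracted = "end of page one\n1\n\u000cstart of page two"
cleaned = strip_form_feeds(extracted)

# Count the form feeds before and after cleaning.
print(extracted.count(FORM_FEED), cleaned.count(FORM_FEED))
```

Dropping the character outright, rather than replacing it with a space, matches the idea of treating it as zero-width.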
>
> On Thu, Feb 12, 2015 at 10:53 AM, Alan Orth <alan.o...@gmail.com> wrote:
>
>> Hey, bender. No, we didn't figure this out. In fact, it's still an open
>> issue on our institution's GitHub issue tracker!
>>
>> https://github.com/ilri/DSpace/issues/43
>>
>> I've posted a few notes there but haven't come to any conclusion. :(
>>
>> Alan
>>
>> On Fri Jan 02 2015 at 8:54:00 PM bender <bender.bending.1...@gmail.com>
>> wrote:
>>
>>> Hi Alan:
>>>
>>> Did you solve this issue?
>>> And if you did, how?
>>>
>>> Bender
>>>
>>> 2014-12-09 13:09 GMT-03:00 Alan Orth <alan.o...@gmail.com>:
>>>
>>>> Antoine,
>>>>
>>>> In this case the dspace script respects the environment's JAVA_OPTS if
>>>> it is set; the one in the script is only used if JAVA_OPTS is empty.
>>>>
>>>> Alan
>>>>
>>>> On Tue, Dec 9, 2014 at 6:54 PM, Antoine Snyers <anto...@atmire.com>
>>>> wrote:
>>>>
>>>>>  Hi Alan Orth
>>>>>
>>>>> -Dfile.encoding=UTF-8 should be added to the "bin/dspace" command.
>>>>> Here is the line:
>>>>> https://github.com/DSpace/DSpace/blob/dspace-4.2/dspace/bin/dspace#L75
>>>>>
>>>>> Then rerun 'index-discovery -b'.
>>>>> I believe this will resolve your problem.
>>>>>
>>>>> Antoine Snyers
>>>>>
>>>>> Alan Orth wrote on 09/12/14 14:49:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>  Our DSpace 4.2's Discovery search results display snippets from the
>>>>> item's full-text PDF extract, but we get mojibake (strange characters) in
>>>>> the summaries (see attached photo).  Browsing to the item's PDF-extracted
>>>>> text bitstream indeed shows the strange characters, and Firefox's
>>>>> developer tools show the encoding is ISO-8859-1.  What's strange is, if I
>>>>> download the file the resulting encoding is UTF-8, and these characters
>>>>> display properly.
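
The symptom described above (text that is fine when read as UTF-8 but garbled when a browser assumes ISO-8859-1) can be reproduced in a few lines. This is only an illustrative sketch, not something from the thread: the sample string and the function names are made up for the example.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


sample = "café"  # hypothetical sample text, not taken from the affected PDFs
garbled = misread_as_latin1(sample)

print(garbled)                   # the accented letter turns into two characters
print(repair_mojibake(garbled))  # round-trips back to the original text
```

If the stored bytes really are UTF-8, as the download suggests, the file itself is fine and only the encoding assumed when it is displayed is wrong.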
>>>>>
>>>>>  I have tried the following:
>>>>>  - Confirmed our Tomcat connectors are using URIEncoding="UTF-8"
>>>>>  - Forced "-Dfile.encoding=UTF-8" in JAVA_OPTS and manually re-run
>>>>> `filter-media' as well as `index-discovery -b'
>>>>>
>>>>>   What could I be missing?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>  --
>>>>>  Alan Orth
>>>>> alan.o...@gmail.com
>>>>> https://alaninkenya.org
>>>>> https://mjanja.ch
>>>>> "In heaven all the interesting people are missing." -Friedrich
>>>>> Nietzsche
>>>>> GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
>>>>>
>>>>> _______________________________________________
>>>>> DSpace-tech mailing list
>>>>> DSpace-tech@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>>>> List Etiquette:
>>>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>  *Antoine Snyers*
>>>>> *2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010*
>>>>> *Esperantolaan 4, Heverlee 3001, Belgium*
>>>>> www.atmire.com
>>>>>
>>>
>>