Re: [Dspace-tech] Odd Characters in Search Results...
Andrea, thank you so much.

We added this to the top of our cron job:

    LANG=en_US.UTF-8

We then removed the corrupt text bundle and re-ran the media filter:

    /dspace/bin/dspace filter-media -i 10177/4732

and the files look perfect.

Bill K.

--
View this message in context: http://dspace.2283337.n4.nabble.com/Odd-Characters-in-Search-Results-tp4678061p4678125.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.

--
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
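For anyone who would rather not edit the crontab itself, the same idea can be sketched as a small wrapper function. This is only a sketch under assumptions: `refilter_item` and `DSPACE_BIN` are names made up for illustration, the path default is simply the one from the command quoted above, and the corrupt text bundle still has to be removed by hand before re-running.

```shell
#!/bin/sh
# Hypothetical wrapper: DSPACE_BIN and refilter_item are illustrative names,
# not part of DSpace. The default path is the one quoted in the message.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"

# Re-run the media filter for a single handle with a UTF-8 locale.
refilter_item() {
    handle="$1"
    if [ -z "$handle" ]; then
        echo "usage: refilter_item <handle>" >&2
        return 2
    fi
    # The LANG= prefix applies to this one command only; the calling
    # shell's environment is left unchanged.
    LANG=en_US.UTF-8 "$DSPACE_BIN" filter-media -i "$handle"
}
```

Usage would be `refilter_item 10177/4732`. Note that if `LC_ALL` is set in the environment it takes precedence over `LANG`, so it is worth checking that it is unset or also UTF-8.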
Re: [Dspace-tech] Odd Characters in Search Results...
Hi Andrea,

I think I figured out how to apply this in a Windows environment. I just added the line LANG=en_US.UTF-8 at the end of the dspace filter-media command.

I first searched our repository for items that returned odd characters in their search results. Then I forced DSpace to reindex that particular item, and the odd characters went away.

Hope this helps the original poster of this thread. ;-)

Thank you very much,
euler

--
View this message in context: http://dspace.2283337.n4.nabble.com/Odd-Characters-in-Search-Results-tp4678061p4678075.html
Re: [Dspace-tech] Odd Characters in Search Results...
Hi,

On 28/05/15 17:21, euler wrote:
> Thanks for the link. I forgot to mention that I am using Windows 2003 as my OS, so I'm not using crontab, instead I have a batch file that is executed by Scheduled Tasks. Apologies for my ignorance, but I don't know how to apply this to a Windows environment.

I have no idea either, but perhaps (hopefully) someone else on this list can help!

cheers,
Andrea

--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
Re: [Dspace-tech] Odd Characters in Search Results...
Hi,

On 28/05/15 15:24, euler wrote:
> Thanks for this. I just assumed that the original characters in my pdfs were defective somehow (some are defective actually, not OCRed but digital born documents). I would be glad to know how to make sure that the dspace media-filter will use the correct locale and UTF-8 encoding? I may have missed something in the documentation on how to set this.

Well -- how are you running the media filter? If you're running it from a crontab on Linux, try the line I put into my other reply. Or see, e.g., http://www.logikdev.com/2010/02/02/locale-settings-for-your-cron-job/

cheers,
Andrea

--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
Re: [Dspace-tech] Odd Characters in Search Results...
Hi Andrea,

Thanks for this. I had just assumed that the original characters in my PDFs were defective somehow (some actually are defective; they are not OCRed but born-digital documents).

How can I make sure that the DSpace media filter uses the correct locale and UTF-8 encoding? I may have missed something in the documentation on how to set this.

Thanks in advance,
euler

--
View this message in context: http://dspace.2283337.n4.nabble.com/Odd-Characters-in-Search-Results-tp4678061p4678068.html
Re: [Dspace-tech] Odd Characters in Search Results...
Hi Bill,

I'm having this issue too. I resolved it by adding TEXT to the upload bundles in dspace.cfg, i.e.

    xmlui.bundle.upload = ORIGINAL, TEXT, METADATA, THUMBNAIL, LICENSE, CC-LICENSE

so that I can upload to the TEXT bundle in addition to ORIGINAL, which holds the PDF. I just made sure that the text file was saved in UTF-8 encoding.

The question marks that you're seeing come from the extracted text produced by the DSpace media filter. After I uploaded a text version and deleted the extracted text, the question marks in the search results went away.

Hope this helps.

Regards,
euler

--
View this message in context: http://dspace.2283337.n4.nabble.com/Odd-Characters-in-Search-Results-tp4678061p4678066.html
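One way to double-check that a text file really is saved as UTF-8 before uploading it is to round-trip it through `iconv`. This is a minimal sketch, assuming `iconv` is installed; `is_valid_utf8` is a made-up helper name, not a DSpace tool.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the file decodes cleanly as UTF-8,
# non-zero if iconv hits an invalid byte sequence (or cannot read the file).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `is_valid_utf8 extracted.txt && echo "looks like UTF-8"`, where `extracted.txt` stands for whatever file is about to go into the TEXT bundle. A pure-ASCII file also passes, since ASCII is a subset of UTF-8.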
Re: [Dspace-tech] Odd Characters in Search Results...
Hi,

On 28/05/15 14:44, euler wrote:
> I'm having this issues also. I resolved this by adding TEXT in dspace.cfg, ie xmlui.bundle.upload = ORIGINAL, TEXT, METADATA, THUMBNAIL, LICENSE, CC-LICENSE so that I can upload TEXT bundle aside from the ORIGINAL which is pdf. I just made sure that the text file was saved in UTF-8 encoding. The question marks that you're seeing are the extracted text made by dspace media-filter. By uploading a text version and deleting the extracted text, the question marks in search results went away.

If that solved the problem for you then my suspicion is that your media filter runs with the wrong locale. You need to make sure that the media filter is using UTF-8. I have

    LANG=en_NZ.UTF-8

at the top of tomcat's crontab for that reason (you presumably want something other than en_NZ).

cheers,
Andrea

--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
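To see which character encoding a job's environment actually resolves to, something like the following could be run from the same crontab as the media filter. It is a rough sketch: `is_utf8_charmap` and `report_charmap` are illustrative names, and it relies only on the standard `locale charmap` command.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the given charmap name is one of the
# common spellings of UTF-8, 1 otherwise.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Print what the current environment resolves to, as the job would see it.
report_charmap() {
    charmap="$(locale charmap 2>/dev/null)"
    if is_utf8_charmap "$charmap"; then
        echo "ok: charmap is $charmap"
    else
        echo "warning: charmap is ${charmap:-unknown}, expected UTF-8"
    fi
}
```

Calling `report_charmap` from a temporary cron entry and reading the mailed output should show whether the LANG line at the top of the crontab is taking effect.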