Hi everyone,

My situation is as follows:
1) I am trying to reconfigure DSpace to use the xpdf media filter on my 4.1 test installation. As far as I can tell the install went smoothly, and I can run filter-media without any error messages.

2) Extraction works fine for English-language files. However, extraction from Russian-language (Cyrillic) PDFs returns a txt file full of unrecognizable characters.

3) The strange thing is that I did set the "textEncoding UTF-8" option in the xpdfrc config file for xpdf, so presumably the txt files generated by xpdf should be fine encoding-wise. To test this, I ran xpdf from the command prompt on one of my Cyrillic PDFs. The output txt file was readable and UTF-8-encoded, as expected. I then uploaded this txt file to DSpace as an ordinary bitstream for one of my test items and opened it from DSpace with view/open. The browser displayed unrecognizable characters, with the encoding autodetected as Cyrillic (ISO-8859-5). Switching the encoding manually to UTF-8 shows the expected text.

Any ideas on how to fix it?

Pavel Chunzhin
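P.S. As a side check on point 3, here is a minimal Python sketch of how one could confirm that extracted bytes are valid UTF-8, and that the garbage appears only when the same bytes are read as ISO-8859-5. It is not part of DSpace or xpdf; the helper name `is_valid_utf8` and the sample string are made up for illustration, and the sample merely stands in for real extracted text.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample standing in for text extracted from a Cyrillic PDF.
sample = "Пример текста"
raw = sample.encode("utf-8")

# The same bytes, interpreted under two different charsets.
as_utf8 = raw.decode("utf-8")
as_8859_5 = raw.decode("iso-8859-5")

print(is_valid_utf8(raw))     # the bytes are well-formed UTF-8
print(as_utf8 == sample)      # reading them as UTF-8 round-trips
print(as_8859_5 == sample)    # reading them as ISO-8859-5 does not
```

If the real txt file passes a check like this, the bytes themselves are probably fine, and the remaining suspect would be the charset that the browser is told (or guesses) when the bitstream is served.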

