It seems that the problem lies in how macOS handles UTF-8 file names. I am not entirely sure of the details of the underlying incompatibility, but I got the transfer working when I copied the files from my Mac to the RH Linux server with
$ rsync --iconv=UTF-8-MAC,UTF-8

Ilja

> On 12 Dec 2016, at 09:37, Sidoroff, Ilja <[email protected]> wrote:
>
> Hmm... something strange is going on. I replaced spaces with underscores in
> the filenames, and now I get the following:
>
> when importing:
>
> java.io.FileNotFoundException: ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf (No such file or directory)
>
> copy-pasting the path to terminal:
>
> $ ls ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
> ls: cannot access ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf: No such file or directory
>
> but listing the directory contents:
>
> $ ls ark/item_003/
> contents  dublin_core.xml  Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
>
> or here:
>
> $ ls -l ark/item_003/
> total 16
> -rw-r--r-- 1 sidoroff sidoroff   70 Dec 12 09:16 contents
> -rw-r--r-- 1 sidoroff sidoroff  807 Dec 12 09:16 dublin_core.xml
> -rw-r--r-- 1 sidoroff sidoroff 6109 Dec 12 09:16 Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
>
> So the problem seems to be somewhere in the Unicode/UTF-8 handling of RHEL 7
> or my Mac, where I prepared the import package.
>
> Ilja
>
>> On 12 Dec 2016, at 08:49, Sidoroff, Ilja <[email protected]> wrote:
>>
>> Hi Tom,
>>
>> my locale is
>>
>> LANG=en_US.UTF-8
>> LC_CTYPE="en_US.UTF-8"
>> LC_NUMERIC="en_US.UTF-8"
>> LC_TIME="en_US.UTF-8"
>> LC_COLLATE="en_US.UTF-8"
>> LC_MONETARY="en_US.UTF-8"
>> LC_MESSAGES="en_US.UTF-8"
>> LC_PAPER="en_US.UTF-8"
>> LC_NAME="en_US.UTF-8"
>> LC_ADDRESS="en_US.UTF-8"
>> LC_TELEPHONE="en_US.UTF-8"
>> LC_MEASUREMENT="en_US.UTF-8"
>> LC_IDENTIFICATION="en_US.UTF-8"
>> LC_ALL=
>>
>> and I get the same errors with LC_ALL="" or "en_US.UTF-8". I think I'll try
>> next to see whether this is something happening at the OS or Java level.
>>
>> Ilja
>>
>>> On 07 Dec 2016, at 14:57, Tom Desair <[email protected]> wrote:
>>>
>>> Hi Ilja,
>>>
>>> One of our clients had a similar problem. Can you give me the output of the
>>> "locale" command on your DSpace server?
>>>
>>> Can you also try setting the "LC_ALL" environment variable to an empty
>>> string or "en_US.UTF-8" before running the import:
>>> $ export LC_ALL=""
>>> $ bin/dspace import ...
>>> or
>>> $ export LC_ALL="en_US.UTF-8"
>>> $ bin/dspace import ...
>>>
>>> More information on this can be found here:
>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4733494
>>>
>>> Best regards,
>>> Tom
>>>
>>> Tom Desair
>>> 250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
>>> Esperantolaan 4, Heverlee 3001, Belgium
>>> www.atmire.com
>>>
>>> 2016-12-07 13:43 GMT+01:00 Sidoroff, Ilja <[email protected]>:
>>> Hello,
>>>
>>> I noticed some weird behaviour when trying to import items into DSpace
>>> using the command line and simple archive format. If I have bitstreams
>>> whose names contain both SPACEs and Scandinavian special characters, the
>>> import fails because the OS cannot find the bitstream in question.
>>>
>>> For instance, a bitstream name with a space is ok:
>>>
>>> Adding item from directory item_002
>>> Loading dublin core from ark/item_002/dublin_core.xml
>>> ...
>>> Processing contents file: ark/item_002/contents
>>> Bitstream: Digigraduille uusi prosessi.pdf
>>>
>>> Bitstream name with 'ä' (a+uml) is ok:
>>>
>>> Adding item from directory item_005
>>> Loading dublin core from ark/item_005/dublin_core.xml
>>> ...
>>> Bitstream: Käisä1.pdf
>>>
>>> But this is not ok:
>>>
>>> Adding item from directory item_006
>>> Loading dublin core from ark/item_006/dublin_core.xml
>>> ...
>>> java.io.FileNotFoundException: ark/item_006/Kirjastoelämää Bolognassa.pdf (No such file or directory)
>>> ...
>>> java.io.FileNotFoundException: ark/item_006/Kirjastoelämää Bolognassa.pdf (No such file or directory)
>>>
>>> stracing the import gives the underlying error:
>>>
>>> 21515 open("ark/item_006/Kirjastoel\303\244m\303\244\303\244 Bolognassa.pdf", O_RDONLY) = -1 ENOENT (No such file or directory)
>>>
>>> I'm using RHEL 7.2, with LANG=en_US.UTF-8. I'm not sure whether this is
>>> some operating system (or even filesystem? XFS) specific behaviour, or if
>>> Java is the culprit, or if this could be helped with some Java IO magic
>>> (and thus worth opening a ticket). I tested this with DSpace 6.0, but I
>>> think this would happen with other versions as well.
>>>
>>> Ilja Sidoroff
>>> Information Systems Specialist
>>> Helsinki University Library
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "DSpace Technical Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>> For more options, visit https://groups.google.com/d/optout.
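
A note for anyone hitting the same FileNotFoundException: the symptoms in this thread look like a Unicode normalization mismatch. This is an interpretation rather than something the thread confirms, but the strace output shows `\303\244`, which is the precomposed (NFC) encoding of 'ä', whereas `UTF-8-MAC` in the rsync fix refers to the decomposed form ('a' followed by U+0308) that macOS commonly uses for file names. The two spellings look identical in a terminal but are different byte sequences, so a byte-exact lookup on Linux fails. Below is a minimal Python sketch of the effect; the helper names `nfc` and `match_name` are made up for illustration and are not part of DSpace.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the precomposed (NFC) form of a file name."""
    return unicodedata.normalize("NFC", name)


def match_name(wanted, candidates):
    """Return the candidate that equals `wanted` after NFC normalization, or None."""
    target = nfc(wanted)
    for candidate in candidates:
        if nfc(candidate) == target:
            return candidate
    return None


# Name as it appears in the strace output: each 'ä' is the single code point U+00E4.
wanted = "Kirjastoel\u00e4m\u00e4\u00e4 Bolognassa.pdf"
# Assumed on-disk spelling after a plain copy from macOS: 'a' + U+0308 (decomposed).
on_disk = unicodedata.normalize("NFD", wanted)

print(wanted == on_disk)          # False: same appearance, different code points
print(wanted.encode("utf-8"))     # contains \xc3\xa4 for each 'ä'
print(on_disk.encode("utf-8"))    # contains a\xcc\x88 for each 'ä'
print(match_name(wanted, [on_disk, "contents"]) == on_disk)  # True
```

To check a real import package, `match_name(name_from_contents_file, os.listdir(item_dir))` would show whether a file exists under a differently normalized name. Converting the names during transfer, as the rsync `--iconv` option does, avoids the mismatch in the first place.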
