On Fri, Jun 16, 2017 at 10:15 AM, Martin Urbanec <martin.urba...@wikimedia.cz> wrote:
> Hello,
>
> I have a script which should add a template to articles which are created by
> the ContentTranslation tool (the template has parameters which depends on
> language and revision which were used as the source one; this is the reason
> why I use separate script). It may be found at
> https://github.com/urbanecm/addPrekladCT/blob/master/addmissing.py. The
> script work perfectly on my local PC and on bastion host but I can't get it
> work on the grid.
>
> The script itself is run by python3 addmissing.py -always -file:pages.txt
> -search:'-insource:/\{\{[Pp]řeklad/' and require pages.txt file and
> preklads.txt file at
> https://tools.wmflabs.org/urbanecmbot/test/preklads.txt. The first contains
> pages that should be processed and act as the generator, the second one is
> something like a database with exact templates which should be inserted.
> Both files are as an example in the attachments.
>
> When I try to run it at toollabs bastion, all works as it should. When I
> send the script to grid, it do not work (see sample output below). Why? Can
> somebody help me with it?
>
> Thank you in advance,
> Martin Urbanec / Urbanecm
>
> ; Output
>
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $ cat test.sh
> python3 addmissing.py -always -file:pages.txt
> -search:'-insource:/\{\{[Pp]řeklad/'
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $ jsub bash test.sh
> Your job 6201363 ("bash") has been submitted
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $ qstat
> job-ID prior name user state submit/start at queue
> slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> 6201363 0.30000 bash urbanecm r 06/16/2017 18:14:42
> t...@tools-exec-1404.eqiad.wmf 1
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $ ls ~/bash.*
> /home/urbanecm/bash.err /home/urbanecm/bash.out
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $ cat ~/bash.*
> Traceback (most recent call last):
>   File "addmissing.py", line 223, in <module>
>     main()
>   File "addmissing.py", line 183, in main
>     local_args = pywikibot.handle_args(args)
>   File "/shared/pywikipedia/core/pywikibot/bot.py", line 954, in handle_args
>     writeToCommandLogFile()
>   File "/shared/pywikipedia/core/pywikibot/bot.py", line 1128, in
> writeToCommandLogFile
>     command_log_file.write(s + os.linesep)
>   File "/usr/lib/python3.4/codecs.py", line 711, in write
>     return self.writer.write(data)
>   File "/usr/lib/python3.4/codecs.py", line 368, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc5' in
> position 67: surrogates not allowed
> CRITICAL: Closing network session.
> <class 'UnicodeEncodeError'>
> urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
> $

Zhuyifei1999 saw your email and noted on irc that it looks to be a case
of the known bug that I just retitled as "Shell LOCALE neither
consistent nor sane across grid engine nodes"
(<https://phabricator.wikimedia.org/T60784>). The current best
workaround for that bug is to launch the job as a shell script that
sets either LANG=C.UTF-8 or PYTHONIOENCODING=utf-8.

If setting the job to run with the same locale you are using in your
interactive tests does not fix the problem, you may also be hitting a
deeper Python 3 unicode issue related to surrogate codepoints
(<https://bugs.python.org/issue12892>). This is hinted at by the
"position 67: surrogates not allowed" error message.

I can actually reproduce your error message in an interactive python
session on tools-dev from a starting state of LANG=en_US.UTF-8:

$ python3
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\udcc5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc5' in
position 0: surrogates not allowed
>>>

Explicitly encoding using 'surrogateescape' does work:

>>> print('\udcc5'.encode('utf-8', 'surrogateescape'))
b'\xc5'

It looks like the error could be dealt with in pywikibot by patching
writeToCommandLogFile() to open the codec used for output with any
value other than the default errors='strict'
(<https://docs.python.org/3/library/codecs.html#error-handlers>).

$ python3
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\udcc5'.encode('utf-8', 'ignore'))
b''
>>> print('\udcc5'.encode('utf-8', 'replace'))
b'?'
>>> print('\udcc5'.encode('utf-8', 'xmlcharrefreplace'))
b'&#56517;'
>>> print('\udcc5'.encode('utf-8', 'backslashreplace'))
b'\\udcc5'
>>> print('\udcc5'.encode('utf-8', 'surrogateescape'))
b'\xc5'
>>> print('\udcc5'.encode('utf-8', 'surrogatepass'))
b'\xed\xb3\x85'
>>>

Bryan
-- 
Bryan Davis              Wikimedia Foundation    <bd...@wikimedia.org>
[[m:User:BDavis_(WMF)]]  Manager, Cloud Services          Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855

_______________________________________________
Labs-l mailing list
Labs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/labs-l
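
A minimal sketch of the patch idea described above, under stated
assumptions. write_command_log() is a hypothetical stand-in, not
pywikibot's actual writeToCommandLogFile(), and the example argument
is made up. The sketch assumes the lone surrogate comes from the 'ř'
in the -search argument being decoded as ASCII with 'surrogateescape',
which is how Python 3 treats undecodable command-line bytes under a
C/POSIX locale. It then shows that a non-strict error handler lets the
log write succeed.

```python
import codecs
import io


def write_command_log(stream, text, errors="backslashreplace"):
    """Write text to a binary stream as UTF-8 with the given error handler.

    Hypothetical helper for illustration only; not pywikibot's code.
    """
    writer = codecs.getwriter("utf-8")(stream, errors=errors)
    writer.write(text + "\n")


# 'ř' is the two bytes C5 99 in UTF-8. Decoding those bytes as ASCII with
# 'surrogateescape' turns each undecodable byte into a lone surrogate,
# so the first one becomes '\udcc5' as in the traceback.
raw = "-search:řeklad".encode("utf-8")
arg = raw.decode("ascii", "surrogateescape")

# With the default errors='strict', the write fails as it did on the grid.
try:
    write_command_log(io.BytesIO(), arg, errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'backslashreplace', the surrogates are logged as escaped text.
buf = io.BytesIO()
write_command_log(buf, arg)
logged = buf.getvalue()

# 'surrogateescape' on output would instead restore the original bytes.
roundtrip = arg.encode("utf-8", "surrogateescape")

print(strict_failed, logged, roundtrip == raw)
```

Which handler is appropriate depends on what the log is for:
'backslashreplace' keeps the file valid UTF-8 and makes the odd bytes
visible, while 'surrogateescape' reproduces the original bytes exactly.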