Okay all, Christian followed up off-list and mentioned that his
columns had not been UTF-8 encoded.

So I think the best thing to do is use Postgres, but if you have to
use MySQL, make sure the columns are UTF-8 encoded with the newest
Galaxy and add ?charset=utf8 to your database connection string. I am
not sure whether "unicodifying" job output and errors or the upgrade
of sqlalchemy is what broke this - but I imagine that before the
change, data was essentially being lost if everything was being stored
in the database as latin-1. So I guess it was broken before, just not
throwing exceptions :).
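
To make the two MySQL steps concrete, here is a rough sketch (Python 3,
not anything that ships with Galaxy). It builds the CONVERT TO
CHARACTER SET statement quoted further down in this thread for a given
table, and appends charset=utf8 to a connection URL. The helper names,
the credentials and the host are made up for illustration; "job" and
"task" are the tables mentioned later in the thread and "galaxydb" is
David's database name. Back up and test before running any generated
statement against a production database.

```python
import re


def convert_table_sql(table_name):
    """Build the ALTER TABLE statement that converts one table to utf8.

    Hypothetical helper: it only builds the SQL text, it does not run it.
    """
    # Reject anything that is not a plain identifier before quoting it.
    if not re.match(r"^[A-Za-z0-9_]+$", table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    return (
        "ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 "
        "COLLATE utf8_general_ci;" % table_name
    )


def with_utf8_charset(connection_url):
    """Return connection_url with charset=utf8 in its query string.

    Hypothetical helper: a URL that already sets charset is left alone.
    """
    base, _, query = connection_url.partition("?")
    params = [p for p in query.split("&") if p]
    if any(p.split("=", 1)[0] == "charset" for p in params):
        return connection_url
    params.append("charset=utf8")
    return base + "?" + "&".join(params)


# Example values only: user, password and host are placeholders.
for table in ("job", "task"):
    print(convert_table_sql(table))
print(with_utf8_charset("mysql://galaxy:secret@localhost/galaxydb"))
```

The URL helper uses plain string handling on purpose, so a URL with an
empty host part (for example one that passes a socket path as a query
parameter) comes back unchanged apart from the added parameter.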

If changing the database is not something you want to do, I committed
a patch to galaxy-central so that after the next dist release you will
be able to set the GALAXY_DEFAULT_ENCODING environment variable to
'ascii', 'latin-1', etc. to "clean" data going into the database.
Again though, I would not recommend setting this - modifying the
database encoding is the superior approach since it doesn't result in
data loss.

https://bitbucket.org/galaxy/galaxy-central/commits/bba4f8883afb62b142dd9ffa229db387f7e9f857
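
To illustrate why "cleaning" loses data, here is a small sketch
(Python 3; it is not the code from the commit, and the function name is
made up). It assumes cleaning means encoding the text into the
configured encoding with unencodable characters replaced, and it falls
back to utf-8 when GALAXY_DEFAULT_ENCODING is not set, matching the
DEFAULT_ENCODING value quoted further down in this thread.

```python
import os


def clean_for_database(text, encoding=None):
    """Force text into `encoding`, replacing what does not fit.

    Hypothetical helper: characters the encoding cannot represent come
    back as '?', which is the data loss described above.
    """
    if encoding is None:
        # Assumed fallback when the environment variable is unset.
        encoding = os.environ.get("GALAXY_DEFAULT_ENCODING", "utf-8")
    return text.encode(encoding, "replace").decode(encoding)


# U+2018 / U+2019 are the curly quotes from the R error output that
# triggered the UnicodeEncodeError reported in this thread.
stderr_text = "Error in \u2018plot\u2019"
print(clean_for_database(stderr_text, "latin-1"))  # curly quotes become '?'
print(clean_for_database(stderr_text, "utf-8") == stderr_text)
```

With latin-1 the quotes are gone for good; with utf-8 the text comes
back unchanged, which is why converting the database is the better fix.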

-John

On Fri, Dec 6, 2013 at 8:21 AM, David Hoover <hoove...@helix.nih.gov> wrote:
> Adding "?charset=utf8" to the connection string worked!
>
> However, there is one really weird side effect that probably has nothing to 
> do with utf8.  The tool creates two pdf files and one zip file as output.  
> For the two pdf files, the expansion of the dataset in the history bar shows 
> 'Image in pdf format'.  The zip file shows gobbledygook.  How do I tell 
> Galaxy to recognize zip format and not try to parse/head it?
>
> Thanks John!
>
> On Dec 5, 2013, at 5:03 PM, John Chilton wrote:
>
>> Hmmm... can you try adding "?charset=utf8" to your database connection
>> string - that may fix the problem?
>>
>> If not - is there a way to tell if the actual columns have changed.
>> Some comments on stackoverflow make it sound like the commands you
>> listed will only affect new columns.
>>
>> Can you try the CONVERT TO CHARACTER SET.
>>
>> ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
>>
>> I don't think the problem is sqlalchemy right - this works for
>> postgres and sqlite I believe - it is either that MySQL cannot store
>> UTF-8 data in that column or there is a problem in the mysql
>> connector. It is not clear to me where the problem is based on your
>> stack trace and explanation. I would be happy to work around a
>> limitation in the mysql connector by adding a config option to Galaxy
>> if I were certain that there was a bug in the mysql connector.
>>
>> -John
>>
>> On Thu, Dec 5, 2013 at 12:55 PM, David Hoover <hoove...@helix.nih.gov> wrote:
>>> John,
>>>
>>> I stopped galaxy, then ran ALTER DATABASE galaxydb DEFAULT CHARACTER SET = 
>>> 'utf8', then ran ALTER TABLE `[table]` DEFAULT CHARACTER SET = 'utf8' on 
>>> all the tables in galaxydb.  After starting up galaxy and rerunning the 
>>> jobs (using the unaltered version of lib/galaxy/util/__init__.py), the job 
>>> failed with the same error.
>>>
>>> Can I configure the sqlalchemy connection to use utf8?  Or must I 
>>> reconfigure the entire server to use utf8?
>>>
>>> --David
>>>
>>> On Dec 5, 2013, at 1:32 PM, John Chilton wrote:
>>>
>>>> Fantastic!
>>>>
>>>> For this particular problem - I guess you don't strictly need to
>>>> modify more than just job and maybe task tables. I suspect at some
>>>> point there will be a non-latin-1 job parameter or history name or
>>>> username, etc... that will result in a similar problem though - so if
>>>> you could just make it all UTF-8 that would probably be ideal.
>>>>
>>>> -John
>>>>
>>>>
>>>> On Thu, Dec 5, 2013 at 12:01 PM, David Hoover <hoove...@helix.nih.gov> 
>>>> wrote:
>>>>> Right, nevermind, 'hg log' listed that changeset 10953:e786022dc67e.
>>>>>
>>>>> Changing DEFAULT_ENCODING to 'latin-1' in lib/galaxy/util/__init__.py 
>>>>> worked.
>>>>>
>>>>> Do I need to alter ALL the MySQL tables to UTF-8, or just a selection of 
>>>>> tables?  Will future updates explicitly create new tables with 
>>>>> CHARSET=utf-8, or do I need to reconfigure MySQL to have a new default?
>>>>>
>>>>> -- David
>>>>>
>>>>> On Dec 5, 2013, at 12:32 PM, John Chilton wrote:
>>>>>
>>>>>> Actually, can you verify that this commit
>>>>>> https://bitbucket.org/galaxy/galaxy-central/commits/e786022dc67ed918050bd81b9ac679ac958e4f75
>>>>>> is in your distribution and if it is try changing:
>>>>>>
>>>>>> DEFAULT_ENCODING = 'utf-8'
>>>>>>
>>>>>> in lib/galaxy/util.py to
>>>>>>
>>>>>> DEFAULT_ENCODING = 'latin-1'
>>>>>>
>>>>>> If that works then - I can create a database_encoding_default option
>>>>>> in universe_wsgi.ini and let you switch it to latin-1 instead of
>>>>>> needing to patch Galaxy. Otherwise, setting the MySQL tables to be
>>>>>> UTF-8 is probably the better approach - though again - backup and test
>>>>>> before applying that change in production.
>>>>>>
>>>>>> Hope this helps,
>>>>>> -John
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 5, 2013 at 11:17 AM, John Chilton <chil...@msi.umn.edu> 
>>>>>> wrote:
>>>>>>> David, Christian,
>>>>>>>
>>>>>>> Very sorry about this - this is probably related to fixing some other
>>>>>>> errors - 
>>>>>>> http://dev.list.galaxyproject.org/Unicode-in-tool-stderr-crashing-galaxy-tt4661749.html#a4661750.
>>>>>>> I will try to look into this.
>>>>>>>
>>>>>>> Christian - what database are you targeting? Is it MySQL as well?
>>>>>>>
>>>>>>> David - do you have a test setup you can hack on? I wonder if this
>>>>>>> would go away if you converted your tables to UTF-8.
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/6115612/how-to-convert-an-entire-mysql-database-characterset-and-collation-to-utf-8
>>>>>>>
>>>>>>> That is not my official recommendation though - I need to do some more
>>>>>>> research first.
>>>>>>>
>>>>>>> -John
>>>>>>>
>>>>>>> On Thu, Dec 5, 2013 at 11:04 AM, Christian Hundsrucker
>>>>>>> <christian.hundsruc...@fmi.ch> wrote:
>>>>>>>> Hi David, hi all!
>>>>>>>>
>>>>>>>> I have a similar/the same issue in another setting...
>>>>>>>>
>>>>>>>> galaxy/galaxy_dist/lib/galaxy/jobs/runners/local.py", line 116, in queue_job
>>>>>>>>  job_wrapper.finish( stdout, stderr, exit_code )
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> galaxy/galaxy_dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2018' in position 134: ordinal not in range(256)
>>>>>>>>
>>>>>>>>
>>>>>>>> I am integrating a set of R/Bioconductor modules into our local Galaxy
>>>>>>>> instance.
>>>>>>>> To do so, I use the discard_stderr_wrapper.sh.
>>>>>>>> It worked fine until the recent update*
>>>>>>>> As the error appears upon any R-output (via print, cat or error
>>>>>>>> channel), I just set the option "-v" for the cat command in the
>>>>>>>> discard_stderr_wrapper.sh:
>>>>>>>>
>>>>>>>> cat $TMPFILE >&2
>>>>>>>> =>
>>>>>>>> cat -v $TMPFILE >&2
>>>>>>>>
>>>>>>>>
>>>>>>>> as a temporary workaround.
>>>>>>>> No idea if this is applicable in your case?!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> *
>>>>>>>> changeset:   11219:5c789ab4144a
>>>>>>>> branch:      stable
>>>>>>>> tag:         tip
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05.12.2013 17:29, David Hoover wrote:
>>>>>>>>
>>>>>>>> I have installed the ngsplot galaxy tool from
>>>>>>>> http://code.google.com/p/ngsplot.  This tool creates a set of three pdf
>>>>>>>> files.  In older versions of Galaxy, the tool ran correctly with no
>>>>>>>> problems.  A recent update broke the tool.  The job runs but is
>>>>>>>> unable to finish.  Here is the error reported:
>>>>>>>>
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/spin1/users/galaxy/galaxy/lib/galaxy/jobs/runners/local.py", line 116, in queue_job
>>>>>>>>     job_wrapper.finish( stdout, stderr, exit_code )
>>>>>>>>   File "/spin1/users/galaxy/galaxy/lib/galaxy/jobs/__init__.py", line 1015, in finish
>>>>>>>>     self.sa_session.flush()
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/scoping.py", line 114, in do
>>>>>>>>     return getattr(self.registry(), name)(*args, **kwargs)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 1718, in flush
>>>>>>>>     self._flush(objects)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 1789, in _flush
>>>>>>>>     flush_context.execute()
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
>>>>>>>>     rec.execute(self)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
>>>>>>>>     uow
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
>>>>>>>>     mapper, table, update)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
>>>>>>>>     execute(statement, params)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1449, in execute
>>>>>>>>     params)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
>>>>>>>>     compiled_sql, distilled_params
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
>>>>>>>>     context)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 331, in do_execute
>>>>>>>>     cursor.execute(statement, parameters)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/MySQLdb/cursors.py", line 158, in execute
>>>>>>>>     query = query % db.literal(args)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/MySQLdb/connections.py", line 265, in literal
>>>>>>>>     return self.escape(o, self.encoders)
>>>>>>>>   File "build/bdist.linux-x86_64/egg/MySQLdb/connections.py", line 203, in unicode_literal
>>>>>>>>     return db.literal(u.encode(unicode_literal.charset))
>>>>>>>> UnicodeEncodeError: 'latin-1' codec can't encode character u'\ufffd' in position 11: ordinal not in range(256)
>>>>>>>>
>>>>>>>>
>>>>>>>> There is a set of files created in the job_working_directory that
>>>>>>>> start with 'metadata_', some of which contain the unicode.
>>>>>>>>
>>>>>>>> Is there anything I can do to fix this?
>>>>>>>>
>>>>>>>> David Hoover
>>>>>>>> Helix Systems Staff
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ___________________________________________________________
>>>>>>>> Please keep all replies on the list by using "reply all"
>>>>>>>> in your mail client.  To manage your subscriptions to this
>>>>>>>> and other Galaxy lists, please use the interface at:
>>>>>>>> http://lists.bx.psu.edu/
>>>>>>>>
>>>>>>>> To search Galaxy mailing lists use the unified search at:
>>>>>>>> http://galaxyproject.org/search/mailinglists/
>>>>>>>>
>>>>>>>>
>>>>>>>>
