[
https://issues.apache.org/jira/browse/AIRAVATA-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736268#comment-17736268
]
Marcus Christie edited comment on AIRAVATA-3697 at 6/26/23 8:33 PM:
--------------------------------------------------------------------
I'm using mariadb-tools to do an online conversion of a large instance of
replica_catalog. First, I'm doing a dry run on a copy of the database:
{code:java}
MariaDB> create database replica_catalog_copy DEFAULT CHARACTER SET utf8
COLLATE utf8_bin;
{code}
{code:java}
mysqldump replica_catalog > replica_catalog.sql
mysql replica_catalog_copy < replica_catalog.sql
{code}
Need to remove the foreign keys for this to work
{code:java}
ALTER TABLE DATA_PRODUCT DROP FOREIGN KEY DATA_PRODUCT_ibfk_1;
ALTER TABLE DATA_PRODUCT_METADATA DROP FOREIGN KEY DATA_PRODUCT_METADATA_ibfk_1;
ALTER TABLE DATA_REPLICA_LOCATION DROP FOREIGN KEY DATA_REPLICA_LOCATION_ibfk_1;
ALTER TABLE DATA_REPLICA_METADATA DROP FOREIGN KEY DATA_REPLICA_METADATA_ibfk_1;
{code}
Running mariadb-tools to do the incremental, online character set conversion
{code:java}
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_PRODUCT --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_PRODUCT_METADATA --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_REPLICA_LOCATION --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_REPLICA_METADATA --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=CONFIGURATION --execute
{code}
Using mariadb-schema-change to add back constraints:
{code:java}
mariadb-schema-change --alter "ADD CONSTRAINT DATA_PRODUCT_ibfk_1 FOREIGN KEY
(PARENT_PRODUCT_URI) REFERENCES DATA_PRODUCT (PRODUCT_URI) ON DELETE CASCADE"
D=replica_catalog_copy,t=DATA_PRODUCT --execute
mariadb-schema-change --alter "ADD CONSTRAINT DATA_PRODUCT_METADATA_ibfk_1
FOREIGN KEY (PRODUCT_URI) REFERENCES DATA_PRODUCT (PRODUCT_URI) ON DELETE
CASCADE" D=replica_catalog_copy,t=DATA_PRODUCT_METADATA --execute
mariadb-schema-change --alter "ADD CONSTRAINT DATA_REPLICA_LOCATION_ibfk_1
FOREIGN KEY (PRODUCT_URI) REFERENCES DATA_PRODUCT (PRODUCT_URI) ON DELETE
CASCADE" D=replica_catalog_copy,t=DATA_REPLICA_LOCATION --execute
mariadb-schema-change --alter "ADD CONSTRAINT DATA_REPLICA_METADATA_ibfk_1
FOREIGN KEY (REPLICA_ID) REFERENCES DATA_REPLICA_LOCATION (REPLICA_ID) ON
DELETE CASCADE" D=replica_catalog_copy,t=DATA_REPLICA_METADATA --execute
{code}
was (Author: marcuschristie):
I'm using mariadb-tools to do an online conversion of a large instance of
replica_catalog. First, I'm doing a dry run on a copy of the database:
{code}
MariaDB> create database replica_catalog_copy DEFAULT CHARACTER SET utf8
COLLATE utf8_bin;
{code}
{code}
mysqldump replica_catalog > replica_catalog.sql
mysql replica_catalog_copy < replica_catalog.sql
{code}
Need to remove the foreign keys for this to work
{code}
ALTER TABLE DATA_PRODUCT DROP FOREIGN KEY DATA_PRODUCT_ibfk_1;
ALTER TABLE DATA_PRODUCT_METADATA DROP FOREIGN KEY DATA_PRODUCT_METADATA_ibfk_1;
ALTER TABLE DATA_REPLICA_LOCATION DROP FOREIGN KEY DATA_REPLICA_LOCATION_ibfk_1;
ALTER TABLE DATA_REPLICA_METADATA DROP FOREIGN KEY DATA_REPLICA_METADATA_ibfk_1;
{code}
Running mariadb-tools to do the incremental, online character set conversion
{code}
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_PRODUCT --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_PRODUCT_METADATA --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_REPLICA_LOCATION --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=DATA_REPLICA_METADATA --execute
mariadb-schema-change --alter "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
D=replica_catalog_copy,t=CONFIGURATION --execute
{code}
Using mariadb-schema-change to add back constraints:
{code}
mariadb-schema-change --alter "ADD CONSTRAINT DATA_PRODUCT_ibfk_1 FOREIGN KEY
(PARENT_PRODUCT_URI) REFERENCES DATA_PRODUCT (PRODUCT_URI) ON DELETE CASCADE"
D=replica_catalog_copy,t=DATA_PRODUCT --execute
mariadb-schema-change --alter "ADD CONSTRAINT DATA_PRODUCT_METADATA_ibfk_1
FOREIGN KEY (PRODUCT_URI) REFERENCES DATA_PRODUCT (PRODUCT_URI) ON DELETE
CASCADE" D=replica_catalog_copy,t=DATA_PRODUCT_METADATA
--executemariadb-schema-change --alter "ADD CONSTRAINT
DATA_REPLICA_LOCATION_ibfk_1 FOREIGN KEY (PRODUCT_URI) REFERENCES DATA_PRODUCT
(PRODUCT_URI) ON DELETE CASCADE" D=replica_catalog_copy,t=DATA_REPLICA_LOCATION
--execute
mariadb-schema-change --alter "ADD CONSTRAINT DATA_REPLICA_METADATA_ibfk_1
FOREIGN KEY (REPLICA_ID) REFERENCES DATA_REPLICA_LOCATION (REPLICA_ID) ON
DELETE CASCADE" D=replica_catalog_copy,t=DATA_REPLICA_METADATA --execute
{code}
> Handle unicode characters in file names
> ---------------------------------------
>
> Key: AIRAVATA-3697
> URL: https://issues.apache.org/jira/browse/AIRAVATA-3697
> Project: Airavata
> Issue Type: Bug
> Components: Airavata API, Django Portal
> Reporter: Marcus Christie
> Assignee: Marcus Christie
> Priority: Major
> Attachments: Photophysical properties and photodegradation mechanism
> of 2-(2′-furanyl)-1H-benzimidazole (Fuberidazole) - ScienceDirect.html
>
>
> Getting the following Django error when browsing experiment output files with
> unicode characters in the file names:
> {code}
> Traceback (most recent call last):
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/rest_framework/views.py",
> line 506, in dispatch
> response = handler(request, *args, **kwargs)
> File
> "/var/www/portals/django-seagrid/airavata-django-portal/django_airavata/apps/api/views.py",
> line 1467, in get
> return self._create_response(request, experiment_id, path)
> File
> "/var/www/portals/django-seagrid/airavata-django-portal/django_airavata/apps/api/views.py",
> line 1471, in _create_response
> directories, files = user_storage.list_experiment_dir(request,
> experiment_id, path)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/airavata_django_portal_sdk/user_storage/api.py",
> line 629, in list_experiment_dir
> backend=backend, owner=experiment.userName)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/airavata_django_portal_sdk/user_storage/api.py",
> line 760, in _get_data_product_uri
> if user_file.exists():
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/models/query.py",
> line 808, in exists
> return self.query.has_results(using=self.db)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/models/sql/query.py",
> line 561, in has_results
> return compiler.has_results()
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/models/sql/compiler.py",
> line 1145, in has_results
> return bool(self.execute_sql(SINGLE))
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/models/sql/compiler.py",
> line 1175, in execute_sql
> cursor.execute(sql, params)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/backends/utils.py",
> line 66, in execute
> return self._execute_with_wrappers(sql, params, many=False,
> executor=self._execute)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/backends/utils.py",
> line 75, in _execute_with_wrappers
> return executor(sql, params, many, context)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/backends/utils.py",
> line 84, in _execute
> return self.cursor.execute(sql, params)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/db/backends/mysql/base.py",
> line 73, in execute
> return self.cursor.execute(query, args)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/MySQLdb/cursors.py",
> line 199, in execute
> args = tuple(map(db.literal, args))
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/MySQLdb/connections.py",
> line 280, in literal
> s = self.string_literal(o.encode(self.encoding))
> During handling of the above exception ('utf-8' codec can't encode characters
> in position 197-199: surrogates not allowed), another exception occurred:
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/handlers/exception.py",
> line 47, in inner
> response = get_response(request)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/handlers/base.py",
> line 181, in _get_response
> response = wrapped_callback(request, *callback_args, **callback_kwargs)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/views/decorators/csrf.py",
> line 54, in wrapped_view
> return view_func(*args, **kwargs)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/views/generic/base.py",
> line 70, in view
> return self.dispatch(request, *args, **kwargs)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/rest_framework/views.py",
> line 509, in dispatch
> response = self.handle_exception(exc)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/rest_framework/views.py",
> line 466, in handle_exception
> response = exception_handler(exc, context)
> File
> "/var/www/portals/django-seagrid/airavata-django-portal/django_airavata/apps/api/exceptions.py",
> line 54, in custom_exception_handler
> log.error("API exception", exc_info=exc, extra={'request':
> context['request']})
> File "/usr/lib64/python3.6/logging/__init__.py", line 1337, in error
> self._log(ERROR, msg, args, **kwargs)
> File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
> self.handle(record)
> File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
> self.callHandlers(record)
> File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
> hdlr.handle(record)
> File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
> self.emit(record)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/utils/log.py",
> line 122, in emit
> self.send_mail(subject, message, fail_silently=True,
> html_message=html_message)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/utils/log.py",
> line 125, in send_mail
> mail.mail_admins(subject, message, *args, connection=self.connection(),
> **kwargs)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/__init__.py",
> line 104, in mail_admins
> mail.send(fail_silently=fail_silently)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 284, in send
> return self.get_connection(fail_silently).send_messages([self])
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/backends/smtp.py",
> line 109, in send_messages
> sent = self._send(message)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/backends/smtp.py",
> line 123, in _send
> message = email_message.message()
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 247, in message
> msg = self._create_message(msg)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 436, in _create_message
> return self._create_attachments(self._create_alternatives(msg))
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 446, in _create_alternatives
> msg.attach(self._create_mime_attachment(*alternative))
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 361, in _create_mime_attachment
> attachment = SafeMIMEText(content, subtype, encoding)
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 159, in __init__
> MIMEText.__init__(self, _text, _subtype=_subtype, _charset=_charset)
> File "/usr/lib64/python3.6/email/mime/text.py", in __init__
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 169, in set_payload
> for line in payload.splitlines()
> File
> "/var/www/portals/django-seagrid/venv/lib64/python3.6/site-packages/django/core/mail/message.py",
> line 169, in <genexpr>
> for line in payload.splitlines()
> Exception Type: UnicodeEncodeError at
> /api/experiment-storage/CDE-Extract_on_Dec_5,_2022_2:59_PM_0f641c8b-63ac-4281-981e-176ef47c820d/ARCHIVE/ScrapeData/completed
> Exception Value: 'utf-8' codec can't encode characters in position 70-72:
> surrogates not allowed
> {code}
> There are a couple different issues here. First, the Django server thinks the
> file system encoding is 'ascii'. That can be fixed by setting the language
> and locale of the Python process to en_US.UTF8 on the WSGIDaemonProcess
> (https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html?highlight=lang#wsgidaemonprocess).
> The second issue is on the API server. The replica_catalog tables use a
> character encoding of latin1 so they are not able to store file names or
> paths with unicode characters. This can be fixed by converting these tables
> to 'utf8' character set.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)