gbates101 opened a new issue #5250: Add support for hosting superset DB in 
greenplum
URL: https://github.com/apache/incubator-superset/issues/5250
 
 
   - [x] I have checked the superset logs for python stacktraces and included 
it here as text if any
   - [x] I have reproduced the issue with at least the latest released version 
of superset
   - [x] I have checked the issue tracker for the same issue and I haven't 
found one similar
   
   
   ### Superset version
   4b7a14de (building from source)
   
   ### Expected results
   Running database migrations against a greenplum database creates all 
database objects successfully.
   
   ### Actual results
   Some components of the current superset DB schema are incompatible with a 
greenplum database, particularly the DDL for creating tables.
   
   ### Steps to reproduce
   In the example below, I ran `superset db upgrade` against a greenplum 
database and received an error informing me that `Greenplum Database does not 
allow having both PRIMARY KEY and UNIQUE constraints`. The root issue here is 
that a primary key IS a unique constraint in greenplum, and only one unique 
constraint can exist on any given table. I see four options to rectify this, 
the first three requiring schema changes:
   - Drop the primary key in favor of the unique constraint. 
       - This would still allow only 1 unique constraint per table.
   - Drop the unique constraint altogether.
       - This would probably have unintended consequences.
   - Add the column(s) with the unique constraint(s) to the PK instead.
       - This would lose the individual uniqueness of each column.
   - Modify the sqlalchemy dialect for greenplum to ignore unique constraints 
if a PK is defined.
       - This will probably be the least invasive, but may still have 
unintended consequences.
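
To make the fourth option concrete, here is a minimal, library-free sketch of the rule such a dialect change would apply: when a table has a primary key, UNIQUE constraints are left out of the emitted DDL. The function `render_create_table` and its `greenplum` flag are hypothetical names used only for illustration; this is not SQLAlchemy's API, and a real change would need to hook into the dialect's DDL compiler.

```python
def render_create_table(name, columns, primary_key, unique=(), greenplum=False):
    """Render a CREATE TABLE statement from a simple table description.

    columns:     list of (column_name, type_sql) pairs
    primary_key: list of column names (may be empty)
    unique:      list of column-name lists, one per UNIQUE constraint
    greenplum:   hypothetical flag; skip UNIQUE constraints when a PK exists
    """
    clauses = ["%s %s" % (col, type_sql) for col, type_sql in columns]
    if primary_key:
        clauses.append("PRIMARY KEY (%s)" % ", ".join(primary_key))
    # Option four: emit UNIQUE constraints unless the target is greenplum
    # and a primary key is already defined on the table.
    if not (greenplum and primary_key):
        for cols in unique:
            clauses.append("UNIQUE (%s)" % ", ".join(cols))
    return "CREATE TABLE %s (\n\t%s\n)" % (name, ", \n\t".join(clauses))


# Table description mirroring the ab_permission DDL from the log in this issue.
ab_permission = dict(
    name="ab_permission",
    columns=[("id", "INTEGER NOT NULL"), ("name", "VARCHAR(100) NOT NULL")],
    primary_key=["id"],
    unique=[["name"]],
)

default_ddl = render_create_table(**ab_permission)
greenplum_ddl = render_create_table(greenplum=True, **ab_permission)
print(greenplum_ddl)
```

The trade-off is visible in the output: with the flag set, nothing in the database enforces uniqueness of `name` any more, which is one of the unintended consequences mentioned above.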
   
   Assuming I can take one of the options above without complication, I can 
think of two options for adding GP-specific migrations:
   - Create a branch from the root migration in superset's migration repo, and 
start fresh for a GP installation.
       - This branch could be labeled as "greenplum" or something useful. I 
believe this would require users to specify which branch they want to migrate 
from when running `superset db upgrade`.
   - Squash the existing migrations, and switch to the flask-multidb template.
       - This will require all consumers to define a SQLALCHEMY_BINDS in their 
config, mapping an engine to a URL.
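
For the second option, the extra configuration might look roughly like the sketch below. Flask-SQLAlchemy reads `SQLALCHEMY_BINDS` as a dict that maps a bind key to a connection URL; the bind key `greenplum`, the host names, and the credentials here are placeholders, not values from superset.

```python
# Hypothetical config fragment; every URL and the bind key are placeholders.
SQLALCHEMY_DATABASE_URI = "postgresql://user:password@pg-host:5432/superset"

# One entry per additional engine: bind key -> connection URL.
SQLALCHEMY_BINDS = {
    "greenplum": "postgresql://user:password@gp-host:5432/superset",
}
```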
   
   I believe adding support for GP will also require model changes for 
flask-appbuilder's security models. The `SecurityManager` will [create 
tables](https://github.com/dpgaspar/Flask-AppBuilder/blob/98b1be8b3390cd592dc20f215062e55d27e08eec/flask_appbuilder/security/sqla/manager.py#L75)
 from FAB's base model if the tables do not exist (this circumstance is how I 
reproduced the example below). It may be useful to define overrides for the FAB 
security models that do not have unique columns defined, though this seems like 
extra work that could be avoided with a greenplum dialect change.
   
   All in all, it seems feasible to add support for hosting superset in 
greenplum, but I don't yet have it all figured out. What do you think?
   
   ### Example
   Ran migrations on an empty greenplum database with a database role that has 
full DDL access:
   ```bash
   $ superset db upgrade
   ...
   INFO:sqlalchemy.engine.base.Engine:{'name': 'ab_permission_id_seq'}
   INFO:sqlalchemy.engine.base.Engine:
   CREATE TABLE ab_permission (
           id INTEGER NOT NULL, 
           name VARCHAR(100) NOT NULL, 
           PRIMARY KEY (id), 
           UNIQUE (name)
   )
   
   
   INFO:sqlalchemy.engine.base.Engine:{}
   INFO:sqlalchemy.engine.base.Engine:ROLLBACK
   ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization 
failed: (psycopg2.ProgrammingError) Greenplum Database does not allow having 
both PRIMARY KEY and UNIQUE constraints
    [SQL: '\nCREATE TABLE ab_permission (\n\tid INTEGER NOT NULL, \n\tname 
VARCHAR(100) NOT NULL, \n\tPRIMARY KEY (id), \n\tUNIQUE (name)\n)\n\n'] 
(Background on this error at: http://sqlalche.me/e/f405)
   ```
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
