tdEmpl064 opened a new issue #13491: URL: https://github.com/apache/superset/issues/13491
### Motivation In attempting to use Apache Superset with Teradata, we observed the following: 1. **[Topic 1]** The connection instructions provided at https://superset.apache.org/docs/databases/teradata can be updated to the latest and most efficient way to connect to an Advanced SQL Engine (formerly Teradata Database) of a [Teradata Vantage](https://docs.teradata.com/r/3AkrVQlhjJMha4KRVJmm1w/root) system. 2. **[Topic 2]** Most critically, with the current Superset codebase (as of apache-superset v.1.0.1) an error occurs when attempting to perform a simple SELECT query in the SQL Lab; see Figure 1 below. Namely, observe in Figure 1 the "Syntax Error: expected something between the word '<table_Name>' and the 'LIMIT' keyword". The root cause for this error is the improper translation to Teradata SQL of the 'LIMIT' keyword in the Superset back-end. Teradata SQL uses the 'TOP' or 'SAMPLE' keywords instead of 'LIMIT'. Observe that issue #11405 appears to be closely related to this issue. Figure 1  The current SIP is a proposal to update Superset connectivity to Teradata Vantage systems, and to correct the way the 'LIMIT' keyword in the Superset back-end SQL is interpreted by Teradata SQL. It is likely that the latter fix might also provide a fix for issue #11405, too. ### Proposed Changes #### For **Topic 1**, the following is proposed: We recommend updating the current (as of March 5, 2021) Superset Teradata connection webpage at https://superset.apache.org/docs/databases/teradata so that * The new [teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy) package is used instead of the dated [teradata-sqlalchemy](https://pypi.org/project/sqlalchemy-teradata) project. * The more efficient [teradatasql](https://pypi.org/project/teradatasql) driver is used instead of ODBC for Teradata Vantage connections. This proposal simplifies Teradata connections, by rendering as obsolete the previously necessary steps to download the Teradata ODBC drivers and to specify corresponding environment variables. The current proposal only requires installation of the teradatasqlalchemy package, as teradatasql is a dependency of teradatasqlalchemy and is satisfied upon installation of teradatasqlalchemy. Alternatively, if users have already installed the teradataml package (that is, the Teradata client package for Python) on their client computer, then no further action is needed. In all, the proposal is summarized into the single action of replacing the existing text on the page https://superset.apache.org/docs/databases/teradata with the following text: > Teradata > > The recommended connector library is [teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy). > The connection string for Teradata looks like this: > `teradatasql://{user}:{password}@{host}` A sample connection string to a Teradata Vantage system by using teradatasqlalchemy with the teradatasql driver is shown in the Figure 2 below. Figure 2  #### For **Topic 2**, the following is proposed: My colleague David Chan identified and implemented what needs to be done in the existing Superset code, so that the 'LIMIT' keyword is correctly identified and translated to Teradata SQL, as follows: * Identify whether the parsing SQL is Teradata SQL. * When the 'LIMIT' keyword is encountered, translate appropriately the query sent to the Teradata Vantage server by using the 'TOP' or 'SAMPLE' keywords in the Teradata SQL query. The proposed solution is **non-destructive**, in that it introduces new code that is executed only upon identifying a connection as a Teradata connection. The corresponding fix is provided in the modified versions of the files "db_engine_specs/base.py", "sql_lab.py", and "sql_parse.py", which are included in the file "SIP_Teradata.zip" at the bottom of the present SIP. The listing of differences between the original files and the proposed fix files is detailed in the text file "codeDifferences.txt", which is also included in the file "SIP_Teradata.zip" at the bottom of the present SIP. By using the fix files in place of the original ones, the issue identified above in Figure 1 is resolved as illustrated in the following Figure 3. Figure 3  ### New or Changed Public Interfaces The present proposal requires no new or changed public interfaces. ### New dependencies Only for Teradata connections, the present proposal requires installation of * the [teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy) connector, and * the [teradatasql](https://pypi.org/project/teradatasql) driver packages to connect to a Teradata Vantage system Advanced SQL Engine Database. These packages are actively maintained and their licenses are Teradata proprietary. The teradatasql license can be found at: https://github.com/Teradata/python-driver/blob/master/LICENSE whereas the teradatasqlalchemy code is not publicly posted at this point. ### Migration Plan and Compatibility No migration is necessary. The proposed code changes are an extension and non-destructive to the existing code, thus not affecting existing compatibility. ### Rejected Alternatives No alternative approaches have been considered or rejected. ### Attachments [SIP_Teradata.zip](https://github.com/apache/superset/files/6093455/SIP_Teradata.zip) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
