tdEmpl064 opened a new issue #13491:
URL: https://github.com/apache/superset/issues/13491


   ### Motivation
   
   In attempting to use Apache Superset with Teradata, we observed the 
following:
   
   1. **[Topic 1]** The connection instructions provided at 
https://superset.apache.org/docs/databases/teradata can be updated to the 
latest and most efficient way to connect to an Advanced SQL Engine (formerly 
Teradata Database) of a [Teradata 
Vantage](https://docs.teradata.com/r/3AkrVQlhjJMha4KRVJmm1w/root) system.
   2. **[Topic 2]** Most critically, with the current Superset codebase (as of 
apache-superset v.1.0.1) an error occurs when attempting to perform a simple 
SELECT query in the SQL Lab; see Figure 1 below. Namely, observe in Figure 1 
the "Syntax Error: expected something between the word '<table_Name>' and the 
'LIMIT' keyword". The root cause for this error is the improper translation to 
Teradata SQL of the 'LIMIT' keyword in the Superset back-end. Teradata SQL uses 
the 'TOP' or 'SAMPLE' keywords instead of 'LIMIT'. Observe that issue #11405 
appears to be closely related to this issue.
   
   Figure 1
   
![Figure_1](https://user-images.githubusercontent.com/60898895/110174175-973ee980-7db4-11eb-8309-0b4c38dc4973.jpg)
   
   The current SIP is a proposal to update Superset connectivity to Teradata 
Vantage systems, and to correct the way the 'LIMIT' keyword in the Superset 
back-end SQL is interpreted by Teradata SQL. It is likely that the latter fix 
might also provide a fix for issue #11405, too.
   
   ### Proposed Changes
   
   #### For **Topic 1**, the following is proposed:
   
   We recommend updating the current (as of March 5, 2021) Superset Teradata 
connection webpage at https://superset.apache.org/docs/databases/teradata so 
that
   * The new [teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy) 
package is used instead of the dated 
[teradata-sqlalchemy](https://pypi.org/project/sqlalchemy-teradata) project.
   * The more efficient [teradatasql](https://pypi.org/project/teradatasql) 
driver is used instead of ODBC for Teradata Vantage connections.
   
   This proposal simplifies Teradata connections, by rendering as obsolete the 
previously necessary steps to download the Teradata ODBC drivers and to specify 
corresponding environment variables. The current proposal only requires 
installation of the teradatasqlalchemy package, as teradatasql is a dependency 
of teradatasqlalchemy and is satisfied upon installation of teradatasqlalchemy. 
Alternatively, if users have already installed the teradataml package (that is, 
the Teradata client package for Python) on their client computer, then no 
further action is needed.
   
   In all, the proposal is summarized into the single action of replacing the 
existing text on the page https://superset.apache.org/docs/databases/teradata 
with the following text:
   
   > Teradata
   > 
   > The recommended connector library is 
[teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy).
   > The connection string for Teradata looks like this:
   > `teradatasql://{user}:{password}@{host}`
   
   A sample connection string to a Teradata Vantage system by using 
teradatasqlalchemy with the teradatasql driver is shown in the Figure 2 below.
   
   Figure 2
   
![Figure_2](https://user-images.githubusercontent.com/60898895/110174452-fc92da80-7db4-11eb-841a-bc2e94bfbb54.jpg)
   
   #### For **Topic 2**, the following is proposed:
   
   My colleague David Chan identified and implemented what needs to be done in 
the existing Superset code, so that the 'LIMIT' keyword is correctly identified 
and translated to Teradata SQL, as follows:
   
   * Identify whether the parsing SQL is Teradata SQL.
   * When the 'LIMIT' keyword is encountered, translate appropriately the query 
sent to the Teradata Vantage server by using the 'TOP' or 'SAMPLE' keywords in 
the Teradata SQL query.
   
   The proposed solution is **non-destructive**, in that it introduces new code 
that is executed only upon identifying a connection as a Teradata connection. 
The corresponding fix is provided in the modified versions of the files 
"db_engine_specs/base.py", "sql_lab.py", and "sql_parse.py", which are included 
in the file "SIP_Teradata.zip" at the bottom of the present SIP. The listing of 
differences between the original files and the proposed fix files is detailed 
in the text file "codeDifferences.txt", which is also included in the file 
"SIP_Teradata.zip" at the bottom of the present SIP.
   
   By using the fix files in place of the original ones, the issue identified 
above in Figure 1 is resolved as illustrated in the following Figure 3.
   
   Figure 3
   
![Figure_3](https://user-images.githubusercontent.com/60898895/110174586-2c41e280-7db5-11eb-8c23-a9ffa1226686.jpg)
   
   ### New or Changed Public Interfaces
   
   The present proposal requires no new or changed public interfaces.
   
   ### New dependencies
   
   Only for Teradata connections, the present proposal requires installation of
   
   * the [teradatasqlalchemy](https://pypi.org/project/teradatasqlalchemy) 
connector, and
   * the [teradatasql](https://pypi.org/project/teradatasql) driver
   
   packages to connect to a Teradata Vantage system Advanced SQL Engine 
Database.
   These packages are actively maintained and their licenses are Teradata 
proprietary. The teradatasql license can be found at:
   https://github.com/Teradata/python-driver/blob/master/LICENSE
   whereas the teradatasqlalchemy code is not publicly posted at this point.
   
   ### Migration Plan and Compatibility
   
   No migration is necessary. The proposed code changes are an extension and 
non-destructive to the existing code, thus not affecting existing compatibility.
   
   ### Rejected Alternatives
   
   No alternative approaches have been considered or rejected.
   
   ### Attachments
   
   
[SIP_Teradata.zip](https://github.com/apache/superset/files/6093455/SIP_Teradata.zip)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to