[jira] [Created] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP

Michal Klempa (JIRA) Tue, 28 Nov 2017 02:34:07 -0800

Michal Klempa created SQOOP-3264:
------------------------------------

             Summary: Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, 
BIGINT and TIMESTAMP
                 Key: SQOOP-3264
                 URL: https://issues.apache.org/jira/browse/SQOOP-3264
             Project: Sqoop
          Issue Type: Improvement
          Components: hive-integration
    Affects Versions: 1.4.6
            Reporter: Michal Klempa
            Priority: Minor
             Fix For: 1.4.7



When importing JDBC SQL  Types:
{code}
        public final static int DATE            =  91;
        public final static int TIME            =  92;
        public final static int TIMESTAMP       =  93;
{code}
Sqoop currently uses the org.apache.sqoop.hive.HiveTypes.toHiveType method, 
where all of these types are mapped to STRING type.

Given that in fact, the JDBC value returned is of type Long, let me propose we 
can output the type for Hive as:
{code}
DATE -> TIMESTAMP
TIME -> BIGINT
TIMESTAMP -> TIMESTAMP
{code}

This is also in line with org.apache.sqoop.manager.ConnManager.toAvroType, 
where the type is 
{code}
    case Types.DATE:
    case Types.TIME:
    case Types.TIMESTAMP:
      return Type.LONG;
{code}

Some of the connectors override the toJavaType:
{code}
org.apache.sqoop.manager.SQLServerManager
org.apache.sqoop.manager.oracle.OraOopConnManager
{code}
which may indicate different handling.

The SQLServerManager uses Java String as the output type, because of timezones.
Same holds true for OraOopConnManager, although it has a separate configuration 
boolean value 
'oraoop.timestamp.string' which controls whether the import will use timezones 
and convert date types
to Java String, or timezones are going to be dropped and import will behave the 
'sqoop way'.

Both of these connectors already handle these types as String by default, 
proposed change would not affect them.
Other connectors are needed to be checked.

Some of the connectors override the toHiveType:
{code}
org.apache.sqoop.manager.oracle.OraOopConnManager
{code}
This connector uses the 'sqoop way':
{code}
    String hiveType = super.toHiveType(sqlType);
{code}
and only when not resolved, the type used is decided:
{code}
    if (hiveType == null) {

      // http://wiki.apache.org/hadoop/Hive/Tutorial#Primitive_Types

      if (sqlType == OraOopOracleQueries.getOracleType("BFILE")
          || sqlType == OraOopOracleQueries.getOracleType("INTERVALYM")
          || sqlType == OraOopOracleQueries.getOracleType("INTERVALDS")
          || sqlType == OraOopOracleQueries.getOracleType("NCLOB")
          || sqlType == OraOopOracleQueries.getOracleType("NCHAR")
          || sqlType == OraOopOracleQueries.getOracleType("NVARCHAR")
          || sqlType == OraOopOracleQueries.getOracleType("OTHER")
          || sqlType == OraOopOracleQueries.getOracleType("ROWID")
          || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPTZ")
          || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPLTZ")
          || sqlType == OraOopOracleQueries.getOracleType("STRUCT")) {
        hiveType = "STRING";
      }

      if (sqlType == OraOopOracleQueries.getOracleType("BINARY_FLOAT")) {
        hiveType = "FLOAT";
      }

      if (sqlType == OraOopOracleQueries.getOracleType("BINARY_DOUBLE")) {
        hiveType = "DOUBLE";
      }
    }
{code}
This code is affected with proposed change. As the Hive TIMESTAMP is 
timezone-less, we have to change the handling in this method - respect the 
property 'oraoop.timestamp.string' - if true, output STRING hive type, if 
false, go with 'sqoop way'.

The Hive Type is only used when generating the table ddl (create statement) and 
Hive can properly recognize the  JDBC compliant java.sql.Timestamp format 
"YYYY-MM-DD HH:MM:SS.fffffffff", so no connector should be affected in a way, 
that Hive would not read the resulting column values.

However, thorough testing should be done on all connectors before releasing any 
column type behavior changes.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP

Reply via email to