[ https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Szabo updated SQOOP-3264: -------------------------------- Fix Version/s: (was: 1.4.7) 1.5.0 > Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP > ------------------------------------------------------------------------------ > > Key: SQOOP-3264 > URL: https://issues.apache.org/jira/browse/SQOOP-3264 > Project: Sqoop > Issue Type: Improvement > Components: hive-integration > Affects Versions: 1.4.6 > Reporter: Michal Klempa > Priority: Minor > Fix For: 1.5.0 > > > When importing JDBC SQL Types: > {code} > public final static int DATE = 91; > public final static int TIME = 92; > public final static int TIMESTAMP = 93; > {code} > Sqoop currently uses the org.apache.sqoop.hive.HiveTypes.toHiveType method, > where all of these types are mapped to STRING type. > Given that in fact, the JDBC value returned is of type Long, let me propose > we can output the type for Hive as: > {code} > DATE -> TIMESTAMP > TIME -> BIGINT > TIMESTAMP -> TIMESTAMP > {code} > This is also in line with org.apache.sqoop.manager.ConnManager.toAvroType, > where the type is > {code} > case Types.DATE: > case Types.TIME: > case Types.TIMESTAMP: > return Type.LONG; > {code} > Some of the connectors override the toJavaType: > {code} > org.apache.sqoop.manager.SQLServerManager > org.apache.sqoop.manager.oracle.OraOopConnManager > {code} > which may indicate different handling. > The SQLServerManager uses Java String as the output type, because of > timezones. > Same holds true for OraOopConnManager, although it has a separate > configuration boolean value > 'oraoop.timestamp.string' which controls whether the import will use > timezones and convert date types > to Java String, or timezones are going to be dropped and import will behave > the 'sqoop way'. > Both of these connectors already handle these types as String by default, > proposed change would not affect them. > Other connectors are needed to be checked. > Some of the connectors override the toHiveType: > {code} > org.apache.sqoop.manager.oracle.OraOopConnManager > {code} > This connector uses the 'sqoop way': > {code} > String hiveType = super.toHiveType(sqlType); > {code} > and only when not resolved, the type used is decided: > {code} > if (hiveType == null) { > // http://wiki.apache.org/hadoop/Hive/Tutorial#Primitive_Types > if (sqlType == OraOopOracleQueries.getOracleType("BFILE") > || sqlType == OraOopOracleQueries.getOracleType("INTERVALYM") > || sqlType == OraOopOracleQueries.getOracleType("INTERVALDS") > || sqlType == OraOopOracleQueries.getOracleType("NCLOB") > || sqlType == OraOopOracleQueries.getOracleType("NCHAR") > || sqlType == OraOopOracleQueries.getOracleType("NVARCHAR") > || sqlType == OraOopOracleQueries.getOracleType("OTHER") > || sqlType == OraOopOracleQueries.getOracleType("ROWID") > || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPTZ") > || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPLTZ") > || sqlType == OraOopOracleQueries.getOracleType("STRUCT")) { > hiveType = "STRING"; > } > if (sqlType == OraOopOracleQueries.getOracleType("BINARY_FLOAT")) { > hiveType = "FLOAT"; > } > if (sqlType == OraOopOracleQueries.getOracleType("BINARY_DOUBLE")) { > hiveType = "DOUBLE"; > } > } > {code} > This code is affected with proposed change. As the Hive TIMESTAMP is > timezone-less, we have to change the handling in this method - respect the > property 'oraoop.timestamp.string' - if true, output STRING hive type, if > false, go with 'sqoop way'. > The Hive Type is only used when generating the table ddl (create statement) > and Hive can properly recognize the JDBC compliant java.sql.Timestamp format > "YYYY-MM-DD HH:MM:SS.fffffffff", so no connector should be affected in a way, > that Hive would not read the resulting column values. > However, thorough testing should be done on all connectors before releasing > any column type behavior changes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)