zhangdove commented on issue #1780:
URL: https://github.com/apache/iceberg/issues/1780#issuecomment-732741938


   The environment:
   ```
   hive-2.3.7
   iceberg-0.10.0
   spark-3.0.0
   ```
   
   1. Spark Conf
   ```scala
        val spark = SparkSession
          .builder()
          .master("local[2]")
          .appName("IcebergAPI")
          .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
          .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
          .config("spark.sql.catalog.hadoop_prod.warehouse", "file:///Users/dovezhang/iceberg/warehouse")
          .getOrCreate()
        val conf: Configuration = new Configuration()
        val catalog: HadoopCatalog = new HadoopCatalog(conf, "file:///Users/dovezhang/iceberg/warehouse")
        val nameSpace = Namespace.of(schemaName)
        val tableIdentifier: TableIdentifier = TableIdentifier.of(nameSpace, tableName)
   ```
   2. Spark Create Table:
   ```scala
      def createPartitionTable(catalog: HadoopCatalog, tableIdentifier: TableIdentifier): Unit = {
        val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
        columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
        columns.add(Types.NestedField.of(2, true, "name", Types.StringType.get, "name doc"))
        columns.add(Types.NestedField.of(3, true, "time", Types.TimestampType.withZone(), "create time doc"))

        val schema: Schema = new Schema(columns)
        val partition = PartitionSpec.unpartitioned()

        val table = catalog.createTable(tableIdentifier, schema, partition)
      }
   ```
   3. Spark write data
   ```scala
      case class StructedDb(id: Int, name: String, time: Long)

      def writeDataToIcebergHdfs1(spark: SparkSession): Unit = {
        val seq = Seq(
          StructedDb(1, "doveHDFS", 1594014000L),            // 2020-07-06 13:40:00
          StructedDb(2, "IcebergNameDoveHDFS", 1594017000L), // 2020-07-06 14:30:00
          StructedDb(3, "SparkHDFS", 1594020000L))           // 2020-07-06 15:20:00
        val structedTbDf = spark.createDataFrame(seq).toDF("id", "name", "timeLong")

        // Cast the epoch-seconds column to TimestampType to match the table schema
        val df = structedTbDf
          .withColumn("time", structedTbDf("timeLong").cast(TimestampType))
          .drop("timeLong")

        import org.apache.spark.sql.functions
        df.writeTo(s"hadoop_prod.${schemaName}.${tableName}").overwrite(functions.lit(true))
      }
   ```
   
   4. Hive Client Query:
   ```SQL
    0: jdbc:hive2://localhost:10000> add jar /Users/dovezhang/software/idea/github/iceberg/hive-runtime/build/libs/iceberg-hive-runtime-0.10.0.jar;
    0: jdbc:hive2://localhost:10000> create database testDb;
    0: jdbc:hive2://localhost:10000> use testDb;
    0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE testTb
    . . . . . . . . . . . . . . . .> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    . . . . . . . . . . . . . . . .> LOCATION 'file:///Users/dovezhang/iceberg/warehouse/testDb/testTb';
   0: jdbc:hive2://localhost:10000> select * from testDb.testTb;
   +------------+----------------------+------------------------+
   | testtb.id  |     testtb.name      |      testtb.time       |
   +------------+----------------------+------------------------+
   | 1          | doveHDFS             | 2020-07-06 05:40:00.0  |
   | 2          | IcebergNameDoveHDFS  | 2020-07-06 06:30:00.0  |
   | 3          | SparkHDFS            | 2020-07-06 07:20:00.0  |
   +------------+----------------------+------------------------+
   0: jdbc:hive2://localhost:10000> select count(*) from testDb.testTb;
   +------+
   | _c0  |
   +------+
   | 3    |
   +------+
   1 row selected (27.854 seconds)
   ```
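    One detail worth noting in the output above: the rows were written with session-local timestamps (e.g. `13:40:00`), while the Hive query renders the `timestamptz` column in UTC (`05:40:00`), an 8-hour shift. That shift is plain epoch/zone arithmetic, not data corruption. A minimal, self-contained sketch with `java.time` (assuming the writing session ran in Asia/Shanghai, UTC+8; the object name is illustrative only):
    ```scala
    import java.time.{Instant, ZoneId}

    object TimestampCheck extends App {
      // Epoch seconds written for the first row in the example above
      val epochSeconds = 1594014000L
      val instant = Instant.ofEpochSecond(epochSeconds)

      // UTC rendering, which is what the Hive query shows
      println(instant)                                                     // 2020-07-06T05:40:00Z

      // Session-local rendering (assumed zone: Asia/Shanghai, UTC+8)
      println(instant.atZone(ZoneId.of("Asia/Shanghai")).toLocalDateTime)  // 2020-07-06T13:40
    }
    ```
    So both Spark and Hive agree on the stored instant; they only differ in the zone used for display.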
   
    I ran the same test here and there was no problem. Is your TimestampData customized? Perhaps you can provide more detailed reproduction steps?
   
   Good Luck!

