zhangdove edited a comment on issue #1780:
URL: https://github.com/apache/iceberg/issues/1780#issuecomment-732741938
The environment:
```
hive-2.3.7
iceberg-0.10.0
spark-3.0.0
```
1. Spark Conf
```scala
val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("IcebergAPI")
  .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse", "file:///Users/dovezhang/iceberg/warehouse")
  .getOrCreate()

val conf: Configuration = new Configuration()
val catalog: HadoopCatalog = new HadoopCatalog(conf, "file:///Users/dovezhang/iceberg/warehouse")
val nameSpace = Namespace.of(schemaName)
val tableIdentifier: TableIdentifier = TableIdentifier.of(nameSpace, tableName)
```
2. Spark Create Table:
```scala
def createPartitionTable(catalog: HadoopCatalog, tableIdentifier: TableIdentifier): Unit = {
  val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
  columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
  columns.add(Types.NestedField.of(2, true, "name", Types.StringType.get, "name doc"))
  columns.add(Types.NestedField.of(3, true, "time", Types.TimestampType.withZone(), "create time doc"))
  val schema: Schema = new Schema(columns)
  val partition = PartitionSpec.unpartitioned()
  val table = catalog.createTable(tableIdentifier, schema, partition)
}
```
3. Spark write data
```scala
case class DbTb(id: Int, name: String, time: Timestamp)

def writeDataToIcebergHdfs(spark: SparkSession): Unit = {
  val seq = Seq(
    DbTb(1, "doveHDFS", Timestamp.valueOf("2020-07-06 13:40:00")),
    DbTb(2, "IcebergNameDoveHDFS", Timestamp.valueOf("2020-07-06 14:30:00")),
    DbTb(3, "SparkHDFS", Timestamp.valueOf("2020-07-06 15:20:00")))
  // Column names must match the table schema ("time", not "timeLong"),
  // since writeTo resolves columns by name.
  val structedTbDf = spark.createDataFrame(seq).toDF("id", "name", "time")
  import org.apache.spark.sql.functions
  structedTbDf.writeTo(s"hadoop_prod.${schemaName}.${tableName}").overwrite(functions.lit(true))
}
```
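A note on the values being written: `Timestamp.valueOf` interprets the literal in the writer JVM's default time zone, and Iceberg's `timestamp with zone` type normalizes the value to UTC on write, so the stored instant depends on where the writer runs. A minimal stdlib sketch of that conversion, assuming the writer JVM ran in Asia/Shanghai (UTC+8):

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Assumption (not stated in the original): the writer's default zone was
// Asia/Shanghai (UTC+8). Pin it here so the example is reproducible.
TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))

// Timestamp.valueOf parses the literal in the JVM default zone, so the
// actual instant written to the table depends on the writer's zone.
val ts = Timestamp.valueOf("2020-07-06 13:40:00")
println(ts.toInstant) // 2020-07-06T05:40:00Z, the UTC instant Iceberg stores
```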
4. Hive Client Query:
```SQL
0: jdbc:hive2://localhost:10000> add jar /Users/dovezhang/software/idea/github/iceberg/hive-runtime/build/libs/iceberg-hive-runtime-0.10.0.jar;
0: jdbc:hive2://localhost:10000> create database testDb;
0: jdbc:hive2://localhost:10000> use testDb;
0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE testTb
. . . . . . . . . . . . . . . .> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
. . . . . . . . . . . . . . . .> LOCATION 'file:///Users/dovezhang/iceberg/warehouse/testDb/testTb';
0: jdbc:hive2://localhost:10000> select * from testDb.testTb;
+------------+----------------------+------------------------+
| testtb.id | testtb.name | testtb.time |
+------------+----------------------+------------------------+
| 1 | doveHDFS | 2020-07-06 05:40:00.0 |
| 2 | IcebergNameDoveHDFS | 2020-07-06 06:30:00.0 |
| 3 | SparkHDFS | 2020-07-06 07:20:00.0 |
+------------+----------------------+------------------------+
0: jdbc:hive2://localhost:10000> select count(*) from testDb.testTb;
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (27.854 seconds)
```
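Note that the timestamps Hive prints are eight hours earlier than the values written from Spark (13:40 became 05:40, and so on). That is expected for a `timestamp with zone` column: Iceberg stores the values normalized to UTC, and the Hive client here prints the stored UTC instants, while the Spark writer interpreted the literals in its local zone. A quick stdlib check of the arithmetic, assuming the writer zone was Asia/Shanghai (UTC+8), which matches the observed offset:

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// The writer's local wall-clock values, as passed to Timestamp.valueOf.
val written = Seq("2020-07-06T13:40:00", "2020-07-06T14:30:00", "2020-07-06T15:20:00")

// Re-expressing each value in UTC (assumed writer zone: Asia/Shanghai)
// reproduces exactly what the Hive query printed.
val shownByHive = written.map { s =>
  LocalDateTime.parse(s)
    .atZone(ZoneId.of("Asia/Shanghai"))
    .withZoneSameInstant(ZoneOffset.UTC)
    .toLocalDateTime
    .toString
}
println(shownByHive) // List(2020-07-06T05:40, 2020-07-06T06:30, 2020-07-06T07:20)
```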
I ran a test here and saw no problem. Is your TimestampData customized? Could you provide more detailed steps to reproduce?
Good Luck!