zhangdove edited a comment on issue #1780:
URL: https://github.com/apache/iceberg/issues/1780#issuecomment-732741938
The environment:
```
hive-2.3.7
iceberg-0.10.0
spark-3.0.0
```
1. Spark Conf
```scala
val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("IcebergAPI")
  .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse", "file:///Users/dovezhang/iceberg/warehouse")
  .getOrCreate()

val conf: Configuration = new Configuration()
val catalog: HadoopCatalog = new HadoopCatalog(conf, "file:///Users/dovezhang/iceberg/warehouse")
val nameSpace = Namespace.of(schemaName)
val tableIdentifier: TableIdentifier = TableIdentifier.of(nameSpace, tableName)
```
2. Spark Create Table:
```scala
def createPartitionTable(catalog: HadoopCatalog, tableIdentifier: TableIdentifier): Unit = {
  val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
  columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
  columns.add(Types.NestedField.of(2, true, "name", Types.StringType.get, "name doc"))
  columns.add(Types.NestedField.of(3, true, "time", Types.TimestampType.withZone(), "create time doc"))
  val schema: Schema = new Schema(columns)
  val partition = PartitionSpec.unpartitioned()
  val table = catalog.createTable(tableIdentifier, schema, partition)
}
```
3. Spark write data
```scala
case class DbTb(id: Int, name: String, time: Timestamp)

def writeDataToIcebergHdfs(spark: SparkSession): Unit = {
  val seq = Seq(
    DbTb(1, "doveHDFS", Timestamp.valueOf("2020-07-06 13:40:00")),
    DbTb(2, "IcebergNameDoveHDFS", Timestamp.valueOf("2020-07-06 14:30:00")),
    DbTb(3, "SparkHDFS", Timestamp.valueOf("2020-07-06 15:20:00")))
  // Column names must match the table schema ("time", not "timeLong"),
  // since writeTo resolves columns by name.
  val structedTbDf = spark.createDataFrame(seq).toDF("id", "name", "time")
  import org.apache.spark.sql.functions
  structedTbDf.writeTo(s"hadoop_prod.${schemaName}.${tableName}").overwrite(functions.lit(true))
}
```
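A note on the values being written: `Timestamp.valueOf` interprets the literal in the writer JVM's default time zone, and Iceberg's `timestamp with zone` type normalizes the value to UTC on write, so the stored instant depends on where the writer runs. A minimal stdlib sketch of that conversion, assuming the writer JVM ran in Asia/Shanghai (UTC+8):

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Assumption (not stated in the original): the writer's default zone was
// Asia/Shanghai (UTC+8). Pin it here so the example is reproducible.
TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))

// Timestamp.valueOf parses the literal in the JVM default zone, so the
// actual instant written to the table depends on the writer's zone.
val ts = Timestamp.valueOf("2020-07-06 13:40:00")
println(ts.toInstant) // 2020-07-06T05:40:00Z, the UTC instant Iceberg stores
```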
4. Hive Client Query:
```SQL
0: jdbc:hive2://localhost:10000> add jar /Users/dovezhang/software/idea/github/iceberg/hive-runtime/build/libs/iceberg-hive-runtime-0.10.0.jar;
0: jdbc:hive2://localhost:10000> create database testDb;
0: jdbc:hive2://localhost:10000> use testDb;
0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE testTb
. . . . . . . . . . . . . . . .> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
. . . . . . . . . . . . . . . .> LOCATION 'file:///Users/dovezhang/iceberg/warehouse/testDb/testTb';
0: jdbc:hive2://localhost:10000> select * from testDb.testTb;
+------------+----------------------+------------------------+
| testtb.id | testtb.name | testtb.time |
+------------+----------------------+------------------------+
| 1 | doveHDFS | 2020-07-06 05:40:00.0 |
| 2 | IcebergNameDoveHDFS | 2020-07-06 06:30:00.0 |
| 3 | SparkHDFS | 2020-07-06 07:20:00.0 |
+------------+----------------------+------------------------+
0: jdbc:hive2://localhost:10000> select count(*) from testDb.testTb;
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (27.854 seconds)
```
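Note that the timestamps Hive prints are eight hours earlier than the values written from Spark (13:40 became 05:40, and so on). That is expected for a `timestamp with zone` column: Iceberg stores the values normalized to UTC, and the Hive client here prints the stored UTC instants, while the Spark writer interpreted the literals in its local zone. A quick stdlib check of the arithmetic, assuming the writer zone was Asia/Shanghai (UTC+8), which matches the observed offset:

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// The writer's local wall-clock values, as passed to Timestamp.valueOf.
val written = Seq("2020-07-06T13:40:00", "2020-07-06T14:30:00", "2020-07-06T15:20:00")

// Re-expressing each value in UTC (assumed writer zone: Asia/Shanghai)
// reproduces exactly what the Hive query printed.
val shownByHive = written.map { s =>
  LocalDateTime.parse(s)
    .atZone(ZoneId.of("Asia/Shanghai"))
    .withZoneSameInstant(ZoneOffset.UTC)
    .toLocalDateTime
    .toString
}
println(shownByHive) // List(2020-07-06T05:40, 2020-07-06T06:30, 2020-07-06T07:20)
```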
I ran a test here and saw no problem. Is your TimestampData customized? Could you provide more detailed steps to reproduce?
Good Luck!