zhangdove opened a new issue #1354:
URL: https://github.com/apache/iceberg/issues/1354


   My local time zone is Asia/Shanghai (CTT).
   I created an Iceberg table and used the `day()` transform on a timestamp 
column as the partition column.
   
   The code to build the table and write the data is as follows:
   ```scala
   import java.sql.Timestamp
   import java.util.{ArrayList, List, TimeZone}

   import org.apache.iceberg.{PartitionSpec, Schema}
   import org.apache.iceberg.catalog.TableIdentifier
   import org.apache.iceberg.hadoop.HadoopCatalog
   import org.apache.iceberg.types.Types
   import org.apache.spark.sql.SparkSession

   // Row type used below (assumed definition, not shown in the original snippet)
   case class Tb(id: Int, ts: Timestamp)

   def createPartitionTable(catalog: HadoopCatalog, tableIdentifier: TableIdentifier): Unit = {
     val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
     columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
     columns.add(Types.NestedField.of(2, true, "ts", Types.TimestampType.withZone(), "ts doc"))

     val schema: Schema = new Schema(columns)
     // Partition the table by the day of the `ts` column.
     val partition = PartitionSpec.builderFor(schema).day("ts", "day").build()

     catalog.createTable(tableIdentifier, schema, partition)
   }

   // CTT : Asia/Shanghai GMT+8
   // UTC : GMT+0
   def writeData(spark: SparkSession, timeZoneId: String): Unit = {
     TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
     val seq = Seq(
       Tb(1, Timestamp.valueOf("2020-01-01 04:00:00")),
       Tb(2, Timestamp.valueOf("2020-01-01 11:00:00")))

     val df = spark.createDataFrame(seq).toDF("id", "ts")

     df.writeTo("prod.db.table").overwritePartitions()
   }
   ```
   
   The test case: when data is written in a non-UTC time zone, a single day's 
data ends up split across two partitions.
   ```scala
       createPartitionTable(catalog, tableIdentifier)
       writeData(spark, "CTT")
   ```
   The directory structure is as follows:
   ```bash
   ➜  db tree ./table
   ./table
   ├── data
   │   ├── day=2019-12-31
   │   │   └── 00000-0-5a329b00-cbe5-4ea5-977e-9a609589d40a-00001.parquet
   │   └── day=2020-01-01
   │       └── 00000-0-5a329b00-cbe5-4ea5-977e-9a609589d40a-00002.parquet
   └── metadata
       ├── b5299545-2ce0-49e1-b0f1-8bc87cb6562a-m0.avro
       ├── snap-2523494143240605377-1-b5299545-2ce0-49e1-b0f1-8bc87cb6562a.avro
       ├── v1.metadata.json
       ├── v2.metadata.json
       └── version-hint.text
   ```
   This means that writing the `2020-01-01` data also writes to (and overwrites) the 
`day=2019-12-31` partition.
   
   I was wondering whether this time zone issue, which leads to incorrect data 
placement, needs to be fixed. The same behavior occurs with day/month/year partitions.
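   
   For reference, here is a minimal, standalone sketch of why the rows split (the object name `DayBoundaryCheck` is just for illustration, not part of the original repro): `Timestamp.valueOf` interprets the string in the JVM default time zone, while Iceberg's `day()` transform buckets by the UTC date of the stored instant.
    ```scala
    import java.sql.Timestamp
    import java.time.ZoneOffset
    import java.util.TimeZone

    object DayBoundaryCheck {
      def main(args: Array[String]): Unit = {
        TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))

        // Interpreted in the JVM default zone (Asia/Shanghai, UTC+8),
        // "2020-01-01 04:00:00" is the instant 2019-12-31T20:00:00Z.
        val ts = Timestamp.valueOf("2020-01-01 04:00:00")

        // The UTC date of that instant is 2019-12-31, so the day() partition
        // places this row under day=2019-12-31 rather than day=2020-01-01.
        println(ts.toInstant.atOffset(ZoneOffset.UTC).toLocalDate) // 2019-12-31
      }
    }
    ```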
   
   

