zhangdove opened a new issue #1341:
URL: https://github.com/apache/iceberg/issues/1341


   1. My environment
   ```
   spark version: 3.0.0
   iceberg version: 0.9.0
   ```
   2. My use case
   ```
   import java.sql.Timestamp
   import java.util.{ArrayList, List}

   import org.apache.iceberg.{PartitionSpec, Schema}
   import org.apache.iceberg.catalog.TableIdentifier
   import org.apache.iceberg.hadoop.HadoopCatalog
   import org.apache.iceberg.types.Types
   import org.apache.spark.sql.SparkSession

   // Row type for the test data.
   case class StructedDb(id: Int, name: String, time: Timestamp)

   // Create a table partitioned by day(time).
   def createPartitionTable(catalog: HadoopCatalog, tableIdentifier: TableIdentifier): Unit = {
     val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
     columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
     columns.add(Types.NestedField.of(2, true, "name", Types.StringType.get, "name doc"))
     columns.add(Types.NestedField.of(3, true, "time", Types.TimestampType.withZone(), "create time doc"))

     val schema: Schema = new Schema(columns)
     val partition = PartitionSpec.builderFor(schema).day("time", "day").build()

     catalog.createTable(tableIdentifier, schema, partition)
   }

   // Overwrite twice, each time with a filter on the time column.
   // schemaName and tableName are defined elsewhere in my job.
   def writeData(spark: SparkSession): Unit = {
     import spark.implicits._

     val seq = Seq(StructedDb(1, "v1", Timestamp.valueOf("2020-01-01 12:00:00")))
     seq.toDF.writeTo(s"hadoop_prod.${schemaName}.${tableName}")
       .overwrite($"time" >= Timestamp.valueOf("2020-01-01 00:00:00"))

     val seq2 = Seq(StructedDb(2, "v2", Timestamp.valueOf("2020-01-02 13:00:00")))
     seq2.toDF.writeTo(s"hadoop_prod.${schemaName}.${tableName}")
       .overwrite($"time" >= Timestamp.valueOf("2020-01-02 00:00:00"))
   }

   createPartitionTable(catalog, tableIdentifier)
   writeData(spark)
   ```
   3. Thrown exception
   ```
   Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot delete file where some, but not all, rows match filter ref(name="time") >= 1577894400000000: file:/Users/dovezhang/iceberg/warehouse/testDb/testTb/data/day=2020-01-01/00000-0-c8dc7903-fba7-41ed-8c8b-916b7c066ffc-00001.parquet
           at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:42)
           at org.apache.iceberg.ManifestFilterManager.manifestHasDeletedFiles(ManifestFilterManager.java:355)
   ```
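
   As a sanity check I decoded the filter value from the error message. Iceberg stores timestamptz values as microseconds since the epoch in UTC, so 1577894400000000 can be checked with a quick scratch snippet (not part of the job above):
   ```
   import java.time.Instant

   // 1577894400000000 microseconds -> seconds since the epoch -> UTC instant
   val micros = 1577894400000000L
   println(Instant.ofEpochSecond(micros / 1000000L))  // prints 2020-01-01T16:00:00Z
   ```
   2020-01-01 16:00 UTC is exactly 2020-01-02 00:00 in my UTC+8 session timezone, so the filter boundary falls in the middle of the day=2020-01-01 partition rather than on its edge.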
   
   When writing the second batch, why does Iceberg try to delete the file in the partition where the first record is located?

   I'm not sure whether I'm using the API incorrectly; I would appreciate it if someone could point me to the correct approach.
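
   In case it helps frame the question: a dynamic partition overwrite seems to avoid the filter entirely. This is only a sketch of the alternative I considered, reusing the table, imports, and names from the use case above; I have not confirmed it is the intended pattern:
   ```
   // Sketch: overwrite only the partitions touched by the incoming rows,
   // instead of supplying an explicit filter expression.
   val seq2 = Seq(StructedDb(2, "v2", Timestamp.valueOf("2020-01-02 13:00:00")))
   seq2.toDF
     .writeTo(s"hadoop_prod.${schemaName}.${tableName}")
     .overwritePartitions()  // should replace only day=2020-01-02, leaving day=2020-01-01 alone
   ```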
   
   
   I was following the overwrite example in the docs: https://github.com/apache/iceberg/blob/master/site/docs/spark.md#overwriting-data

