bartosz25 opened a new issue, #5065: URL: https://github.com/apache/iceberg/issues/5065
Hi,

I'm learning Iceberg and am struggling with the `check-ordering` option. As far as I understand, it:

> Checks if input schema and table schema are same

https://iceberg.apache.org/docs/latest/spark-configuration/#write-options

I took the test case from the repo...

https://github.com/apache/iceberg/blob/2531545e3cd3b97494c9e3c137cfe04f4459a9fb/spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java

...and used it to experiment with the option:

```
sparkSession.sql(
  """
    |CREATE OR REPLACE TABLE local.db.letters (
    |  id STRING NOT NULL,
    |  letter1 STRING NOT NULL,
    |  letter2 STRING NOT NULL
    |) USING iceberg
    |""".stripMargin)

sparkSession.sql("SELECT '2' AS id, 'a' AS letter1, 'A' AS letter2")
  // This select is not necessary, but I left it for simpler column ordering
  // and to follow the example from the unit test
  .select("letter2", "id", "letter1")
  .write
  .option("check-ordering", "true")
  .insertInto("local.db.letters")
```

I expected the code to fail with `check-ordering` enabled and the columns reordered, but it succeeded with a position-based insert:

```
+---+-------+-------+
|id |letter1|letter2|
+---+-------+-------+
|A  |2      |a      |
+---+-------+-------+
```

Thinking it might be an issue with my local setup (Iceberg 0.13.1, Spark 3.2.0), I cloned the project repo and:

* extended the aforementioned unit test with

  ```
  spark.read()
      .format("iceberg")
      .option(SparkReadOptions.VECTORIZATION_ENABLED, String.valueOf(vectorized))
      .load(location.toString())
      .show();
  ```

* changed the `check-ordering` flag to true:

  ```
  df.select("data", "id").write()
      .format("iceberg")
      .mode(SaveMode.Append)
      .option(SparkWriteOptions.CHECK_ORDERING, "true")
      .save(location.toString());
  ```

Surprisingly, the printed output showed correct results, but the `check-ordering` flag seems to have no effect on the writer. The operation succeeds whether the flag is enabled or disabled.

I'm surely missing something. Can you shed some light on it? Is my understanding of `check-ordering` correct or wrong?
If my understanding is correct, do you have a minimal reproducible example in which the insert fails because of the reordered `select(...)` with `check-ordering` enabled?

Thank you.

Best,
Bartosz.
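
To make the expectation concrete, here is a minimal plain-Java sketch of the two behaviours described above. It does not use Iceberg or Spark and is not how either library implements the option: `OrderingSketch`, `checkOrdering` and `insertByPosition` are hypothetical names made up for illustration. It only models "values are placed by position" (what was observed) versus "a reordered input is rejected" (what was expected), using the column names and values from the `letters` example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only; none of these names come from Iceberg or Spark.
final class OrderingSketch {

    // Hypothetical check: reject input whose column names are not in table order.
    static void checkOrdering(List<String> tableColumns, List<String> inputColumns) {
        if (!tableColumns.equals(inputColumns)) {
            throw new IllegalArgumentException(
                "Input columns " + inputColumns + " do not match table order " + tableColumns);
        }
    }

    // Position-based insert: the i-th input value lands in the i-th table column,
    // regardless of what the input column was called.
    static Map<String, String> insertByPosition(List<String> tableColumns, List<String> inputValues) {
        if (tableColumns.size() != inputValues.size()) {
            throw new IllegalArgumentException("Column count mismatch");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            row.put(tableColumns.get(i), inputValues.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> table = List.of("id", "letter1", "letter2");
        List<String> inputColumns = List.of("letter2", "id", "letter1");
        List<String> inputValues = List.of("A", "2", "a");

        // Observed behaviour: values are placed by position.
        System.out.println(insertByPosition(table, inputValues)); // {id=A, letter1=2, letter2=a}

        // Expected behaviour with the check enabled: fail before writing.
        try {
            checkOrdering(table, inputColumns);
            System.out.println("no ordering error");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the reordered input is rejected by the check, while the position-based insert alone produces the same row as the table output shown earlier (`id = A`, `letter1 = 2`, `letter2 = a`).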
