[GitHub] [iceberg] bartosz25 opened a new issue, #5065: The check-ordering purpose

GitBox Thu, 16 Jun 2022 08:33:28 -0700


bartosz25 opened a new issue, #5065:
URL: https://github.com/apache/iceberg/issues/5065


   Hi, 
   
   I'm learning Iceberg and am struggling with the `check-ordering` option. As 
far as I understand from the docs, it:
   > Checks if input schema and table schema are same
   https://iceberg.apache.org/docs/latest/spark-configuration/#write-options 
   
   I got the test case from the repo...
   
https://github.com/apache/iceberg/blob/2531545e3cd3b97494c9e3c137cfe04f4459a9fb/spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java
   
   ... and used it to play a bit with the option:
   
   ```
       sparkSession.sql(
         """
           |CREATE OR REPLACE  TABLE local.db.letters (
           |  id STRING NOT NULL,
           |  letter1 STRING NOT NULL,
           |  letter2 STRING NOT NULL
           |) USING iceberg
           |""".stripMargin)
   
       sparkSession.sql("SELECT '2' AS id, 'a' AS letter1, 'A' AS letter2")
         // This is not necessary, but I left it for simpler column ordering
         // and to follow the example from the unit test.
         .select("letter2", "id", "letter1")
         .write
         .option("check-ordering", "true").insertInto("local.db.letters")
   ```
   With `check-ordering` enabled and the columns reordered, I expected the 
write to fail, but it succeeded and inserted the values by position:
   
   ```
   +---+-------+-------+
   |id |letter1|letter2|
   +---+-------+-------+
   |A  |2      |a      |
   +---+-------+-------+
   ```
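
For illustration, the result above is consistent with a purely positional mapping. The following is a minimal plain-Java sketch of that mapping, together with the kind of order check the question expects `check-ordering` to perform. It does not use Spark or Iceberg, and it is not Iceberg's implementation; the class `PositionalInsertSketch` and its methods are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, no Spark or Iceberg classes involved.
class PositionalInsertSketch {

    // Assigns the i-th input value to the i-th table column, ignoring input column names.
    static Map<String, String> mapByPosition(List<String> tableColumns, List<String> inputValues) {
        if (tableColumns.size() != inputValues.size()) {
            throw new IllegalArgumentException("column count mismatch");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            row.put(tableColumns.get(i), inputValues.get(i));
        }
        return row;
    }

    // The check expected in the question: input column names must appear
    // in exactly the same order as the table's column names.
    static boolean sameOrder(List<String> tableColumns, List<String> inputColumns) {
        return tableColumns.equals(inputColumns);
    }

    public static void main(String[] args) {
        List<String> tableColumns = Arrays.asList("id", "letter1", "letter2");
        // Column order and values after .select("letter2", "id", "letter1")
        List<String> inputColumns = Arrays.asList("letter2", "id", "letter1");
        List<String> inputValues = Arrays.asList("A", "2", "a");

        System.out.println(mapByPosition(tableColumns, inputValues)); // {id=A, letter1=2, letter2=a}
        System.out.println(sameOrder(tableColumns, inputColumns));    // false
    }
}
```

The positional mapping reproduces the row shown above (`id=A`, `letter1=2`, `letter2=a`), while the name-order comparison returns `false` for the reordered columns, which is why a failure was expected.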
   
   Thinking it might be an issue with my local setup (Iceberg 0.13.1, Spark 
3.2.0), I cloned the project repo and:
   
   * extended the aforementioned unit test with a read that prints the table:
   ```
       spark.read()
               .format("iceberg")
               .option(SparkReadOptions.VECTORIZATION_ENABLED,
                       String.valueOf(vectorized))
               .load(location.toString()).show();
   ```
   
   * changed the check-ordering flag to true:
   ```
       df.select("data", "id").write()
               .format("iceberg")
               .mode(SaveMode.Append)
               .option(SparkWriteOptions.CHECK_ORDERING, "true")
               .save(location.toString());
   ```
   
   Surprisingly, the printed results were correct, but the `check-ordering` 
flag seems to have no effect on the writer. The write succeeds whether the flag 
is enabled or disabled. 
   
   I'm surely missing something. Can you shed some light on it? Is my 
understanding of `check-ordering` correct? If it is, do you have a minimal 
reproducible example in which the insert fails because of the reordered 
`select(...)` with `check-ordering` enabled?
   
   Thank you.
   Best,
   Bartosz.

