xabriel opened a new pull request #640: Allow ordered projections when writing
URL: https://github.com/apache/incubator-iceberg/pull/640
 
 
   Given a table with schema `{ id: Int, data: String }`, we want to be able to run:
   
   ```
   spark.read
     .format("iceberg")
     .load(...)
     .select("id")
     .write
     .format("iceberg")
     .mode("append")
     .save(...)
   ```
   
   
   We were getting:
   ```
   java.lang.AssertionError: index (1) should < 1
        at org.apache.spark.sql.catalyst.expressions.UnsafeRow.assertIndexIsValid(UnsafeRow.java:131)
        at org.apache.spark.sql.catalyst.expressions.UnsafeRow.isNullAt(UnsafeRow.java:352)
        at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:308)
        at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:471)
        at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:453)
        at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:444)
        at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:110)
        at org.apache.iceberg.spark.source.Writer$BaseWriter.writeInternal(Writer.java:388)
        at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:472)
        at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:455)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   The stack trace shows that Iceberg expects the incoming `UnsafeRow` to match `table.schema`, so a projected row carrying fewer fields trips the index assertion.
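
   A minimal sketch (not Iceberg code) of the failure mode: the writer derives field positions from the full two-column table schema, but the projected row only carries one field, so reading position 1 reproduces the `index (1) should < 1` assertion seen above.

   ```java
   import java.util.List;

   public class ProjectionMismatch {
       // Stands in for a projected row, like the 1-field UnsafeRow above.
       // Mirrors the bounds check in UnsafeRow.assertIndexIsValid.
       static Object get(List<Object> row, int pos) {
           if (pos >= row.size()) {
               throw new AssertionError("index (" + pos + ") should < " + row.size());
           }
           return row.get(pos);
       }

       public static void main(String[] args) {
           List<Object> projected = List.of(42); // only "id" was selected
           System.out.println(get(projected, 0)); // position 0 ("id") is fine
           // A writer built from the table schema also asks for position 1 ("data"):
           get(projected, 1); // throws: index (1) should < 1
       }
   }
   ```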
   
   With this PR, we pass the write schema down to the writers, so that writes of ordered projections are allowed.
