xabriel opened a new pull request #640: Allow ordered projections when writing
URL: https://github.com/apache/incubator-iceberg/pull/640

Given the table schema `{ id : Int, data: String }`, we want to be able to do:

```
spark.read
  .format("iceberg")
  .load(...)
  .select("id")
  .write
  .format("iceberg")
  .mode("append")
  .save(...)
```

We were getting:

```
java.lang.AssertionError: index (1) should < 1
	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.assertIndexIsValid(UnsafeRow.java:131)
	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.isNullAt(UnsafeRow.java:352)
	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:308)
	at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:471)
	at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:453)
	at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:444)
	at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:110)
	at org.apache.iceberg.spark.source.Writer$BaseWriter.writeInternal(Writer.java:388)
	at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:472)
	at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:455)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

The stack trace shows that Iceberg expects the `UnsafeRow` to match `table.schema`. With this PR, we propagate the write schema down so that writes of ordered projections are allowed.
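
To make the failure mode concrete, here is a minimal plain-Python model of it. It is only a sketch: `Row`, `StructWriter`, and `is_ordered_projection` are hypothetical stand-ins, not Iceberg or Spark classes, and it assumes that "ordered projection" means a subset of the table's columns kept in table order. A writer built from the full table schema reads one ordinal per table column, so a one-column row fails at ordinal 1, while a writer built from the write schema succeeds.

```python
# Hypothetical model of the bug; none of these names are Iceberg or Spark APIs.
TABLE_SCHEMA = ["id", "data"]


class Row:
    """Positional row: values are reachable by ordinal only, not by name."""

    def __init__(self, values):
        self._values = list(values)

    def get(self, ordinal):
        if ordinal >= len(self._values):
            # Mirrors the wording of the assertion in the stack trace.
            raise IndexError(f"index ({ordinal}) should < {len(self._values)}")
        return self._values[ordinal]


class StructWriter:
    """Reads one value per schema column, by position."""

    def __init__(self, schema):
        self.schema = list(schema)

    def write(self, row):
        return {name: row.get(i) for i, name in enumerate(self.schema)}


def is_ordered_projection(write_schema, table_schema):
    """True if write_schema is a subsequence of table_schema (same relative order)."""
    remaining = iter(table_schema)
    # `col in remaining` advances the iterator, so order is enforced.
    return all(col in remaining for col in write_schema)


projected = Row([1])  # models the result of select("id")

# Writer built from the table schema: fails on the missing second column.
try:
    StructWriter(TABLE_SCHEMA).write(projected)
    failure = None
except IndexError as err:
    failure = str(err)

# Writer built from the write schema: succeeds.
write_schema = ["id"]
written = StructWriter(write_schema).write(projected)
```

In this model the only change between the failing and the working path is which schema the writer is constructed from, which is the idea the PR description states.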
