dotjdk opened a new issue, #5625:
URL: https://github.com/apache/iceberg/issues/5625
### Apache Iceberg version
0.14.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I am running a Spark Structured Streaming job that reads data from Kafka and
writes to an Iceberg table partitioned by `days(timestamp)`.
When `IcebergSparkSessionExtensions` are enabled, my job fails with
`org.apache.spark.sql.AnalysisException: days(timestamp) ASC NULLS FIRST is not
currently supported`.
The only way I can get it to work is by not registering
`IcebergSparkSessionExtensions` and enabling `fanout-writer`. When I do that,
the data is written to the table, but I get the following entry in the log:
```
2022-08-19 06:02:55 WARN [stream execution thread for Streaming Query [id =
9996dced-e80f-43b6-b241-0533f4df934c, runId =
6b4caf31-db34-4cf1-b88e-8794b49c3a6a]] o.a.i.spark.source.SparkWriteBuilder -
Skipping distribution/ordering: extensions are disabled and spec contains
unsupported transforms
```
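For reference, the four combinations I tried can be written down as plain configuration. This is only a sketch: it assumes that "fanout-writer" corresponds to the Spark write option `fanout-enabled` and that the extensions are registered through `spark.sql.extensions`. The helper `build_case` and the `CASES` list are hypothetical names used for illustration and are not part of Iceberg or of the attached patch.

```python
from itertools import product

# Fully qualified class name registered through spark.sql.extensions.
EXTENSIONS_CLASS = (
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
)


def build_case(extensions_enabled: bool, fanout_enabled: bool) -> dict:
    """Return the session config and write options for one combination.

    Hypothetical helper: it only builds dictionaries and does not start Spark.
    """
    session_conf = {}
    if extensions_enabled:
        session_conf["spark.sql.extensions"] = EXTENSIONS_CLASS
    # Assumption: "fanout-writer" means the `fanout-enabled` write option.
    write_options = {"fanout-enabled": str(fanout_enabled).lower()}
    return {"session_conf": session_conf, "write_options": write_options}


# All four combinations of (extensions, fanout).
CASES = [build_case(ext, fan) for ext, fan in product([False, True], repeat=2)]

if __name__ == "__main__":
    for case in CASES:
        print(case)
```

In a real job, the `session_conf` entries would be passed to `SparkSession.builder.config(key, value)` and the `write_options` entries to `writeStream.option(key, value)`.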
When I enable `IcebergSparkSessionExtensions`, the job fails with the
`AnalysisException` quoted above, whether or not `fanout-writer` is enabled.
I couldn’t find a test case that triggers this with non-identity
partitioning, so I have attached a patch file with a modified version of the
`TestStructuredStreaming` test case that runs parameterized variations with
fanout enabled/disabled and extensions registered or not:
| **Extensions** | **fanout-writer** | **Result** |
|----------------|-------------------|------------|
| disabled | enabled | Pass |
| disabled | disabled | Fail: Encountered records that belong to already closed files |
| enabled | enabled | Fail: AnalysisException: days(timestamp) ASC NULLS FIRST is not currently supported |
| enabled | disabled | Fail: AnalysisException: days(timestamp) ASC NULLS FIRST is not currently supported |
Patch file with test case:
[Non-identity_partitioning_broken_without_fanout_writer.patch.zip](https://github.com/apache/iceberg/files/9414772/Non-identity_partitioning_broken_without_fanout_writer.patch.zip)
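To illustrate how I read the "already closed files" failure, here is a simplified model, not Iceberg's actual writer code. It assumes that the non-fanout writer keeps only one partition open at a time and rejects a record for a partition it has already closed, while the fanout writer keeps every partition open. The classes `ClusteredWriterModel` and `FanoutWriterModel` and the function `days` are hypothetical stand-ins written for this sketch.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts: datetime) -> int:
    """Whole days since 1970-01-01 UTC, modelling a days(timestamp) partition."""
    return (ts - EPOCH).days


class ClusteredWriterModel:
    """Model of a writer that expects input clustered by partition."""

    def __init__(self):
        self.current = None   # partition whose file is currently open
        self.closed = set()   # partitions whose files were already closed
        self.files = {}       # partition -> records written

    def write(self, ts: datetime) -> None:
        part = days(ts)
        if part != self.current:
            if part in self.closed:
                raise ValueError(
                    "Encountered records that belong to already closed files"
                )
            if self.current is not None:
                self.closed.add(self.current)
            self.current = part
        self.files.setdefault(part, []).append(ts)


class FanoutWriterModel:
    """Model of a writer that keeps one open file per partition."""

    def __init__(self):
        self.files = {}

    def write(self, ts: datetime) -> None:
        self.files.setdefault(days(ts), []).append(ts)


# Unsorted input: the third record returns to the first record's day.
RECORDS = [
    datetime(2022, 8, 19, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 8, 20, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 8, 19, 7, 0, tzinfo=timezone.utc),
]

if __name__ == "__main__":
    fanout = FanoutWriterModel()
    for record in RECORDS:
        fanout.write(record)
    print("fanout partitions:", sorted(fanout.files))

    clustered = ClusteredWriterModel()
    try:
        for record in RECORDS:
            clustered.write(record)
    except ValueError as err:
        print("clustered writer failed:", err)
```

Under this model, streaming input that is not sorted by `days(timestamp)` only succeeds with the fanout variant, which matches the first two rows of the table.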