SergeiPatiakin opened a new issue, #15039:
URL: https://github.com/apache/iceberg/issues/15039
### Apache Iceberg version
1.9.2
### Query engine
Spark
### Please describe the bug 🐞
## Repro 1
### Steps
- Create an unpartitioned table with the following schema:
    key BIGINT NOT NULL,
    inserted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
    revenue DECIMAL(15, 3),
    comment STRING
- Populate table with data including some equality deletes
- Set maxExecutors=1 to ensure all jobs go to the same executor
- Run the following queries:
```
SELECT
date_trunc('year', inserted_at) AS year,
COUNT(*) AS count
FROM table1
GROUP BY 1 ORDER BY 1 ASC;
SELECT
1,
COUNT(*) AS count
FROM table1
GROUP BY 1 ORDER BY 1 ASC;
```
### Observed behavior
- The second query returns incorrect results. The result set is consistent
with equality deletes being ignored.
## Repro 2
- Same as Repro 1, but with the two queries run in the opposite order:
```
SELECT
1,
COUNT(*) AS count
FROM table1
GROUP BY 1 ORDER BY 1 ASC;
SELECT
date_trunc('year', inserted_at) AS year,
COUNT(*) AS count
FROM table1
GROUP BY 1 ORDER BY 1 ASC;
```
### Observed behavior
- The second query returns incorrect results. The result set is consistent
with equality deletes being ignored.
## Non-repros
- No repro if `spark.sql.iceberg.executor-cache.enabled` is set to `false`
- No repro if the same query is executed repeatedly
## Comments
I am not very familiar with the Iceberg codebase, but could this be because
`BaseDeleteLoader.getOrReadEqDeletes` uses `deleteFile.location()` as the
cache key regardless of projection?
https://github.com/apache/iceberg/blob/7f81e1e93084e50fa3676c2e131722f66a26b385/data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java#L121
Should the projection somehow be part of the cache key there?
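
To make the hypothesis concrete, here is a toy Python model of the suspected failure mode. It is not Iceberg code: the file location, the rows, and the class and function names (`LocationKeyedCache`, `ProjectionAwareCache`, `run_two_queries`) are all made up for illustration, and I have not verified that the real cache behaves this way. It only shows how a cache keyed by location alone could hand a second query delete records shaped for the first query's projection, so that no data row ever matches them.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical location and contents of one equality delete file.
LOCATION = "s3://bucket/eq-delete-1.parquet"
DELETE_FILES: Dict[str, List[Dict[str, object]]] = {
    LOCATION: [{"key": 1, "comment": "a"}, {"key": 2, "comment": "b"}],
}

# Hypothetical data rows; the row with key=1 should be deleted.
DATA = [{"key": 1, "comment": "a"}, {"key": 3, "comment": "c"}]


def read_deletes(location: str, projection: Sequence[str]) -> List[Tuple]:
    """Read the delete file, keeping only the projected columns."""
    return [tuple(row[col] for col in projection) for row in DELETE_FILES[location]]


class LocationKeyedCache:
    """Caches delete records by file location only (the suspected behavior)."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple]] = {}

    def get_or_read(self, location: str, projection: Sequence[str]) -> List[Tuple]:
        if location not in self._entries:
            self._entries[location] = read_deletes(location, projection)
        # A later call with a different projection still gets the first result.
        return self._entries[location]


class ProjectionAwareCache:
    """Caches delete records by (location, projection)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Tuple[str, ...]], List[Tuple]] = {}

    def get_or_read(self, location: str, projection: Sequence[str]) -> List[Tuple]:
        key = (location, tuple(projection))
        if key not in self._entries:
            self._entries[key] = read_deletes(location, projection)
        return self._entries[key]


def live_rows(data, deletes: List[Tuple], projection: Sequence[str]):
    """Drop data rows whose projected values appear in the delete records."""
    delete_set = set(deletes)
    return [r for r in data if tuple(r[c] for c in projection) not in delete_set]


def run_two_queries(cache):
    """Simulate two queries that need different projections of the same delete file."""
    cache.get_or_read(LOCATION, ("key", "comment"))   # first query warms the cache
    deletes = cache.get_or_read(LOCATION, ("key",))   # second query, narrower projection
    return live_rows(DATA, deletes, ("key",))
```

In this model, `run_two_queries(LocationKeyedCache())` keeps both data rows, because the cached two-column delete tuples never equal the one-column tuples built from the data, while `run_two_queries(ProjectionAwareCache())` drops the `key = 1` row as expected.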
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time