geoHeil opened a new issue, #19866:
URL: https://github.com/apache/datafusion/issues/19866
### Describe the bug
When using deltalake.DeltaTable.delete(predicate=...) with a
predicate generated for the DataFusion dialect, Delta errors
with:
```
DeltaError: Generic DeltaTable error: Internal error: arrow_cast
should have been simplified to cast
```
The raw SQL and the extracted predicate that trigger the problem:
```sql
SELECT
  *
FROM "t" AS "t0"
WHERE
  "t0"."created_at" = ARROW_CAST('2024-01-01 00:00:00+00:00', 'Timestamp(Microsecond, Some("UTC"))')

Predicate: "created_at" = ARROW_CAST('2024-01-01 00:00:00+00:00', 'Timestamp(Microsecond, Some("UTC"))')
```
### To Reproduce
Minimal repro (Python), verified with deltalake 1.3.2:
```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path
import shutil
import re

import polars as pl
import sqlglot
from deltalake import DeltaTable, write_deltalake
import ibis


def extract_where(sql: str, *, dialect: str | None) -> str:
    """Return the WHERE predicate of `sql` with table qualifiers removed."""
    expr = sqlglot.parse_one(sql, read=dialect)
    where = expr.args.get("where")
    if where is None:
        raise RuntimeError("No WHERE clause found")
    # Strip table qualifiers for a single-table predicate
    for col in where.find_all(sqlglot.exp.Column):
        if col.table:
            col.set("table", None)
    predicate = where.sql(dialect=dialect) if dialect else where.sql()
    predicate = re.sub(r"^WHERE\s+", "", predicate, flags=re.IGNORECASE)
    return predicate


def main() -> None:
    # Start from a clean table directory on every run
    root = Path("/tmp/delta_arrow_cast_datafusion").resolve()
    table_uri = str(root)
    shutil.rmtree(root, ignore_errors=True)

    base_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
    newer_time = base_time + timedelta(minutes=5)
    df = pl.DataFrame(
        {
            "sample_uid": ["a", "a"],
            "value": [1, 2],
            "created_at": [base_time, newer_time],
        }
    )
    write_deltalake(table_uri, df, mode="overwrite")

    schema = ibis.schema({
        "sample_uid": "string",
        "value": "int64",
        "created_at": "timestamp('UTC')",
    })
    t = ibis.table(schema, name="t")
    filtered = t.filter(t.created_at == base_time)

    # The "datafusion" dialect emits ARROW_CAST and triggers the error;
    # "postgres" is the workaround described under "Additional context".
    sql = ibis.to_sql(filtered, dialect="datafusion")
    predicate = extract_where(sql, dialect="datafusion")
    print("DataFusion SQL:", sql)
    print("Predicate:", predicate)

    table = DeltaTable(table_uri)
    table.delete(predicate=predicate)


if __name__ == "__main__":
    main()
```
### Expected behavior
The deletion should complete without a parsing error.
### Additional context
Switching the dialect to postgres by changing from
```
predicate = extract_where(sql, dialect="datafusion")
```
to
```
predicate = extract_where(sql, dialect="postgres")
```
is a viable workaround for now, but it feels wrong.
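
Until this is resolved, calling code could make the workaround explicit by checking the rendered predicate for an `arrow_cast` call before passing it to `DeltaTable.delete`. The following is only a sketch under stated assumptions: `uses_arrow_cast` and `pick_predicate` are hypothetical helper names, not part of deltalake, ibis, or sqlglot, and `render` stands in for whatever produces the predicate text for a given dialect (for example a wrapper around `ibis.to_sql` and `extract_where` from the repro). The check is a plain text match, so it would also flag `arrow_cast(` appearing inside a string literal.

```python
import re
from typing import Callable, Iterable

# Matches an arrow_cast call in any letter case, e.g. "ARROW_CAST(" or "arrow_cast (".
_ARROW_CAST_CALL = re.compile(r"\barrow_cast\s*\(", re.IGNORECASE)


def uses_arrow_cast(predicate: str) -> bool:
    """Return True if the predicate text contains an arrow_cast(...) call."""
    return _ARROW_CAST_CALL.search(predicate) is not None


def pick_predicate(
    render: Callable[[str], str],
    dialects: Iterable[str] = ("datafusion", "postgres"),
) -> str:
    """Return the first rendered predicate that does not contain arrow_cast.

    `render` is a hypothetical callback mapping a dialect name to predicate
    text; the dialect order is an assumption taken from the workaround above.
    """
    for dialect in dialects:
        predicate = render(dialect)
        if not uses_arrow_cast(predicate):
            return predicate
    raise ValueError("every dialect produced a predicate containing arrow_cast")
```

With the repro's helpers, `render` could be `lambda d: extract_where(ibis.to_sql(filtered, dialect=d), dialect=d)`, and the result would be passed to `table.delete(predicate=...)`.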