geoHeil opened a new issue, #19866:
URL: https://github.com/apache/datafusion/issues/19866

   ### Describe the bug
   
   When using `deltalake.DeltaTable.delete(predicate=...)` with a
   predicate generated for the DataFusion dialect, Delta errors
   with:
   
   
   ```
   DeltaError: Generic DeltaTable error: Internal error: arrow_cast
   should have been simplified to cast
   ```
   
   The generated SQL and the extracted predicate that cause the problem:
   
   ```sql
   SELECT
     *
   FROM "t" AS "t0"
   WHERE
     "t0"."created_at" = ARROW_CAST('2024-01-01 00:00:00+00:00', 'Timestamp(Microsecond, Some("UTC"))')
   Predicate: "created_at" = ARROW_CAST('2024-01-01 00:00:00+00:00', 'Timestamp(Microsecond, Some("UTC"))')
   ```
   
   ### To Reproduce
   
   Minimal repro (Python), verified with deltalake 1.3.2:
   
   ```python
   from __future__ import annotations
   
   from datetime import datetime, timedelta, timezone
   from pathlib import Path
   import shutil
   import re
   
   import polars as pl
   import sqlglot
   from deltalake import DeltaTable, write_deltalake
   
   import ibis
   
   
   def extract_where(sql: str, *, dialect: str | None) -> str:
       expr = sqlglot.parse_one(sql, read=dialect)
       where = expr.args.get("where")
       if where is None:
           raise RuntimeError("No WHERE clause found")
       # Strip table qualifiers for a single-table predicate
       for col in where.find_all(sqlglot.exp.Column):
           if col.table:
               col.set("table", None)
       predicate = where.sql(dialect=dialect) if dialect else where.sql()
       predicate = re.sub(r"^WHERE\s+", "", predicate, flags=re.IGNORECASE)
       return predicate
   
   
   def main() -> None:
       root = Path("/tmp/delta_arrow_cast_datafusion").resolve()
       table_uri = str(root)
       shutil.rmtree(root, ignore_errors=True)
   
       base_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
       newer_time = base_time + timedelta(minutes=5)
   
       df = pl.DataFrame(
           {
               "sample_uid": ["a", "a"],
               "value": [1, 2],
               "created_at": [base_time, newer_time],
           }
       )
   
       write_deltalake(table_uri, df, mode="overwrite")
   
       schema = ibis.schema({
           "sample_uid": "string",
           "value": "int64",
           "created_at": "timestamp('UTC')",
       })
       t = ibis.table(schema, name="t")
       filtered = t.filter(t.created_at == base_time)
       # Use the DataFusion dialect to trigger the error; "postgres" is the workaround.
       sql = ibis.to_sql(filtered, dialect="datafusion")
       predicate = extract_where(sql, dialect="datafusion")
       print("DataFusion SQL:", sql)
       print("Predicate:", predicate)
   
       table = DeltaTable(table_uri)
       table.delete(predicate=predicate)
   
   
   if __name__ == "__main__":
       main()
   
   ```
   
   ### Expected behavior
   
   The deletion should be performed without a parsing error.
   
   ### Additional context
   
   Switching the dialect to postgres by changing from
   
   ```
   predicate = extract_where(sql, dialect="datafusion")
   ```
   
   to
   
   ```
   predicate = extract_where(sql, dialect="postgres")
   ```
   
   is a viable workaround for now, but it feels wrong.
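
Another option that keeps the DataFusion dialect might be to post-process the predicate string and replace the timestamp `ARROW_CAST(...)` call with a plain `CAST(...)`. The sketch below is untested against deltalake: the helper name `rewrite_arrow_cast_timestamps` is hypothetical, and the target type spelling `TIMESTAMP WITH TIME ZONE` is an assumption that may not coerce to the column's microsecond precision as intended.

```python
import re

# Matches ARROW_CAST('<string literal>', 'Timestamp(...)') as emitted in the
# predicate above. Only single-quoted string literals are handled.
_ARROW_CAST_TS = re.compile(
    r"""ARROW_CAST\(\s*('(?:[^']|'')*')\s*,\s*'Timestamp\([^']*\)'\s*\)""",
    flags=re.IGNORECASE,
)


def rewrite_arrow_cast_timestamps(predicate: str) -> str:
    """Hypothetical helper: replace timestamp ARROW_CAST calls with plain CAST.

    Assumption: CAST(... AS TIMESTAMP WITH TIME ZONE) is accepted by the
    predicate parser; this has not been verified.
    """
    return _ARROW_CAST_TS.sub(r"CAST(\1 AS TIMESTAMP WITH TIME ZONE)", predicate)


predicate = (
    "\"created_at\" = ARROW_CAST('2024-01-01 00:00:00+00:00', "
    "'Timestamp(Microsecond, Some(\"UTC\"))')"
)
print(rewrite_arrow_cast_timestamps(predicate))
```

Predicates that contain no timestamp `ARROW_CAST` call pass through unchanged, so the helper could be applied unconditionally before calling `table.delete(predicate=...)`.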


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

