rahil-c opened a new issue, #18820: URL: https://github.com/apache/hudi/issues/18820
**What happened:**

Using `read_blob()` inside a WHERE predicate fails with:

```
[INTERNAL_ERROR] Cannot generate code for expression: read_blob(...)
```

Example query:

```sql
SELECT id FROM t WHERE length(read_blob(image_bytes)) = 11;
```

`read_blob()` works correctly in the SELECT list; only filter predicates trigger the codegen failure.

**What you expected:**

Two things:

1. The codegen restriction should surface as an analyzer-level rejection with a clear "read_blob() is not supported in filter predicates" message, not an INTERNAL_ERROR with a Spark codegen stack trace.
2. Docs (AI quick start) should call out the recommended workaround: for length-based filtering, filter on the BLOB struct's `.length` subfield from the meta columns (e.g. `WHERE image_bytes.length = 11`) rather than wrapping `read_blob()` in `length(...)`. Typical usage is vector search or filtering on structured columns; pulling raw bytes through codegen in a predicate is not a supported path.

**Steps to reproduce:**

1. Use 1.2.0-rc2 Spark bundle.
2. Create a table with a BLOB column `image_bytes` and insert rows.
3. Run: `SELECT id FROM t WHERE length(read_blob(image_bytes)) = 11`.
4. Observe INTERNAL_ERROR.

**Environment:**

- Hudi version: 1.2.0-rc2
- Query engine: Spark 3.5
- Found during: 1.2.0-rc2 RC voting testing

Filed as a follow-up per discussion in the 1.2.0-rc2 voting thread; this is a non-blocker for the release. A separate docs PR will cover the length-filter workaround.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
