(hudi) branch master updated: docs(examples): pin blob.inline.mode=CONTENT after Lance default flip (#18823)

vhs Wed, 03 Jun 2026 08:13:01 -0700

This is an automated email from the ASF dual-hosted git repository.

voonhous pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/master by this push:
     new 95be21048d69 docs(examples): pin blob.inline.mode=CONTENT after Lance default flip (#18823)
95be21048d69 is described below

commit 95be21048d6980bbab1806aff46d62205f5d6be1
Author: Rahil C <[email protected]>
AuthorDate: Wed Jun 3 08:12:48 2026 -0700

    docs(examples): pin blob.inline.mode=CONTENT after Lance default flip (#18823)
    
    * docs(examples): pin blob.inline.mode=CONTENT after Lance default flip
    
    apache/hudi#18744 flipped Lance's default for `hoodie.read.blob.inline.mode`
    to DESCRIPTOR and added a BatchedBlobReader guard that raises rather than
    silently returns null when `read_blob()` runs against an INLINE row under
    DESCRIPTOR mode. The vector_blob_demo blob-reader script and notebook were
    relying on the prior implicit-CONTENT default for their `read_blob()`
    resolve-view load, which now fails on Lance.
    
    - hudi_blob_reader_demo.py / notebooks/01_blob_reader.ipynb: scope CONTENT
      explicitly on the resolve-view reader (mirrors how show_descriptors()
      already scopes its own mode per-load).
    - notebooks/00_main_demo.ipynb: set CONTENT once on the SparkSession so the
      notebook's "flip the DDL to lance" instruction continues to work.
    - README.md + create_spark() comment: explain the Parquet/Lance default
      split and reference apache/hudi#18744.
    
    Verified with ./run_demos.sh against the 1.2.0-rc2 staging bundle
    (all 10 parquet/lance × blob_reader/sql/dataframe combos green).
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    * docs(examples): switch demos from 1.2.0-rc1 staging jar to official 1.2.0 Maven Central release
    
    Replace all references to the Apache Nexus staging URL (orgapachehudi-1176/1.2.0-rc1)
    with the official Maven Central URL for hudi-spark3.5-bundle_2.12-1.2.0.jar across
    all three Python scripts, four notebooks, and both READMEs.
    
    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 .../src/test/python/vector_blob_demo/README.md     | 57 +++++++++++++---------
 .../vector_blob_demo/hudi_blob_reader_demo.py      | 50 +++++++++++--------
 .../hudi_dataframe_vector_blob_demo.py             | 16 +++---
 .../vector_blob_demo/hudi_sql_vector_blob_demo.py  | 16 +++---
 .../vector_blob_demo/notebooks/00_main_demo.ipynb  |  9 +++-
 .../notebooks/01_blob_reader.ipynb                 | 45 ++++++++++-------
 .../notebooks/02_sql_vector_search.ipynb           | 14 +++---
 .../notebooks/03_dataframe_vector_search.ipynb     | 10 ++--
 .../python/vector_blob_demo/notebooks/README.md    |  2 +-
 9 files changed, 128 insertions(+), 91 deletions(-)

diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/README.md b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/README.md
index 11bbd11abd5c..ed1f0bdc94a5 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/README.md
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/README.md
@@ -71,28 +71,22 @@ See [`notebooks/README.md`](notebooks/README.md) for setup details.
 
 - Java 11
 - Python **3.12** (PySpark 3.5 does NOT support Python 3.13/3.14)
-- Hudi Spark bundle (Apache 1.2.0-rc1 staging jar, or build from source)
+- Hudi Spark bundle (Apache Hudi 1.2.0 release jar, or build from source)
 - Lance Spark bundle jar
 
 ## 1. Get the Hudi bundle
 
 The scripts default `HUDI_BUNDLE_JAR` to
-`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar`, so you can drop the
-Apache 1.2.0-rc1 staging jar there and skip exporting anything.
+`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar`, so you can drop the
+Apache Hudi 1.2.0 release jar there and skip exporting anything.
 
-**Option A — Download the rc1 staging jar (recommended; no build required):**
+**Option A — Download the 1.2.0 release jar from Maven Central (recommended; no build required):**
 
 ```bash
-curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \
-  https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar
+curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \
+  https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar
 ```
 
-This is the exact jar published to Apache's Nexus staging repo for the
-1.2.0-rc1 vote — running the demo against it doubles as smoke-testing the
-release candidate. Note: the staging URL (`orgapachehudi-1176`) rolls forward
-each RC; if you're reading this after rc1 closes, find the current staging
-repo at <https://repository.apache.org/#stagingRepositories>.
-
 **Option B — Build from source:**
 
 ```bash
@@ -191,7 +185,7 @@ etc.) with similarity scores in the 0.3–0.5 range at N=100, tighter at N=1000.
 
 | Var | Default | Purpose |
 |---|---|---|
-| `HUDI_BUNDLE_JAR` | `~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar` (Apache 1.2.0-rc1 staging jar) | Hudi spark bundle. Override to point at a locally built `*-SNAPSHOT.jar` if you go the Option B route. |
+| `HUDI_BUNDLE_JAR` | `~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar` (Apache Hudi 1.2.0 release jar) | Hudi spark bundle. Override to point at a locally built `*-SNAPSHOT.jar` if you go the Option B route. |
 | `LANCE_BUNDLE_JAR` | `~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar` (Maven Central) | Lance spark bundle. Used only when `HUDI_BASE_FILE_FORMAT=lance`; Parquet runs skip it entirely. |
 | `HUDI_BASE_FILE_FORMAT` | `lance` | Set to `parquet` to write Parquet base files instead |
 | `HUDI_BLOB_MODE` | `out_of_line` | Blob reader demo only. Set to `inline` to embed PNG bytes directly in the Hudi table (no external container file) |
@@ -222,11 +216,21 @@ Look for:
 
 `hoodie.read.blob.inline.mode` controls how INLINE blobs come back:
 
-- `CONTENT` (default) — `image_bytes.data` returns the raw bytes directly.
+- `CONTENT` — `image_bytes.data` returns the raw bytes directly.
 - `DESCRIPTOR` — `image_bytes.data` is null; `image_bytes.reference.*` is
   synthesized to point at the underlying base file (`.lance` for Lance
   base files), and `read_blob(image_bytes)` materializes bytes lazily.
 
+Per-format **implicit defaults** as of Hudi 1.2.0
+([apache/hudi#18744](https://github.com/apache/hudi/pull/18744)): Parquet
+defaults to `CONTENT`, Lance defaults to `DESCRIPTOR`. The same release
+also added a strict guard in `BatchedBlobReader` that raises
+`IllegalStateException` if `read_blob()` is called on an INLINE row whose
+load is in DESCRIPTOR mode — what used to be a silent null is now a hard
+failure. Demos that target both formats therefore set the mode explicitly
+(see the SQL and DataFrame demos for the session-level pattern; the blob
+reader demo scopes it per-load).
+
 The blob reader demo exposes this via `HUDI_INLINE_READ_MODE`:
 
 ```bash
@@ -238,12 +242,14 @@ HUDI_BLOB_MODE=inline HUDI_INLINE_READ_MODE=descriptor python hudi_blob_reader_d
 ```
 
 **Important wiring detail (matches `TestLanceDataSource.testBlobInlineDescriptorMode`):**
-the `DESCRIPTOR` option is scoped to a single per-load read in
-`show_descriptors()`; `read_blob_and_save()` uses a separate default-mode
-load so `read_blob()` can actually materialize bytes. Setting
-`hoodie.read.blob.inline.mode=DESCRIPTOR` at the SparkSession level would
-make every read return `data=null`, including the read backing `read_blob()`,
-so it would also return null.
+the option is scoped per-load — `show_descriptors()` sets the user-selected
+mode on its inspection view, and `read_blob_and_save()` sets `CONTENT`
+explicitly on its own reader. The latter cannot rely on the format's
+implicit default because Lance's flipped to `DESCRIPTOR` in 1.2.0
+(see above); on that format, the new `BatchedBlobReader` guard would turn
+the silent-null behavior of older releases into a hard `IllegalStateException`.
+Setting `hoodie.read.blob.inline.mode=DESCRIPTOR` at the SparkSession level
+would similarly poison every read, including the one backing `read_blob()`.
 
 The setting is a no-op for `HUDI_BLOB_MODE=out_of_line` — those rows are
 already descriptors (no inline bytes to suppress); `read_blob()` always
@@ -462,8 +468,15 @@ Every Spark config line has a purpose:
 
 `hoodie.read.blob.inline.mode` is intentionally **not** set on the session —
 the blob reader demo scopes it per-load (see "Switching BLOB read mode"
-above) so that `read_blob()` can run against a default-mode load and
-materialize bytes.
+above): `show_descriptors()` picks up the user-chosen mode for its
+inspection view, and `read_blob_and_save()` opts into `CONTENT` explicitly
+on its own reader. Explicit is required because Lance's implicit default
+flipped to `DESCRIPTOR` in
+[apache/hudi#18744](https://github.com/apache/hudi/pull/18744) and the
+new `BatchedBlobReader` guard raises if `read_blob()` runs against a
+descriptor-mode load. The SQL and DataFrame demos take the simpler route
+of setting `CONTENT` once on the session (they don't need DESCRIPTOR
+anywhere in their flow).
 
 ### Section 2 — `load_dataset()`
 
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_blob_reader_demo.py b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_blob_reader_demo.py
index 8e14296c6b4d..731098545602 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_blob_reader_demo.py
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_blob_reader_demo.py
@@ -36,7 +36,7 @@ shows INLINE blobs + vector search); this one shows OUT_OF_LINE blobs +
 `read_blob()`.
 
 Env vars (shares the same conventions as the other demos):
-  HUDI_BUNDLE_JAR         (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar)
+  HUDI_BUNDLE_JAR         (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar)
   HUDI_BASE_FILE_FORMAT   (default 'lance'; set to 'parquet' to use Parquet)
   LANCE_BUNDLE_JAR        (defaults to 
~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar; only used when 
HUDI_BASE_FILE_FORMAT=lance)
   HUDI_BLOB_MODE          (default 'out_of_line'; 'inline' stores bytes in the 
Hudi table)
@@ -117,11 +117,11 @@ def ensure_dir(p: Path) -> None:
 
 
 def default_hudi_bundle_jar() -> str:
-    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:
-    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \
-    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar
+    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:
+    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \
+    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar
     # Override via HUDI_BUNDLE_JAR=/abs/path/to/jar to point at a locally built bundle.
-    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar")
+    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0.jar")
 
 
 def default_lance_bundle_jar() -> str:
@@ -137,9 +137,9 @@ def resolve_jars() -> str:
     if not Path(hudi_jar).is_file():
         sys.exit(
             f"ERROR: HUDI_BUNDLE_JAR does not exist at {hudi_jar}\n"
-            "Download the Apache 1.2.0-rc1 staging jar with:\n"
-            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n"
-            "    https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n"
+            "Download the Apache Hudi 1.2.0 release jar with:\n"
+            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n"
+            "    https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n"
             "or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar."
         )
 
@@ -179,13 +179,14 @@ def create_spark() -> SparkSession:
             "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
         )
         .config("spark.sql.session.timeZone", "UTC")
-        # NOTE: `hoodie.read.blob.inline.mode` is intentionally NOT set on the SparkSession.
-        # If it were, EVERY hudi load — including the one read_blob() runs internally —
-        # would suppress INLINE bytes, and read_blob() would return null. Instead we scope
-        # the DESCRIPTOR option per-load in show_descriptors() so read_blob() in
-        # read_blob_and_save() runs against a default-mode (CONTENT) load and can
-        # materialize bytes. See TestLanceDataSource.testBlobInlineDescriptorMode for the
-        # canonical pattern.
+        # NOTE: `hoodie.read.blob.inline.mode` is intentionally NOT set on the SparkSession —
+        # session-wide DESCRIPTOR would suppress INLINE bytes on every load, including the
+        # one read_blob() runs internally. We scope the option per-load instead:
+        # show_descriptors() picks up the user-selected mode for its inspection view, and
+        # read_blob_and_save() explicitly sets CONTENT on its own reader (it cannot rely on
+        # the format's implicit default — apache/hudi#18744 flipped Lance to default to
+        # DESCRIPTOR in 1.2.0, while Parquet stays at CONTENT). See
+        # TestLanceDataSource.testBlobInlineDescriptorMode for the canonical pattern.
         .config("spark.default.parallelism", "2")
         .config("spark.sql.shuffle.partitions", "2")
     )
@@ -474,12 +475,19 @@ def read_blob_and_save(spark: SparkSession):
         f"(works regardless of inline_read_mode={CONFIG['inline_read_mode']}):"
     )
 
-    # IMPORTANT: register a fresh load WITHOUT the inline.mode option so the underlying
-    # read sees `data` populated (CONTENT mode). If we read from the DESCRIPTORS_VIEW
-    # registered in show_descriptors(), read_blob() would see data=null because that
-    # view was loaded in DESCRIPTOR mode — and BatchedBlobReader dispatches on the row's
-    # storage_type=INLINE before checking `reference`, so it would return null bytes.
-    spark.read.format("hudi").load(CONFIG["table_path"]).createOrReplaceTempView(RESOLVE_VIEW)
+    # IMPORTANT: register a fresh load with hoodie.read.blob.inline.mode=CONTENT so the
+    # underlying read sees `data` populated. Two things are at play:
+    #   1) Parquet's default for that option is CONTENT, but Lance's default flipped to
+    #      DESCRIPTOR in 1.2.0 (apache/hudi#18744) — relying on the implicit default
+    #      worked for Parquet but silently returned null bytes on Lance.
+    #   2) As of 1.2.0, BatchedBlobReader also raises IllegalStateException when
+    #      read_blob() is invoked on an INLINE row under DESCRIPTOR mode, so the silent
+    #      null is now a hard failure. Setting CONTENT explicitly here works on both
+    #      formats and survives any future default changes.
+    (spark.read.format("hudi")
+        .option("hoodie.read.blob.inline.mode", "CONTENT")
+        .load(CONFIG["table_path"])
+        .createOrReplaceTempView(RESOLVE_VIEW))
 
     sql = f"""
         SELECT image_id,
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_dataframe_vector_blob_demo.py b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_dataframe_vector_blob_demo.py
index 5112d9caebb6..0d7ebea986c6 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_dataframe_vector_blob_demo.py
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_dataframe_vector_blob_demo.py
@@ -28,7 +28,7 @@ End-to-end flow:
   5) Save the query image, top-K neighbors, and a combined panel figure.
 
 Env vars:
-  HUDI_BUNDLE_JAR        (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar)
+  HUDI_BUNDLE_JAR        (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar)
   HUDI_BASE_FILE_FORMAT  (default 'lance'; set to 'parquet' to use Parquet 
base files)
   LANCE_BUNDLE_JAR       (defaults to 
~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar;
                           only used when HUDI_BASE_FILE_FORMAT=lance)
@@ -126,11 +126,11 @@ def save_png_bytes(img_bytes: bytes, path: Path) -> None:
 
 
 def default_hudi_bundle_jar() -> str:
-    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:
-    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \
-    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar
+    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:
+    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \
+    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar
     # Override via HUDI_BUNDLE_JAR=/abs/path/to/jar to point at a locally built bundle.
-    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar")
+    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0.jar")
 
 
 def default_lance_bundle_jar() -> str:
@@ -146,9 +146,9 @@ def resolve_jars() -> str:
     if not Path(hudi_jar).is_file():
         sys.exit(
             f"ERROR: HUDI_BUNDLE_JAR does not exist at {hudi_jar}\n"
-            "Download the Apache 1.2.0-rc1 staging jar with:\n"
-            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n"
-            "    https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n"
+            "Download the Apache Hudi 1.2.0 release jar with:\n"
+            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n"
+            "    https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n"
             "or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar."
         )
 
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_sql_vector_blob_demo.py b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_sql_vector_blob_demo.py
index c36bd0cad56d..f4ff6b9316d8 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_sql_vector_blob_demo.py
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/hudi_sql_vector_blob_demo.py
@@ -29,7 +29,7 @@ Image loading (torchvision) and embedding generation (timm) stay in Python —
 those cannot be SQL. The bridge between the two is a Spark temp view.
 
 Env vars (same as the DataFrame variant):
-  HUDI_BUNDLE_JAR         (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar)
+  HUDI_BUNDLE_JAR         (defaults to 
~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar)
   HUDI_BASE_FILE_FORMAT   (default 'lance'; set to 'parquet' to use Parquet)
   LANCE_BUNDLE_JAR        (defaults to 
~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar; only used when 
HUDI_BASE_FILE_FORMAT=lance)
   HUDI_LANCE_DEMO_N       (default 1000; number of images to ingest)
@@ -112,11 +112,11 @@ def save_png_bytes(img_bytes: bytes, path: Path) -> None:
 
 
 def default_hudi_bundle_jar() -> str:
-    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:
-    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \
-    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar
+    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:
+    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \
+    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar
     # Override via HUDI_BUNDLE_JAR=/abs/path/to/jar to point at a locally built bundle.
-    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar")
+    return str(Path.home() / "Downloads" / "hudi-spark3.5-bundle_2.12-1.2.0.jar")
 
 
 def default_lance_bundle_jar() -> str:
@@ -132,9 +132,9 @@ def resolve_jars() -> str:
     if not Path(hudi_jar).is_file():
         sys.exit(
             f"ERROR: HUDI_BUNDLE_JAR does not exist at {hudi_jar}\n"
-            "Download the Apache 1.2.0-rc1 staging jar with:\n"
-            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n"
-            "    https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n"
+            "Download the Apache Hudi 1.2.0 release jar with:\n"
+            "  curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n"
+            "    https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n"
             "or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar."
         )
 
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/00_main_demo.ipynb b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/00_main_demo.ipynb
index b9e68320fb67..c2dc058d9d5b 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/00_main_demo.ipynb
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/00_main_demo.ipynb
@@ -27,7 +27,7 @@
    "source": [
     "## 1. Setup\n",
     "\n",
-    "Boots Spark with the Hudi rc1 + Lance bundles from `~/Downloads/`, wipes 
`/tmp/` paths from prior runs."
+    "Boots Spark with the Hudi 1.2.0 + Lance bundles from `~/Downloads/`, 
wipes `/tmp/` paths from prior runs."
    ]
   },
   {
@@ -105,7 +105,7 @@
     "# === Resolve jars (defaults to ~/Downloads/) ===\n",
     "def _default_jar(name): return str(Path.home() / \"Downloads\" / name)\n",
     "\n",
-    "HUDI_JAR  = os.getenv(\"HUDI_BUNDLE_JAR\",  
_default_jar(\"hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\"))\n",
+    "HUDI_JAR  = os.getenv(\"HUDI_BUNDLE_JAR\",  
_default_jar(\"hudi-spark3.5-bundle_2.12-1.2.0.jar\"))\n",
     "LANCE_JAR = os.getenv(\"LANCE_BUNDLE_JAR\", 
_default_jar(\"lance-spark-bundle-3.5_2.12-0.4.0.jar\"))\n",
     "for jar in (HUDI_JAR, LANCE_JAR):\n",
     "    if not Path(jar).is_file():\n",
@@ -120,6 +120,11 @@
     "    .config(\"spark.sql.extensions\", \"org.apache.spark.sql.hudi.HoodieSparkSessionExtension\")\n",
     "    .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.hudi.catalog.HoodieCatalog\")\n",
     "    .config(\"spark.sql.session.timeZone\", \"UTC\")\n",
+    "    # Lance flipped its default for hoodie.read.blob.inline.mode to DESCRIPTOR\n",
+    "    # in apache/hudi#18744 (1.2.0); Parquet still defaults to CONTENT.\n",
+    "    # Pinning CONTENT session-wide keeps read_blob() and image_bytes.data\n",
+    "    # working regardless of which base file format the DDL ends up using.\n",
+    "    .config(\"hoodie.read.blob.inline.mode\", \"CONTENT\")\n",
     "    .config(\"spark.default.parallelism\", \"2\")\n",
     "    .config(\"spark.sql.shuffle.partitions\", \"2\")\n",
     "    .config(\"spark.ui.showConsoleProgress\", \"false\")\n",
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/01_blob_reader.ipynb b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/01_blob_reader.ipynb
index ea550c35d78f..952911426a5e 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/01_blob_reader.ipynb
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/01_blob_reader.ipynb
@@ -226,11 +226,11 @@
     "from pathlib import Path\n",
     "\n",
     "def default_hudi_bundle_jar() -> str:\n",
-    "    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:\n",
-    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n",
-    "    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n",
+    "    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:\n",
+    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n",
+    "    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n",
     "    # Override via HUDI_BUNDLE_JAR env var to point at a locally built bundle.\n",
-    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\")\n",
+    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0.jar\")\n",
     "\n",
     "def default_lance_bundle_jar() -> str:\n",
     "    # Defaults to the Maven Central Lance 0.4.0 jar in ~/Downloads/. Grab 
it with:\n",
@@ -243,7 +243,7 @@
     "    if not Path(hudi_jar).is_file():\n",
     "        sys.exit(\n",
     "            f\"ERROR: HUDI_BUNDLE_JAR does not exist at 
{hudi_jar}\\n\"\n",
-    "            \"Download the Apache 1.2.0-rc1 staging jar to ~/Downloads/ 
\"\n",
+    "            \"Download the Apache Hudi 1.2.0 release jar to ~/Downloads/ 
\"\n",
     "            \"or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar \"\n",
     "            \"before launching jupyter.\"\n",
     "        )\n",
@@ -295,11 +295,15 @@
    ],
    "source": [
     "# Note: `hoodie.read.blob.inline.mode` is intentionally NOT set on the\n",
-    "# SparkSession. If it were, every Hudi load — including the one\n",
-    "# `read_blob()` runs internally — would suppress INLINE bytes. Instead\n",
-    "# the descriptor inspection step in cell 12 sets the option per-load,\n",
-    "# and the `read_blob()` step in cell 13 runs against a separate\n",
-    "# default-mode (CONTENT) load. See\n",
+    "# SparkSession — session-wide DESCRIPTOR would suppress INLINE bytes 
on\n",
+    "# every load, including the one `read_blob()` runs internally. The\n",
+    "# option is scoped per-load instead: the descriptor inspection step\n",
+    "# (cell 13) picks up the user-chosen mode, and the `read_blob()` step\n",
+    "# (cell 14) explicitly sets CONTENT on its own reader. The explicit\n",
+    "# CONTENT is required because Lance's implicit default flipped to\n",
+    "# DESCRIPTOR in apache/hudi#18744 (1.2.0), while Parquet still\n",
+    "# defaults to CONTENT — and BatchedBlobReader now raises if read_blob\n",
+    "# runs against a descriptor-mode load. See\n",
     "# `TestLanceDataSource.testBlobInlineDescriptorMode` for the canonical\n",
     "# pattern.\n",
     "jars = resolve_jars(CONFIG[\"base_file_format\"])\n",
@@ -677,12 +681,16 @@
    "source": [
     "## 14. `read_blob(image_bytes)` — materialize bytes on demand\n",
     "\n",
-    "Note the **separate, default-mode** Hudi load: `read_blob()` 
dispatches\n",
-    "on the row's `storage_type` and only consults `reference` for\n",
-    "`OUT_OF_LINE`. For INLINE rows it reads the `data` field directly — so\n",
-    "if the DESCRIPTORS_VIEW above (which suppresses bytes in DESCRIPTOR\n",
-    "mode) were reused here, `read_blob()` would return null. Two views\n",
-    "keeps both paths working."
+    "Note the **separate, explicit-CONTENT** Hudi load: `read_blob()`\n",
+    "dispatches on the row's `storage_type` and only consults `reference`\n",
+    "for `OUT_OF_LINE`. For INLINE rows it reads the `data` field directly\n",
+    "— so if the DESCRIPTORS_VIEW above (which suppresses bytes in\n",
+    "DESCRIPTOR mode) were reused here, `read_blob()` would either return\n",
+    "null (Parquet) or raise `IllegalStateException` (Lance, as of\n",
+    "[apache/hudi#18744](https://github.com/apache/hudi/pull/18744)).\n",
+    "Setting `hoodie.read.blob.inline.mode=CONTENT` explicitly here also\n",
+    "covers Lance's new DESCRIPTOR default — we don't rely on the\n",
+    "format's implicit default."
    ]
   },
   {
@@ -710,7 +718,10 @@
    ],
    "source": [
     "RESOLVE_VIEW = \"blob_resolve_view\"\n",
-    
"spark.read.format(\"hudi\").load(CONFIG[\"table_path\"]).createOrReplaceTempView(RESOLVE_VIEW)\n",
+    "(spark.read.format(\"hudi\")\n",
+    "    .option(\"hoodie.read.blob.inline.mode\", \"CONTENT\")\n",
+    "    .load(CONFIG[\"table_path\"])\n",
+    "    .createOrReplaceTempView(RESOLVE_VIEW))\n",
     "\n",
     "spark.sql(f\"\"\"\n",
     "    SELECT image_id,\n",
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/02_sql_vector_search.ipynb b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/02_sql_vector_search.ipynb
index c41ae58c217f..8c451c75ad64 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/02_sql_vector_search.ipynb
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/02_sql_vector_search.ipynb
@@ -257,8 +257,8 @@
     "bundle and (if `BASE_FILE_FORMAT == \"lance\"`) the Lance Spark 
bundle.\n",
     "\n",
     "**Defaults:**\n",
-    "- `HUDI_BUNDLE_JAR` → 
`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar`\n",
-    "  (Apache 1.2.0-rc1 staging jar)\n",
+    "- `HUDI_BUNDLE_JAR` → 
`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar`\n",
+    "  (Apache Hudi 1.2.0 release jar)\n",
     "- `LANCE_BUNDLE_JAR` → 
`~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar`\n",
     "  (Maven Central)\n",
     "\n",
@@ -279,11 +279,11 @@
     "from pathlib import Path\n",
     "\n",
     "def default_hudi_bundle_jar() -> str:\n",
-    "    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:\n",
-    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n",
-    "    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n",
+    "    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:\n",
+    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n",
+    "    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n",
     "    # Override via HUDI_BUNDLE_JAR env var to point at a locally built bundle.\n",
-    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\")\n",
+    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0.jar\")\n",
     "\n",
     "def default_lance_bundle_jar() -> str:\n",
     "    # Defaults to the Maven Central Lance 0.4.0 jar in ~/Downloads/. Grab 
it with:\n",
@@ -296,7 +296,7 @@
     "    if not Path(hudi_jar).is_file():\n",
     "        sys.exit(\n",
     "            f\"ERROR: HUDI_BUNDLE_JAR does not exist at 
{hudi_jar}\\n\"\n",
-    "            \"Download the Apache 1.2.0-rc1 staging jar to ~/Downloads/ 
\"\n",
+    "            \"Download the Apache Hudi 1.2.0 release jar to ~/Downloads/ 
\"\n",
     "            \"or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar \"\n",
     "            \"before launching jupyter.\"\n",
     "        )\n",
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/03_dataframe_vector_search.ipynb b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/03_dataframe_vector_search.ipynb
index d4f94953a204..8d2e4046e2b0 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/03_dataframe_vector_search.ipynb
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/03_dataframe_vector_search.ipynb
@@ -197,11 +197,11 @@
     "from pathlib import Path\n",
     "\n",
     "def default_hudi_bundle_jar() -> str:\n",
-    "    # Defaults to the Apache 1.2.0-rc1 staging jar in ~/Downloads/. Grab it with:\n",
-    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar \\\n",
-    "    #     https://repository.apache.org/content/repositories/orgapachehudi-1176/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0-rc1/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\n",
+    "    # Defaults to the Apache Hudi 1.2.0 release jar in ~/Downloads/. Grab it with:\n",
+    "    #   curl -L -o ~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar \\\n",
+    "    #     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.2.0/hudi-spark3.5-bundle_2.12-1.2.0.jar\n",
     "    # Override via HUDI_BUNDLE_JAR env var to point at a locally built bundle.\n",
-    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar\")\n",
+    "    return str(Path.home() / \"Downloads\" / \"hudi-spark3.5-bundle_2.12-1.2.0.jar\")\n",
     "\n",
     "def default_lance_bundle_jar() -> str:\n",
     "    # Defaults to the Maven Central Lance 0.4.0 jar in ~/Downloads/. Grab 
it with:\n",
@@ -214,7 +214,7 @@
     "    if not Path(hudi_jar).is_file():\n",
     "        sys.exit(\n",
     "            f\"ERROR: HUDI_BUNDLE_JAR does not exist at 
{hudi_jar}\\n\"\n",
-    "            \"Download the Apache 1.2.0-rc1 staging jar to ~/Downloads/ 
\"\n",
+    "            \"Download the Apache Hudi 1.2.0 release jar to ~/Downloads/ 
\"\n",
     "            \"or set HUDI_BUNDLE_JAR=/abs/path/to/locally-built.jar \"\n",
     "            \"before launching jupyter.\"\n",
     "        )\n",
diff --git a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/README.md b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/README.md
index e6d66089eb73..e72fe7058b1b 100644
--- a/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/README.md
+++ b/hudi-examples/hudi-examples-spark/src/test/python/vector_blob_demo/notebooks/README.md
@@ -37,7 +37,7 @@ pip install -r requirements.txt    # adds jupyter + ipykernel for this folder
 ```
 
 The notebooks default `HUDI_BUNDLE_JAR` to
-`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0-rc1.jar` and `LANCE_BUNDLE_JAR`
+`~/Downloads/hudi-spark3.5-bundle_2.12-1.2.0.jar` and `LANCE_BUNDLE_JAR`
 to `~/Downloads/lance-spark-bundle-3.5_2.12-0.4.0.jar`, matching the `.py`
 scripts. If you placed both jars in `~/Downloads/` per the parent
 [`README.md`](../README.md) §1–2, you don't need to export anything. To
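
The per-format default split and the `read_blob()` guard described in the commit message above can be modeled in plain Python. This is a toy sketch, not Hudi code: `effective_inline_mode` and `read_blob_guard` are hypothetical helper names, and Python's `RuntimeError` stands in for the Java `IllegalStateException`. Only the option values and the Parquet/Lance defaults are taken from the commit.

```python
from typing import Optional

# Implicit defaults for hoodie.read.blob.inline.mode as stated in the commit
# message (apache/hudi#18744): Parquet -> CONTENT, Lance -> DESCRIPTOR.
IMPLICIT_DEFAULTS = {"parquet": "CONTENT", "lance": "DESCRIPTOR"}


def effective_inline_mode(base_file_format: str,
                          explicit_mode: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit option wins, else the format default."""
    if explicit_mode is not None:
        return explicit_mode.upper()
    return IMPLICIT_DEFAULTS[base_file_format.lower()]


def read_blob_guard(storage_type: str, mode: str) -> None:
    """Hypothetical stand-in for the guard: reject INLINE rows under DESCRIPTOR."""
    if storage_type == "INLINE" and mode == "DESCRIPTOR":
        raise RuntimeError("read_blob() on an INLINE row under DESCRIPTOR mode")


# Relying on the implicit default trips the guard on Lance only.
for fmt in ("parquet", "lance"):
    mode = effective_inline_mode(fmt)
    try:
        read_blob_guard("INLINE", mode)
        print(f"{fmt}: implicit {mode} -> ok")
    except RuntimeError as err:
        print(f"{fmt}: implicit {mode} -> {err}")

# Pinning CONTENT explicitly behaves the same on both formats.
for fmt in ("parquet", "lance"):
    read_blob_guard("INLINE", effective_inline_mode(fmt, "CONTENT"))
```

In this model the explicit `CONTENT` path never consults the format default, which is the property the demos rely on when they pin the option per-load or on the session.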
