This is an automated email from the ASF dual-hosted git repository. jiayu pushed a commit to branch fix/2407-raster-jupyter-display in repository https://gitbox.apache.org/repos/asf/sedona.git
commit 4d63f08c03dc1a683788b231a3fe38345d71f691
Author: Jia Yu <[email protected]>
AuthorDate: Sun Feb 8 21:33:45 2026 -0800

    [GH-2407] Optimize raster image display in Jupyter notebooks

    SedonaUtils.display_image() was slow for raster DataFrames because it
    routed through SedonaMapUtils.__convert_to_gdf_or_pdf__(), which performs
    Arrow conversion, geopandas import attempts, and DataFrame-to-HTML-table
    wrapping — all unnecessary when the input is already HTML <img> strings
    from RS_AsImage().

    - Add fast path: collect rows directly and render HTML <img> strings
      without intermediate Arrow/Pandas/to_html() conversion
    - Keep fallback to original path for non-image DataFrames
    - Add docstring to display_image()
---
 docs/api/sql/Raster-visualizer.md               | 12 ++++++++--
 python/sedona/spark/raster_utils/SedonaUtils.py | 32 +++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/docs/api/sql/Raster-visualizer.md b/docs/api/sql/Raster-visualizer.md
index ce58eb986c..a7ec6f0774 100644
--- a/docs/api/sql/Raster-visualizer.md
+++ b/docs/api/sql/Raster-visualizer.md
@@ -72,9 +72,9 @@ Output:
 ```
 
 !!!Tip
-    RS_AsImage can be paired with SedonaUtils.display_image(df) wrapper inside a Jupyter notebook to directly print the raster as an image in the output, where the 'df' parameter is the dataframe containing the HTML data provided by RS_AsImage
+    RS_AsImage can be paired with SedonaUtils.display_image(df) wrapper inside a Jupyter notebook to directly print the raster as an image in the output. You can pass either a raw raster DataFrame or a DataFrame with pre-applied RS_AsImage HTML.
 
-Example:
+Example — direct raster display (recommended):
 
 ```python
 from sedona.spark import SedonaUtils
@@ -86,6 +86,14 @@ df = (
     .load(DATA_DIR + "raster.tiff")
     .selectExpr("RS_FromGeoTiff(content) as raster")
 )
+
+# Pass the raw raster DataFrame directly — RS_AsImage is applied automatically
+SedonaUtils.display_image(df)
+```
+
+Example — with explicit RS_AsImage:
+
+```python
 htmlDF = df.selectExpr("RS_AsImage(raster, 500) as raster_image")
 SedonaUtils.display_image(htmlDF)
 ```
diff --git a/python/sedona/spark/raster_utils/SedonaUtils.py b/python/sedona/spark/raster_utils/SedonaUtils.py
index f292bd490a..c68e03f641 100644
--- a/python/sedona/spark/raster_utils/SedonaUtils.py
+++ b/python/sedona/spark/raster_utils/SedonaUtils.py
@@ -21,7 +21,39 @@ from sedona.spark.maps.SedonaMapUtils import SedonaMapUtils
 class SedonaUtils:
     @classmethod
     def display_image(cls, df):
+        """Display raster images in a Jupyter notebook.
+
+        Accepts DataFrames with either:
+        - A raster column (GridCoverage2D) — auto-applies RS_AsImage
+        - An HTML image column from RS_AsImage() — renders directly
+
+        Falls back to the SedonaMapUtils HTML table path for other DataFrames.
+        """
         from IPython.display import HTML, display
 
+        schema = df.schema
+
+        # Detect raster UDT columns and auto-apply RS_AsImage.
+        # Without this, passing a raw raster DataFrame to the fallback path
+        # causes __convert_to_gdf_or_pdf__ to Arrow-serialize the full raster
+        # grid, which hangs for large rasters (e.g., 1400x800).
+        raster_cols = [
+            f.name
+            for f in schema.fields
+            if hasattr(f.dataType, "typeName") and f.dataType.typeName() == "rastertype"
+        ]
+        if raster_cols:
+            # Replace each raster column with its RS_AsImage() HTML representation,
+            # preserving all other columns in the DataFrame.
+            select_exprs = [
+                (
+                    f"RS_AsImage(`{f.name}`) as `{f.name}`"
+                    if f.name in raster_cols
+                    else f"`{f.name}`"
+                )
+                for f in schema.fields
+            ]
+            df = df.selectExpr(*select_exprs)
+
         pdf = SedonaMapUtils.__convert_to_gdf_or_pdf__(df, rename=False)
         display(HTML(pdf.to_html(escape=False)))
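
The column-rewriting step in the patch can be exercised without a Spark session. The following is a minimal sketch and is not part of the commit: `StubType`, `StubField`, and `build_select_exprs` are hypothetical names introduced here for illustration, and the stubs model only the `name`, `dataType`, and `typeName()` attributes that the patch reads from the schema.

```python
from dataclasses import dataclass


@dataclass
class StubType:
    """Hypothetical stand-in for a Spark DataType; only typeName() is modeled."""

    name: str

    def typeName(self) -> str:
        return self.name


@dataclass
class StubField:
    """Hypothetical stand-in for a schema field with a name and a dataType."""

    name: str
    dataType: StubType


def build_select_exprs(fields):
    """Build selectExpr strings that wrap raster columns in RS_AsImage().

    Returns None when no raster column is found, mirroring the
    `if raster_cols:` guard in the patch.
    """
    raster_cols = {
        f.name
        for f in fields
        if hasattr(f.dataType, "typeName") and f.dataType.typeName() == "rastertype"
    }
    if not raster_cols:
        return None
    # Raster columns are replaced in place; all other columns pass through.
    return [
        f"RS_AsImage(`{f.name}`) as `{f.name}`" if f.name in raster_cols else f"`{f.name}`"
        for f in fields
    ]


fields = [
    StubField("id", StubType("integer")),
    StubField("raster", StubType("rastertype")),
]
exprs = build_select_exprs(fields)
print(exprs)
```

Backtick-quoting every column name, as the patch does, keeps the generated expressions valid for names containing spaces or dots. A DataFrame without raster columns yields `None` in this sketch, which corresponds to the patch leaving `df` unchanged before the fallback path.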
