This is an automated email from the ASF dual-hosted git repository.

timsaucer pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-python.git


The following commit(s) were added to refs/heads/main by this push:
     new 0d3c37f9 Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage (#1161)
0d3c37f9 is described below

commit 0d3c37f9379fd12406022c7edc3fee056866a694
Author: kosiew <kos...@gmail.com>
AuthorDate: Wed Jun 25 07:33:03 2025 +0800

    Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage (#1161)
    
    * docs: unify dataframe documentation (#2)
    
    * docs: update links in basics and rendering documentation
    
    * docs: add API reference section to main index
    
    * docs: add license information to rendering documentation
    
    * Move API Reference under User Guide > Dataframe
    
    * Merge data from dataframe api reference page into main dataframe page
    
    ---------
    
    Co-authored-by: Tim Saucer <timsau...@gmail.com>
---
 docs/source/api/dataframe.rst                      | 387 ---------------------
 docs/source/api/index.rst                          |  27 --
 docs/source/index.rst                              |   4 +-
 docs/source/user-guide/basics.rst                  |   2 +-
 docs/source/user-guide/dataframe/index.rst         | 209 +++++++++++
 .../{dataframe.rst => dataframe/rendering.rst}     | 103 +++---
 6 files changed, 265 insertions(+), 467 deletions(-)

diff --git a/docs/source/api/dataframe.rst b/docs/source/api/dataframe.rst
deleted file mode 100644
index a9e9e47c..00000000
--- a/docs/source/api/dataframe.rst
+++ /dev/null
@@ -1,387 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-.. or more contributor license agreements.  See the NOTICE file
-.. distributed with this work for additional information
-.. regarding copyright ownership.  The ASF licenses this file
-.. to you under the Apache License, Version 2.0 (the
-.. "License"); you may not use this file except in compliance
-.. with the License.  You may obtain a copy of the License at
-
-..   http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing,
-.. software distributed under the License is distributed on an
-.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-.. KIND, either express or implied.  See the License for the
-.. specific language governing permissions and limitations
-.. under the License.
-
-=================
-DataFrame API
-=================
-
-Overview
---------
-
-The ``DataFrame`` class is the core abstraction in DataFusion that represents tabular data and operations
-on that data. DataFrames provide a flexible API for transforming data through various operations such as
-filtering, projection, aggregation, joining, and more.
-
-A DataFrame represents a logical plan that is lazily evaluated. The actual execution occurs only when
-terminal operations like ``collect()``, ``show()``, or ``to_pandas()`` are called.
-
-Creating DataFrames
--------------------
-
-DataFrames can be created in several ways:
-
-* From SQL queries via a ``SessionContext``:
-
-  .. code-block:: python
-
-      from datafusion import SessionContext
-      
-      ctx = SessionContext()
-      df = ctx.sql("SELECT * FROM your_table")
-
-* From registered tables:
-
-  .. code-block:: python
-
-      df = ctx.table("your_table")
-
-* From various data sources:
-
-  .. code-block:: python
-
-      # From CSV files (see :ref:`io_csv` for detailed options)
-      df = ctx.read_csv("path/to/data.csv")
-      
-      # From Parquet files (see :ref:`io_parquet` for detailed options)
-      df = ctx.read_parquet("path/to/data.parquet")
-      
-      # From JSON files (see :ref:`io_json` for detailed options)
-      df = ctx.read_json("path/to/data.json")
-      
-      # From Avro files (see :ref:`io_avro` for detailed options)
-      df = ctx.read_avro("path/to/data.avro")
-      
-      # From Pandas DataFrame
-      import pandas as pd
-      pandas_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
-      df = ctx.from_pandas(pandas_df)
-      
-      # From Arrow data
-      import pyarrow as pa
-      batch = pa.RecordBatch.from_arrays(
-          [pa.array([1, 2, 3]), pa.array([4, 5, 6])],
-          names=["a", "b"]
-      )
-      df = ctx.from_arrow(batch)
-
-  For detailed information about reading from different data sources, see the :doc:`I/O Guide <../user-guide/io/index>`.
-  For custom data sources, see :ref:`io_custom_table_provider`.
-
-Common DataFrame Operations
----------------------------
-
-DataFusion's DataFrame API offers a wide range of operations:
-
-.. code-block:: python
-
-    from datafusion import column, literal
-    
-    # Select specific columns
-    df = df.select("col1", "col2")
-    
-    # Select with expressions
-    df = df.select(column("a") + column("b"), column("a") - column("b"))
-    
-    # Filter rows
-    df = df.filter(column("age") > literal(25))
-    
-    # Add computed columns
-    df = df.with_column("full_name", column("first_name") + literal(" ") + column("last_name"))
-    
-    # Multiple column additions
-    df = df.with_columns(
-        (column("a") + column("b")).alias("sum"),
-        (column("a") * column("b")).alias("product")
-    )
-    
-    # Sort data
-    df = df.sort(column("age").sort(ascending=False))
-    
-    # Join DataFrames
-    df = df1.join(df2, on="user_id", how="inner")
-    
-    # Aggregate data
-    from datafusion import functions as f
-    df = df.aggregate(
-        [],  # Group by columns (empty for global aggregation)
-        [f.sum(column("amount")).alias("total_amount")]
-    )
-    
-    # Limit rows
-    df = df.limit(100)
-    
-    # Drop columns
-    df = df.drop("temporary_column")
-
-Terminal Operations
--------------------
-
-To materialize the results of your DataFrame operations:
-
-.. code-block:: python
-
-    # Collect all data as PyArrow RecordBatches
-    result_batches = df.collect()
-    
-    # Convert to various formats
-    pandas_df = df.to_pandas()        # Pandas DataFrame
-    polars_df = df.to_polars()        # Polars DataFrame
-    arrow_table = df.to_arrow_table() # PyArrow Table
-    py_dict = df.to_pydict()          # Python dictionary
-    py_list = df.to_pylist()          # Python list of dictionaries
-    
-    # Display results
-    df.show()                         # Print tabular format to console
-    
-    # Count rows
-    count = df.count()
-
-HTML Rendering in Jupyter
--------------------------
-
-When working in Jupyter notebooks or other environments that support rich HTML display,
-DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality
-is provided by the ``_repr_html_`` method, which is automatically called by Jupyter.
-
-Basic HTML Rendering
-~~~~~~~~~~~~~~~~~~~~
-
-In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:
-
-.. code-block:: python
-
-    # Will display as HTML table in Jupyter
-    df
-
-    # Explicit display also uses HTML rendering
-    display(df)
-
-HTML Rendering Customization
-----------------------------
-
-DataFusion provides extensive customization options for HTML table rendering through the
-``datafusion.html_formatter`` module.
-
-Configuring the HTML Formatter
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can customize how DataFrames are rendered by configuring the formatter:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter
-    
-    configure_formatter(
-        max_cell_length=30,              # Maximum length of cell content before truncation
-        max_width=800,                   # Maximum width of table in pixels
-        max_height=400,                  # Maximum height of table in pixels
-        max_memory_bytes=2 * 1024 * 1024,# Maximum memory used for rendering (2MB)
-        min_rows_display=10,             # Minimum rows to display
-        repr_rows=20,                    # Number of rows to display in representation
-        enable_cell_expansion=True,      # Allow cells to be expandable on click
-        custom_css=None,                 # Custom CSS to apply
-        show_truncation_message=True,    # Show message when data is truncated
-        style_provider=None,             # Custom style provider class
-        use_shared_styles=True           # Share styles across tables to reduce duplication
-    )
-
-Custom Style Providers
-~~~~~~~~~~~~~~~~~~~~~~
-
-For advanced styling needs, you can create a custom style provider class:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter
-    
-    class CustomStyleProvider:
-        def get_cell_style(self) -> str:
-            return "background-color: #f5f5f5; color: #333; padding: 8px; 
border: 1px solid #ddd;"
-    
-        def get_header_style(self) -> str:
-            return "background-color: #4285f4; color: white; font-weight: 
bold; padding: 10px;"
-    
-    # Apply custom styling
-    configure_formatter(style_provider=CustomStyleProvider())
-
-Custom Type Formatters
-~~~~~~~~~~~~~~~~~~~~~~
-
-You can register custom formatters for specific data types:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import get_formatter
-    
-    formatter = get_formatter()
-    
-    # Format integers with color based on value
-    def format_int(value):
-        return f'<span style="color: {"red" if value > 100 else "blue"}">{value}</span>'
-    
-    formatter.register_formatter(int, format_int)
-    
-    # Format date values
-    def format_date(value):
-        return f'<span class="date-value">{value.isoformat()}</span>'
-    
-    formatter.register_formatter(datetime.date, format_date)
-
-Custom Cell Builders
-~~~~~~~~~~~~~~~~~~~~
-
-For complete control over cell rendering:
-
-.. code-block:: python
-
-    formatter = get_formatter()
-    
-    def custom_cell_builder(value, row, col, table_id):
-        try:
-            num_value = float(value)
-            if num_value > 0:  # Positive values get green
-                return f'<td style="background-color: #d9f0d3">{value}</td>'
-            if num_value < 0:  # Negative values get red
-                return f'<td style="background-color: #f0d3d3">{value}</td>'
-        except (ValueError, TypeError):
-            pass
-        
-        # Default styling for non-numeric or zero values
-        return f'<td style="border: 1px solid #ddd">{value}</td>'
-    
-    formatter.set_custom_cell_builder(custom_cell_builder)
-
-Custom Header Builders
-~~~~~~~~~~~~~~~~~~~~~~
-
-Similarly, you can customize the rendering of table headers:
-
-.. code-block:: python
-
-    def custom_header_builder(field):
-        tooltip = f"Type: {field.type}"
-        return f'<th style="background-color: #333; color: white" title="{tooltip}">{field.name}</th>'
-    
-    formatter.set_custom_header_builder(custom_header_builder)
-
-Managing Formatter State
------------------------~
-
-The HTML formatter maintains global state that can be managed:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import reset_formatter, reset_styles_loaded_state, get_formatter
-    
-    # Reset the formatter to default settings
-    reset_formatter()
-    
-    # Reset only the styles loaded state (useful when styles were loaded but need reloading)
-    reset_styles_loaded_state()
-    
-    # Get the current formatter instance to make changes
-    formatter = get_formatter()
-
-Advanced Example: Dashboard-Style Formatting
-------------------------------------------~~
-
-This example shows how to create a dashboard-like styling for your DataFrames:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter, get_formatter
-    
-    # Define custom CSS
-    custom_css = """
-    .datafusion-table {
-        font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
-        border-collapse: collapse;
-        width: 100%;
-        box-shadow: 0 2px 3px rgba(0,0,0,0.1);
-    }
-    .datafusion-table th {
-        position: sticky;
-        top: 0;
-        z-index: 10;
-    }
-    .datafusion-table tr:hover td {
-        background-color: #f1f7fa !important;
-    }
-    .datafusion-table .numeric-positive {
-        color: #0a7c00;
-    }
-    .datafusion-table .numeric-negative {
-        color: #d13438;
-    }
-    """
-    
-    class DashboardStyleProvider:
-        def get_cell_style(self) -> str:
-            return "padding: 8px 12px; border-bottom: 1px solid #e0e0e0;"
-        
-        def get_header_style(self) -> str:
-            return ("background-color: #0078d4; color: white; font-weight: 
600; "
-                    "padding: 12px; text-align: left; border-bottom: 2px solid 
#005a9e;")
-    
-    # Apply configuration
-    configure_formatter(
-        max_height=500,
-        enable_cell_expansion=True,
-        custom_css=custom_css,
-        style_provider=DashboardStyleProvider(),
-        max_cell_length=50
-    )
-    
-    # Add custom formatters for numbers
-    formatter = get_formatter()
-    
-    def format_number(value):
-        try:
-            num = float(value)
-            cls = "numeric-positive" if num > 0 else "numeric-negative" if num 
< 0 else ""
-            return f'<span class="{cls}">{value:,}</span>' if cls else 
f'{value:,}'
-        except (ValueError, TypeError):
-            return str(value)
-    
-    formatter.register_formatter(int, format_number)
-    formatter.register_formatter(float, format_number)
-
-Best Practices
---------------
-
-1. **Memory Management**: For large datasets, use ``max_memory_bytes`` to limit memory usage.
-
-2. **Responsive Design**: Set reasonable ``max_width`` and ``max_height`` values to ensure tables display well on different screens.
-
-3. **Style Optimization**: Use ``use_shared_styles=True`` to avoid duplicate style definitions when displaying multiple tables.
-
-4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings.
-
-5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full.
-
-Additional Resources
---------------------
-
-* :doc:`../user-guide/dataframe` - Complete guide to using DataFrames
-* :doc:`../user-guide/io/index` - I/O Guide for reading data from various sources
-* :doc:`../user-guide/data-sources` - Comprehensive data sources guide
-* :ref:`io_csv` - CSV file reading
-* :ref:`io_parquet` - Parquet file reading  
-* :ref:`io_json` - JSON file reading
-* :ref:`io_avro` - Avro file reading
-* :ref:`io_custom_table_provider` - Custom table providers
-* `API Reference <https://arrow.apache.org/datafusion-python/api/index.html>`_ - Full API reference
diff --git a/docs/source/api/index.rst b/docs/source/api/index.rst
deleted file mode 100644
index 7f58227c..00000000
--- a/docs/source/api/index.rst
+++ /dev/null
@@ -1,27 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-.. or more contributor license agreements.  See the NOTICE file
-.. distributed with this work for additional information
-.. regarding copyright ownership.  The ASF licenses this file
-.. to you under the Apache License, Version 2.0 (the
-.. "License"); you may not use this file except in compliance
-.. with the License.  You may obtain a copy of the License at
-
-..   http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing,
-.. software distributed under the License is distributed on an
-.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-.. KIND, either express or implied.  See the License for the
-.. specific language governing permissions and limitations
-.. under the License.
-
-=============
-API Reference
-=============
-
-This section provides detailed API documentation for the DataFusion Python library.
-
-.. toctree::
-   :maxdepth: 2
-   
-   dataframe
diff --git a/docs/source/index.rst b/docs/source/index.rst
index ff1e4728..adec60f4 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -72,7 +72,7 @@ Example
    user-guide/introduction
    user-guide/basics
    user-guide/data-sources
-   user-guide/dataframe
+   user-guide/dataframe/index
    user-guide/common-operations/index
    user-guide/io/index
    user-guide/configuration
@@ -93,5 +93,3 @@ Example
    :hidden:
    :maxdepth: 1
    :caption: API
-   
-   api/index
diff --git a/docs/source/user-guide/basics.rst b/docs/source/user-guide/basics.rst
index 2975d9a6..7c682046 100644
--- a/docs/source/user-guide/basics.rst
+++ b/docs/source/user-guide/basics.rst
@@ -73,7 +73,7 @@ DataFrames are typically created by calling a method on :py:class:`~datafusion.c
 calling the transformation methods, such as :py:func:`~datafusion.dataframe.DataFrame.filter`, :py:func:`~datafusion.dataframe.DataFrame.select`, :py:func:`~datafusion.dataframe.DataFrame.aggregate`,
 and :py:func:`~datafusion.dataframe.DataFrame.limit` to build up a query definition.
 
-For more details on working with DataFrames, including visualization options and conversion to other formats, see :doc:`dataframe`.
+For more details on working with DataFrames, including visualization options and conversion to other formats, see :doc:`dataframe/index`.
 
 Expressions
 -----------
diff --git a/docs/source/user-guide/dataframe/index.rst b/docs/source/user-guide/dataframe/index.rst
new file mode 100644
index 00000000..f69485af
--- /dev/null
+++ b/docs/source/user-guide/dataframe/index.rst
@@ -0,0 +1,209 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+DataFrames
+==========
+
+Overview
+--------
+
+The ``DataFrame`` class is the core abstraction in DataFusion that represents tabular data and operations
+on that data. DataFrames provide a flexible API for transforming data through various operations such as
+filtering, projection, aggregation, joining, and more.
+
+A DataFrame represents a logical plan that is lazily evaluated. The actual execution occurs only when
+terminal operations like ``collect()``, ``show()``, or ``to_pandas()`` are called.
+
+Creating DataFrames
+-------------------
+
+DataFrames can be created in several ways:
+
+* From SQL queries via a ``SessionContext``:
+
+  .. code-block:: python
+
+      from datafusion import SessionContext
+      
+      ctx = SessionContext()
+      df = ctx.sql("SELECT * FROM your_table")
+
+* From registered tables:
+
+  .. code-block:: python
+
+      df = ctx.table("your_table")
+
+* From various data sources:
+
+  .. code-block:: python
+
+      # From CSV files (see :ref:`io_csv` for detailed options)
+      df = ctx.read_csv("path/to/data.csv")
+      
+      # From Parquet files (see :ref:`io_parquet` for detailed options)
+      df = ctx.read_parquet("path/to/data.parquet")
+      
+      # From JSON files (see :ref:`io_json` for detailed options)
+      df = ctx.read_json("path/to/data.json")
+      
+      # From Avro files (see :ref:`io_avro` for detailed options)
+      df = ctx.read_avro("path/to/data.avro")
+      
+      # From Pandas DataFrame
+      import pandas as pd
+      pandas_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+      df = ctx.from_pandas(pandas_df)
+      
+      # From Arrow data
+      import pyarrow as pa
+      batch = pa.RecordBatch.from_arrays(
+          [pa.array([1, 2, 3]), pa.array([4, 5, 6])],
+          names=["a", "b"]
+      )
+      df = ctx.from_arrow(batch)
+
+For detailed information about reading from different data sources, see the :doc:`I/O Guide <../io/index>`.
+For custom data sources, see :ref:`io_custom_table_provider`.
+
+Common DataFrame Operations
+---------------------------
+
+DataFusion's DataFrame API offers a wide range of operations:
+
+.. code-block:: python
+
+    from datafusion import column, literal
+    
+    # Select specific columns
+    df = df.select("col1", "col2")
+    
+    # Select with expressions
+    df = df.select(column("a") + column("b"), column("a") - column("b"))
+    
+    # Filter rows
+    df = df.filter(column("age") > literal(25))
+    
+    # Add computed columns
+    df = df.with_column("full_name", column("first_name") + literal(" ") + column("last_name"))
+    
+    # Multiple column additions
+    df = df.with_columns(
+        (column("a") + column("b")).alias("sum"),
+        (column("a") * column("b")).alias("product")
+    )
+    
+    # Sort data
+    df = df.sort(column("age").sort(ascending=False))
+    
+    # Join DataFrames
+    df = df1.join(df2, on="user_id", how="inner")
+    
+    # Aggregate data
+    from datafusion import functions as f
+    df = df.aggregate(
+        [],  # Group by columns (empty for global aggregation)
+        [f.sum(column("amount")).alias("total_amount")]
+    )
+    
+    # Limit rows
+    df = df.limit(100)
+    
+    # Drop columns
+    df = df.drop("temporary_column")
+
+Terminal Operations
+-------------------
+
+To materialize the results of your DataFrame operations:
+
+.. code-block:: python
+
+    # Collect all data as PyArrow RecordBatches
+    result_batches = df.collect()
+    
+    # Convert to various formats
+    pandas_df = df.to_pandas()        # Pandas DataFrame
+    polars_df = df.to_polars()        # Polars DataFrame
+    arrow_table = df.to_arrow_table() # PyArrow Table
+    py_dict = df.to_pydict()          # Python dictionary
+    py_list = df.to_pylist()          # Python list of dictionaries
+    
+    # Display results
+    df.show()                         # Print tabular format to console
+    
+    # Count rows
+    count = df.count()
+
+HTML Rendering
+--------------
+
+When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will
+automatically display as formatted HTML tables. For detailed information about customizing HTML
+rendering, formatting options, and advanced styling, see :doc:`rendering`.
+
+Core Classes
+------------
+
+**DataFrame**
+    The main DataFrame class for building and executing queries.
+
+    See: :py:class:`datafusion.DataFrame`
+
+**SessionContext**
+    The primary entry point for creating DataFrames from various data sources.
+
+    Key methods for DataFrame creation:
+
+    * :py:meth:`~datafusion.SessionContext.read_csv` - Read CSV files
+    * :py:meth:`~datafusion.SessionContext.read_parquet` - Read Parquet files
+    * :py:meth:`~datafusion.SessionContext.read_json` - Read JSON files
+    * :py:meth:`~datafusion.SessionContext.read_avro` - Read Avro files
+    * :py:meth:`~datafusion.SessionContext.table` - Access registered tables
+    * :py:meth:`~datafusion.SessionContext.sql` - Execute SQL queries
+    * :py:meth:`~datafusion.SessionContext.from_pandas` - Create from Pandas DataFrame
+    * :py:meth:`~datafusion.SessionContext.from_arrow` - Create from Arrow data
+
+    See: :py:class:`datafusion.SessionContext`
+
+Expression Classes
+------------------
+
+**Expr**
+    Represents expressions that can be used in DataFrame operations.
+
+    See: :py:class:`datafusion.Expr`
+
+**Functions for creating expressions:**
+
+* :py:func:`datafusion.column` - Reference a column by name
+* :py:func:`datafusion.literal` - Create a literal value expression
+
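+For example, the two helpers compose into reusable expressions. A minimal sketch, assuming a
+DataFrame ``df`` with ``name`` and ``age`` columns:
+
+.. code-block:: python
+
+    from datafusion import column, literal
+
+    # Build an expression once and reuse it across operations
+    # (``df`` and its column names are illustrative)
+    is_adult = column("age") > literal(18)
+
+    adults = df.filter(is_adult)
+    labeled = df.select(column("name"), is_adult.alias("is_adult"))
+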
+Built-in Functions
+------------------
+
+DataFusion provides many built-in functions for data manipulation:
+
+* :py:mod:`datafusion.functions` - Mathematical, string, date/time, and aggregation functions
+
+For a complete list of available functions, see the :py:mod:`datafusion.functions` module documentation.
+
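+For example, aggregation functions from this module plug directly into ``DataFrame.aggregate``;
+a short sketch, assuming a DataFrame ``df`` with ``category`` and ``amount`` columns:
+
+.. code-block:: python
+
+    from datafusion import column, functions as f
+
+    # Group by one column and compute a per-group total
+    # (``df`` and its column names are illustrative)
+    totals = df.aggregate(
+        [column("category")],
+        [f.sum(column("amount")).alias("total_amount")]
+    )
+    totals.show()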
+
+.. toctree::
+   :maxdepth: 1
+
+   rendering
diff --git a/docs/source/user-guide/dataframe.rst b/docs/source/user-guide/dataframe/rendering.rst
similarity index 72%
rename from docs/source/user-guide/dataframe.rst
rename to docs/source/user-guide/dataframe/rendering.rst
index 23c65b5f..4c37c747 100644
--- a/docs/source/user-guide/dataframe.rst
+++ b/docs/source/user-guide/dataframe/rendering.rst
@@ -15,59 +15,37 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-DataFrames
-==========
+HTML Rendering in Jupyter
+=========================
 
-Overview
---------
+When working in Jupyter notebooks or other environments that support rich HTML display,
+DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality
+is provided by the ``_repr_html_`` method, which is automatically called by Jupyter to provide
+a richer visualization than plain text output.
 
-DataFusion's DataFrame API provides a powerful interface for building and executing queries against data sources.
-It offers a familiar API similar to pandas and other DataFrame libraries, but with the performance benefits of Rust
-and Arrow.
+Basic HTML Rendering
+--------------------
 
-A DataFrame represents a logical plan that can be composed through operations like filtering, projection, and aggregation.
-The actual execution happens when terminal operations like ``collect()`` or ``show()`` are called.
-
-Basic Usage
------------
+In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:
 
 .. code-block:: python
 
-    import datafusion
-    from datafusion import col, lit
+    # Will display as HTML table in Jupyter
+    df
 
-    # Create a context and register a data source
-    ctx = datafusion.SessionContext()
-    ctx.register_csv("my_table", "path/to/data.csv")
-    
-    # Create and manipulate a DataFrame
-    df = ctx.sql("SELECT * FROM my_table")
-    
-    # Or use the DataFrame API directly
-    df = (ctx.table("my_table")
-          .filter(col("age") > lit(25))
-          .select([col("name"), col("age")]))
-    
-    # Execute and collect results
-    result = df.collect()
-    
-    # Display the first few rows
-    df.show()
+    # Explicit display also uses HTML rendering
+    display(df)
 
-HTML Rendering
---------------
-
-When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will
-automatically display as formatted HTML tables, making it easier to visualize your data.
+Customizing HTML Rendering
+---------------------------
 
-The ``_repr_html_`` method is called automatically by Jupyter to render a DataFrame. This method
-controls how DataFrames appear in notebook environments, providing a richer visualization than
-plain text output.
+DataFusion provides extensive customization options for HTML table rendering through the
+``datafusion.html_formatter`` module.
 
-Customizing HTML Rendering
---------------------------
+Configuring the HTML Formatter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-You can customize how DataFrames are rendered in HTML by configuring the formatter:
+You can customize how DataFrames are rendered by configuring the formatter:
 
 .. code-block:: python
 
@@ -91,7 +69,7 @@ You can customize how DataFrames are rendered in HTML by configuring the formatt
 The formatter settings affect all DataFrames displayed after configuration.
 
 Custom Style Providers
-----------------------
+-----------------------
 
 For advanced styling needs, you can create a custom style provider:
 
@@ -118,7 +96,8 @@ For advanced styling needs, you can create a custom style provider:
     configure_formatter(style_provider=MyStyleProvider())
 
 Performance Optimization with Shared Styles
--------------------------------------------
+--------------------------------------------
+
 The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying
 multiple DataFrames in notebook environments:
 
@@ -138,7 +117,7 @@ When ``use_shared_styles=True``:
 - Applies consistent styling across all DataFrames
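+
+For example, style sharing can be turned off when a single rendered table needs to be fully
+self-contained (such as HTML exported outside the notebook); a brief sketch:
+
+.. code-block:: python
+
+    from datafusion.html_formatter import configure_formatter
+
+    # Embed the CSS with every table instead of sharing it across outputs
+    configure_formatter(use_shared_styles=False)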
 
 Creating a Custom Formatter
----------------------------
+----------------------------
 
 For complete control over rendering, you can implement a custom formatter:
 
@@ -184,7 +163,7 @@ Get the current formatter settings:
     print(formatter.theme)
 
 Contextual Formatting
----------------------
+----------------------
 
 You can also use a context manager to temporarily change formatting settings:
 
@@ -207,12 +186,38 @@ Memory and Display Controls
 
 You can control how much data is displayed and how much memory is used for rendering:
 
- .. code-block:: python
- 
+.. code-block:: python
+
     configure_formatter(
         max_memory_bytes=4 * 1024 * 1024,  # 4MB maximum memory for display
         min_rows_display=50,               # Always show at least 50 rows
         repr_rows=20                       # Show 20 rows in __repr__ output
     )
 
-These parameters help balance comprehensive data display against performance considerations.
\ No newline at end of file
+These parameters help balance comprehensive data display against performance considerations.
+
+Best Practices
+--------------
+
+1. **Global Configuration**: Use ``configure_formatter()`` at the beginning of your notebook to set up consistent formatting for all DataFrames.
+
+2. **Memory Management**: Set appropriate ``max_memory_bytes`` limits to prevent performance issues with large datasets.
+
+3. **Shared Styles**: Keep ``use_shared_styles=True`` (default) for better performance in notebooks with multiple DataFrames.
+
+4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings.
+
+5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full.
+
+Additional Resources
+--------------------
+
+* :doc:`../dataframe/index` - Complete guide to using DataFrames
+* :doc:`../io/index` - I/O Guide for reading data from various sources
+* :doc:`../data-sources` - Comprehensive data sources guide
+* :ref:`io_csv` - CSV file reading
+* :ref:`io_parquet` - Parquet file reading  
+* :ref:`io_json` - JSON file reading
+* :ref:`io_avro` - Avro file reading
+* :ref:`io_custom_table_provider` - Custom table providers
+* `API Reference <https://arrow.apache.org/datafusion-python/api/index.html>`_ - Full API reference

