This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-ballista.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f09d0f78f Publish built docs triggered by caea92969f34a0c0899a6eb13713b9f7e377f80d
f09d0f78f is described below
commit f09d0f78f2d93b7f58ea823e646515f0019b6474
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Jan 22 23:11:20 2026 +0000
Publish built docs triggered by caea92969f34a0c0899a6eb13713b9f7e377f80d
---
_sources/user-guide/python.md.txt | 99 ++++++++++++++++++++++++++++++
searchindex.js | 2 +-
user-guide/python.html | 123 ++++++++++++++++++++++++++++++++++++++
3 files changed, 223 insertions(+), 1 deletion(-)
diff --git a/_sources/user-guide/python.md.txt b/_sources/user-guide/python.md.txt
index e7d139683..f065cd66e 100644
--- a/_sources/user-guide/python.md.txt
+++ b/_sources/user-guide/python.md.txt
@@ -141,6 +141,105 @@ assert result.column(0) == pyarrow.array([5, 7, 9])
assert result.column(1) == pyarrow.array([-3, -3, -3])
```
+## Jupyter Notebook Support
+
+Ballista works well in Jupyter notebooks. DataFrames automatically render as formatted HTML tables when displayed
+in a notebook cell.
+
+### Basic Usage
+
+```python
+from ballista import BallistaSessionContext
+
+# Connect to a Ballista cluster
+ctx = BallistaSessionContext("df://localhost:50050")
+
+# Register a table
+ctx.register_parquet("trips", "/path/to/nyctaxi.parquet")
+
+# Run a query - the result renders as an HTML table
+ctx.sql("SELECT * FROM trips LIMIT 10")
+```
+
+When a DataFrame is the last expression in a cell, Jupyter automatically calls its `_repr_html_()` method,
+which renders a styled table with:
+
+- Formatted column headers
+- Expandable cells for long text content
+- Scrollable display for wide tables
+
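The display hook described above can be illustrated without a cluster. The sketch below is not Ballista code: `TinyTable` is a hypothetical stand-in class, used only to show the `_repr_html_()` convention that Jupyter looks for on the value of a cell's last expression. The styling and expandable cells mentioned in the list are not reproduced here.

```python
from html import escape


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not part of Ballista."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def _repr_html_(self):
        # Jupyter calls this method on the value of a cell's last
        # expression and displays the returned HTML string.
        header = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return (
            f"<table><thead><tr>{header}</tr></thead>"
            f"<tbody>{body}</tbody></table>"
        )


table = TinyTable(["name", "total_spent"], [["Ada", 120.5], ["Lin <VIP>", 99]])

# Outside a notebook the method can be called directly to inspect the markup.
html = table._repr_html_()
```

In a notebook, ending a cell with `table` would display the markup as a table; assigning the value to a variable or ending the line with a semicolon suppresses the display.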
+### Converting Results
+
+DataFrames can be converted to various formats for further analysis:
+
+```python
+df = ctx.sql("SELECT * FROM trips WHERE fare_amount > 50")
+
+# Convert to Pandas DataFrame
+pandas_df = df.to_pandas()
+
+# Convert to PyArrow Table
+arrow_table = df.to_arrow_table()
+
+# Convert to Polars DataFrame
+polars_df = df.to_polars()
+
+# Collect as PyArrow RecordBatches
+batches = df.collect()
+```
+
+### Example Notebook Workflow
+
+A typical notebook workflow might look like:
+
+```python
+# Cell 1: Setup
+from ballista import BallistaSessionContext
+from datafusion import col, lit
+
+ctx = BallistaSessionContext("df://localhost:50050")
+ctx.register_parquet("orders", "/data/orders.parquet")
+ctx.register_parquet("customers", "/data/customers.parquet")
+
+# Cell 2: Explore the data
+ctx.sql("SELECT * FROM orders LIMIT 5")
+
+# Cell 3: Run analysis
+df = ctx.sql("""
+ SELECT
+ c.name,
+ COUNT(*) as order_count,
+ SUM(o.amount) as total_spent
+ FROM orders o
+ JOIN customers c ON o.customer_id = c.id
+ GROUP BY c.name
+ ORDER BY total_spent DESC
+ LIMIT 10
+""")
+df
+
+# Cell 4: Convert to Pandas for visualization
+import matplotlib.pyplot as plt
+
+pandas_df = df.to_pandas()
+pandas_df.plot(kind='bar', x='name', y='total_spent')
+plt.show()
+```
+
+### Running a Local Cluster in a Notebook
+
+For development and testing, you can start a local cluster directly from a notebook:
+
+```python
+from ballista import BallistaSessionContext, setup_test_cluster
+
+# Start a local scheduler and executor
+host, port = setup_test_cluster()
+
+# Connect to it
+ctx = BallistaSessionContext(f"df://{host}:{port}")
+```
+
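Because the connection string is assembled by hand from `host` and `port`, a small check before connecting can catch a malformed value early. The following is a minimal sketch using only the standard library; `ballista_url` is a hypothetical helper, not part of the Ballista package, and it assumes the `df://host:port` form used in the examples above.

```python
from urllib.parse import urlsplit


def ballista_url(host, port):
    """Hypothetical helper: build a df:// URL and sanity-check it."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    url = f"df://{host}:{port}"
    # Parse the URL back to confirm the host and port survived formatting.
    parts = urlsplit(url)
    if parts.scheme != "df" or parts.hostname is None or parts.port != port:
        raise ValueError(f"not a valid scheduler address: {url!r}")
    return url


url = ballista_url("localhost", 50050)
```

The resulting string could then be passed to `BallistaSessionContext` in place of the inline f-string.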
## User Defined Functions
The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in
diff --git a/searchindex.js b/searchindex.js
index b8142b363..cfa910a16 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]],
"Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11,
"autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code
Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]],
"Ballista Configuration Settings": [[6, "ballista-configuration-settings"]],
"Ballista Development": [[3, null]], "Ballista Python Bindings": [[18, null]],
"Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]],
"Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11,
"autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code
Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]],
"Ballista Configuration Settings": [[6, "ballista-configuration-settings"]],
"Ballista Development": [[3, null]], "Ballista Python Bindings": [[18, null]],
"Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
diff --git a/user-guide/python.html b/user-guide/python.html
index c1edb58c0..1cd7b6ee3 100644
--- a/user-guide/python.html
+++ b/user-guide/python.html
@@ -302,6 +302,33 @@
DataFrame
</a>
</li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#jupyter-notebook-support">
+ Jupyter Notebook Support
+ </a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#basic-usage">
+ Basic Usage
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#converting-results">
+ Converting Results
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#example-notebook-workflow">
+ Example Notebook Workflow
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#running-a-local-cluster-in-a-notebook">
+ Running a Local Cluster in a Notebook
+ </a>
+ </li>
+ </ul>
+ </li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#user-defined-functions">
User Defined Functions
@@ -467,6 +494,102 @@ COUNT(UInt8(1)): int64]
</pre></div>
</div>
</section>
+<section id="jupyter-notebook-support">
+<h2>Jupyter Notebook Support<a class="headerlink" href="#jupyter-notebook-support" title="Link to this heading">¶</a></h2>
+<p>Ballista works well in Jupyter notebooks. DataFrames automatically render as formatted HTML tables when displayed
+in a notebook cell.</p>
+<section id="basic-usage">
+<h3>Basic Usage<a class="headerlink" href="#basic-usage" title="Link to this heading">¶</a></h3>
+<div class="highlight-python notranslate"><div
class="highlight"><pre><span></span><span class="kn">from</span><span
class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span
class="kn">import</span> <span class="n">BallistaSessionContext</span>
+
+<span class="c1"># Connect to a Ballista cluster</span>
+<span class="n">ctx</span> <span class="o">=</span> <span
class="n">BallistaSessionContext</span><span class="p">(</span><span
class="s2">"df://localhost:50050"</span><span class="p">)</span>
+
+<span class="c1"># Register a table</span>
+<span class="n">ctx</span><span class="o">.</span><span
class="n">register_parquet</span><span class="p">(</span><span
class="s2">"trips"</span><span class="p">,</span> <span
class="s2">"/path/to/nyctaxi.parquet"</span><span class="p">)</span>
+
+<span class="c1"># Run a query - the result renders as an HTML table</span>
+<span class="n">ctx</span><span class="o">.</span><span
class="n">sql</span><span class="p">(</span><span class="s2">"SELECT *
FROM trips LIMIT 10"</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>When a DataFrame is the last expression in a cell, Jupyter automatically calls its <code class="docutils literal notranslate"><span class="pre">_repr_html_()</span></code> method,
+which renders a styled table with:</p>
+<ul class="simple">
+<li><p>Formatted column headers</p></li>
+<li><p>Expandable cells for long text content</p></li>
+<li><p>Scrollable display for wide tables</p></li>
+</ul>
+</section>
+<section id="converting-results">
+<h3>Converting Results<a class="headerlink" href="#converting-results" title="Link to this heading">¶</a></h3>
+<p>DataFrames can be converted to various formats for further analysis:</p>
+<div class="highlight-python notranslate"><div
class="highlight"><pre><span></span><span class="n">df</span> <span
class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span
class="n">sql</span><span class="p">(</span><span class="s2">"SELECT *
FROM trips WHERE fare_amount > 50"</span><span class="p">)</span>
+
+<span class="c1"># Convert to Pandas DataFrame</span>
+<span class="n">pandas_df</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span
class="n">to_pandas</span><span class="p">()</span>
+
+<span class="c1"># Convert to PyArrow Table</span>
+<span class="n">arrow_table</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span
class="n">to_arrow_table</span><span class="p">()</span>
+
+<span class="c1"># Convert to Polars DataFrame</span>
+<span class="n">polars_df</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span
class="n">to_polars</span><span class="p">()</span>
+
+<span class="c1"># Collect as PyArrow RecordBatches</span>
+<span class="n">batches</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span class="n">collect</span><span
class="p">()</span>
+</pre></div>
+</div>
+</section>
+<section id="example-notebook-workflow">
+<h3>Example Notebook Workflow<a class="headerlink" href="#example-notebook-workflow" title="Link to this heading">¶</a></h3>
+<p>A typical notebook workflow might look like:</p>
+<div class="highlight-python notranslate"><div
class="highlight"><pre><span></span><span class="c1"># Cell 1: Setup</span>
+<span class="kn">from</span><span class="w"> </span><span
class="nn">ballista</span><span class="w"> </span><span
class="kn">import</span> <span class="n">BallistaSessionContext</span>
+<span class="kn">from</span><span class="w"> </span><span
class="nn">datafusion</span><span class="w"> </span><span
class="kn">import</span> <span class="n">col</span><span class="p">,</span>
<span class="n">lit</span>
+
+<span class="n">ctx</span> <span class="o">=</span> <span
class="n">BallistaSessionContext</span><span class="p">(</span><span
class="s2">"df://localhost:50050"</span><span class="p">)</span>
+<span class="n">ctx</span><span class="o">.</span><span
class="n">register_parquet</span><span class="p">(</span><span
class="s2">"orders"</span><span class="p">,</span> <span
class="s2">"/data/orders.parquet"</span><span class="p">)</span>
+<span class="n">ctx</span><span class="o">.</span><span
class="n">register_parquet</span><span class="p">(</span><span
class="s2">"customers"</span><span class="p">,</span> <span
class="s2">"/data/customers.parquet"</span><span class="p">)</span>
+
+<span class="c1"># Cell 2: Explore the data</span>
+<span class="n">ctx</span><span class="o">.</span><span
class="n">sql</span><span class="p">(</span><span class="s2">"SELECT *
FROM orders LIMIT 5"</span><span class="p">)</span>
+
+<span class="c1"># Cell 3: Run analysis</span>
+<span class="n">df</span> <span class="o">=</span> <span
class="n">ctx</span><span class="o">.</span><span class="n">sql</span><span
class="p">(</span><span class="s2">"""</span>
+<span class="s2"> SELECT</span>
+<span class="s2"> c.name,</span>
+<span class="s2"> COUNT(*) as order_count,</span>
+<span class="s2"> SUM(o.amount) as total_spent</span>
+<span class="s2"> FROM orders o</span>
+<span class="s2"> JOIN customers c ON o.customer_id = c.id</span>
+<span class="s2"> GROUP BY c.name</span>
+<span class="s2"> ORDER BY total_spent DESC</span>
+<span class="s2"> LIMIT 10</span>
+<span class="s2">"""</span><span class="p">)</span>
+<span class="n">df</span>
+
+<span class="c1"># Cell 4: Convert to Pandas for visualization</span>
+<span class="kn">import</span><span class="w"> </span><span
class="nn">matplotlib.pyplot</span><span class="w"> </span><span
class="k">as</span><span class="w"> </span><span class="nn">plt</span>
+
+<span class="n">pandas_df</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span
class="n">to_pandas</span><span class="p">()</span>
+<span class="n">pandas_df</span><span class="o">.</span><span
class="n">plot</span><span class="p">(</span><span class="n">kind</span><span
class="o">=</span><span class="s1">'bar'</span><span class="p">,</span>
<span class="n">x</span><span class="o">=</span><span
class="s1">'name'</span><span class="p">,</span> <span
class="n">y</span><span class="o">=</span><span
class="s1">'total_spent'</span><span class="p">)</span>
+<span class="n">plt</span><span class="o">.</span><span
class="n">show</span><span class="p">()</span>
+</pre></div>
+</div>
+</section>
+<section id="running-a-local-cluster-in-a-notebook">
+<h3>Running a Local Cluster in a Notebook<a class="headerlink" href="#running-a-local-cluster-in-a-notebook" title="Link to this heading">¶</a></h3>
+<p>For development and testing, you can start a local cluster directly from a notebook:</p>
+<div class="highlight-python notranslate"><div
class="highlight"><pre><span></span><span class="kn">from</span><span
class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span
class="kn">import</span> <span class="n">BallistaSessionContext</span><span
class="p">,</span> <span class="n">setup_test_cluster</span>
+
+<span class="c1"># Start a local scheduler and executor</span>
+<span class="n">host</span><span class="p">,</span> <span
class="n">port</span> <span class="o">=</span> <span
class="n">setup_test_cluster</span><span class="p">()</span>
+
+<span class="c1"># Connect to it</span>
+<span class="n">ctx</span> <span class="o">=</span> <span
class="n">BallistaSessionContext</span><span class="p">(</span><span
class="sa">f</span><span class="s2">"df://</span><span
class="si">{</span><span class="n">host</span><span class="si">}</span><span
class="s2">:</span><span class="si">{</span><span class="n">port</span><span
class="si">}</span><span class="s2">"</span><span class="p">)</span>
+</pre></div>
+</div>
+</section>
+</section>
<section id="user-defined-functions">
<h2>User Defined Functions<a class="headerlink" href="#user-defined-functions" title="Link to this heading">¶</a></h2>
<p>The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in