(datafusion-ballista) branch asf-site updated: Publish built docs triggered by caea92969f34a0c0899a6eb13713b9f7e377f80d

github-bot Thu, 22 Jan 2026 15:12:17 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-ballista.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new f09d0f78f Publish built docs triggered by caea92969f34a0c0899a6eb13713b9f7e377f80d
f09d0f78f is described below

commit f09d0f78f2d93b7f58ea823e646515f0019b6474
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Jan 22 23:11:20 2026 +0000

    Publish built docs triggered by caea92969f34a0c0899a6eb13713b9f7e377f80d
---
 _sources/user-guide/python.md.txt |  99 ++++++++++++++++++++++++++++++
 searchindex.js                    |   2 +-
 user-guide/python.html            | 123 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 223 insertions(+), 1 deletion(-)

diff --git a/_sources/user-guide/python.md.txt b/_sources/user-guide/python.md.txt
index e7d139683..f065cd66e 100644
--- a/_sources/user-guide/python.md.txt
+++ b/_sources/user-guide/python.md.txt
@@ -141,6 +141,105 @@ assert result.column(0) == pyarrow.array([5, 7, 9])
 assert result.column(1) == pyarrow.array([-3, -3, -3])
 ```
 
+## Jupyter Notebook Support
+
+Ballista works well in Jupyter notebooks. DataFrames automatically render as formatted HTML tables when displayed
+in a notebook cell.
+
+### Basic Usage
+
+```python
+from ballista import BallistaSessionContext
+
+# Connect to a Ballista cluster
+ctx = BallistaSessionContext("df://localhost:50050")
+
+# Register a table
+ctx.register_parquet("trips", "/path/to/nyctaxi.parquet")
+
+# Run a query - the result renders as an HTML table
+ctx.sql("SELECT * FROM trips LIMIT 10")
+```
+
+When a DataFrame is the last expression in a cell, Jupyter automatically calls its `_repr_html_()` method,
+which renders a styled table with:
+
+- Formatted column headers
+- Expandable cells for long text content
+- Scrollable display for wide tables
+
+### Converting Results
+
+DataFrames can be converted to various formats for further analysis:
+
+```python
+df = ctx.sql("SELECT * FROM trips WHERE fare_amount > 50")
+
+# Convert to Pandas DataFrame
+pandas_df = df.to_pandas()
+
+# Convert to PyArrow Table
+arrow_table = df.to_arrow_table()
+
+# Convert to Polars DataFrame
+polars_df = df.to_polars()
+
+# Collect as PyArrow RecordBatches
+batches = df.collect()
+```
+
+### Example Notebook Workflow
+
+A typical notebook workflow might look like:
+
+```python
+# Cell 1: Setup
+from ballista import BallistaSessionContext
+from datafusion import col, lit
+
+ctx = BallistaSessionContext("df://localhost:50050")
+ctx.register_parquet("orders", "/data/orders.parquet")
+ctx.register_parquet("customers", "/data/customers.parquet")
+
+# Cell 2: Explore the data
+ctx.sql("SELECT * FROM orders LIMIT 5")
+
+# Cell 3: Run analysis
+df = ctx.sql("""
+    SELECT
+        c.name,
+        COUNT(*) as order_count,
+        SUM(o.amount) as total_spent
+    FROM orders o
+    JOIN customers c ON o.customer_id = c.id
+    GROUP BY c.name
+    ORDER BY total_spent DESC
+    LIMIT 10
+""")
+df
+
+# Cell 4: Convert to Pandas for visualization
+import matplotlib.pyplot as plt
+
+pandas_df = df.to_pandas()
+pandas_df.plot(kind='bar', x='name', y='total_spent')
+plt.show()
+```
+
+### Running a Local Cluster in a Notebook
+
+For development and testing, you can start a local cluster directly from a notebook:
+
+```python
+from ballista import BallistaSessionContext, setup_test_cluster
+
+# Start a local scheduler and executor
+host, port = setup_test_cluster()
+
+# Connect to it
+ctx = BallistaSessionContext(f"df://{host}:{port}")
+```
+
 ## User Defined Functions
 
 The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in
diff --git a/searchindex.js b/searchindex.js
index b8142b363..cfa910a16 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]], "Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11, "autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]], "Ballista Configuration Settings": [[6, "ballista-configuration-settings"]], "Ballista Development": [[3, null]], "Ballista Python Bindings": [[18, null]], "Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]], "Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11, "autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]], "Ballista Configuration Settings": [[6, "ballista-configuration-settings"]], "Ballista Development": [[3, null]], "Ballista Python Bindings": [[18, null]], "Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
diff --git a/user-guide/python.html b/user-guide/python.html
index c1edb58c0..1cd7b6ee3 100644
--- a/user-guide/python.html
+++ b/user-guide/python.html
@@ -302,6 +302,33 @@
    DataFrame
   </a>
  </li>
+ <li class="toc-h2 nav-item toc-entry">
+  <a class="reference internal nav-link" href="#jupyter-notebook-support">
+   Jupyter Notebook Support
+  </a>
+  <ul class="nav section-nav flex-column">
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#basic-usage">
+     Basic Usage
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#converting-results">
+     Converting Results
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#example-notebook-workflow">
+     Example Notebook Workflow
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#running-a-local-cluster-in-a-notebook">
+     Running a Local Cluster in a Notebook
+    </a>
+   </li>
+  </ul>
+ </li>
  <li class="toc-h2 nav-item toc-entry">
   <a class="reference internal nav-link" href="#user-defined-functions">
    User Defined Functions
@@ -467,6 +494,102 @@ COUNT(UInt8(1)): int64]
 </pre></div>
 </div>
 </section>
+<section id="jupyter-notebook-support">
+<h2>Jupyter Notebook Support<a class="headerlink" href="#jupyter-notebook-support" title="Link to this heading">¶</a></h2>
+<p>Ballista works well in Jupyter notebooks. DataFrames automatically render as formatted HTML tables when displayed
+in a notebook cell.</p>
+<section id="basic-usage">
+<h3>Basic Usage<a class="headerlink" href="#basic-usage" title="Link to this heading">¶</a></h3>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span class="kn">import</span> <span class="n">BallistaSessionContext</span>
+
+<span class="c1"># Connect to a Ballista cluster</span>
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">BallistaSessionContext</span><span class="p">(</span><span class="s2">&quot;df://localhost:50050&quot;</span><span class="p">)</span>
+
+<span class="c1"># Register a table</span>
+<span class="n">ctx</span><span class="o">.</span><span class="n">register_parquet</span><span class="p">(</span><span class="s2">&quot;trips&quot;</span><span class="p">,</span> <span class="s2">&quot;/path/to/nyctaxi.parquet&quot;</span><span class="p">)</span>
+
+<span class="c1"># Run a query - the result renders as an HTML table</span>
+<span class="n">ctx</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM trips LIMIT 10&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>When a DataFrame is the last expression in a cell, Jupyter automatically calls its <code class="docutils literal notranslate"><span class="pre">_repr_html_()</span></code> method,
+which renders a styled table with:</p>
+<ul class="simple">
+<li><p>Formatted column headers</p></li>
+<li><p>Expandable cells for long text content</p></li>
+<li><p>Scrollable display for wide tables</p></li>
+</ul>
+</section>
+<section id="converting-results">
+<h3>Converting Results<a class="headerlink" href="#converting-results" title="Link to this heading">¶</a></h3>
+<p>DataFrames can be converted to various formats for further analysis:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM trips WHERE fare_amount &gt; 50&quot;</span><span class="p">)</span>
+
+<span class="c1"># Convert to Pandas DataFrame</span>
+<span class="n">pandas_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">to_pandas</span><span class="p">()</span>
+
+<span class="c1"># Convert to PyArrow Table</span>
+<span class="n">arrow_table</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">to_arrow_table</span><span class="p">()</span>
+
+<span class="c1"># Convert to Polars DataFrame</span>
+<span class="n">polars_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">to_polars</span><span class="p">()</span>
+
+<span class="c1"># Collect as PyArrow RecordBatches</span>
+<span class="n">batches</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
+</pre></div>
+</div>
+</section>
+<section id="example-notebook-workflow">
+<h3>Example Notebook Workflow<a class="headerlink" href="#example-notebook-workflow" title="Link to this heading">¶</a></h3>
+<p>A typical notebook workflow might look like:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Cell 1: Setup</span>
+<span class="kn">from</span><span class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span class="kn">import</span> <span class="n">BallistaSessionContext</span>
+<span class="kn">from</span><span class="w"> </span><span class="nn">datafusion</span><span class="w"> </span><span class="kn">import</span> <span class="n">col</span><span class="p">,</span> <span class="n">lit</span>
+
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">BallistaSessionContext</span><span class="p">(</span><span class="s2">&quot;df://localhost:50050&quot;</span><span class="p">)</span>
+<span class="n">ctx</span><span class="o">.</span><span class="n">register_parquet</span><span class="p">(</span><span class="s2">&quot;orders&quot;</span><span class="p">,</span> <span class="s2">&quot;/data/orders.parquet&quot;</span><span class="p">)</span>
+<span class="n">ctx</span><span class="o">.</span><span class="n">register_parquet</span><span class="p">(</span><span class="s2">&quot;customers&quot;</span><span class="p">,</span> <span class="s2">&quot;/data/customers.parquet&quot;</span><span class="p">)</span>
+
+<span class="c1"># Cell 2: Explore the data</span>
+<span class="n">ctx</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM orders LIMIT 5&quot;</span><span class="p">)</span>
+
+<span class="c1"># Cell 3: Run analysis</span>
+<span class="n">df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;&quot;&quot;</span>
+<span class="s2">    SELECT</span>
+<span class="s2">        c.name,</span>
+<span class="s2">        COUNT(*) as order_count,</span>
+<span class="s2">        SUM(o.amount) as total_spent</span>
+<span class="s2">    FROM orders o</span>
+<span class="s2">    JOIN customers c ON o.customer_id = c.id</span>
+<span class="s2">    GROUP BY c.name</span>
+<span class="s2">    ORDER BY total_spent DESC</span>
+<span class="s2">    LIMIT 10</span>
+<span class="s2">&quot;&quot;&quot;</span><span class="p">)</span>
+<span class="n">df</span>
+
+<span class="c1"># Cell 4: Convert to Pandas for visualization</span>
+<span class="kn">import</span><span class="w"> </span><span class="nn">matplotlib.pyplot</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">plt</span>
+
+<span class="n">pandas_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">to_pandas</span><span class="p">()</span>
+<span class="n">pandas_df</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s1">&#39;bar&#39;</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s1">&#39;name&#39;</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s1">&#39;total_spent&#39;</span><span class="p">)</span>
+<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+</pre></div>
+</div>
+</section>
+<section id="running-a-local-cluster-in-a-notebook">
+<h3>Running a Local Cluster in a Notebook<a class="headerlink" href="#running-a-local-cluster-in-a-notebook" title="Link to this heading">¶</a></h3>
+<p>For development and testing, you can start a local cluster directly from a notebook:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span class="kn">import</span> <span class="n">BallistaSessionContext</span><span class="p">,</span> <span class="n">setup_test_cluster</span>
+
+<span class="c1"># Start a local scheduler and executor</span>
+<span class="n">host</span><span class="p">,</span> <span class="n">port</span> <span class="o">=</span> <span class="n">setup_test_cluster</span><span class="p">()</span>
+
+<span class="c1"># Connect to it</span>
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">BallistaSessionContext</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;df://</span><span class="si">{</span><span class="n">host</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">port</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+</section>
+</section>
 <section id="user-defined-functions">
 <h2>User Defined Functions<a class="headerlink" href="#user-defined-functions" title="Link to this heading">¶</a></h2>
 <p>The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in

