This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 9cde957da6 Publish built docs triggered by f10deb67a3d34943027f6043f07b2af31f60c014
9cde957da6 is described below

commit 9cde957da65b32a0fdedccc078e42eface050972
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Aug 5 11:29:36 2025 +0000

    Publish built docs triggered by f10deb67a3d34943027f6043f07b2af31f60c014
---
 _sources/user-guide/configs.md.txt | 28 ++++++++++++++++++++++++++++
 index.html                         |  1 +
 searchindex.js                     |  2 +-
 user-guide/configs.html            | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index c817daad2c..3fc8e98437 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -192,3 +192,31 @@ The following runtime configuration settings are available:
 | datafusion.runtime.max_temp_directory_size | 100G    | Maximum temporary file directory size. Supports suffixes K (kilobytes), M (megabytes), and G (gigabytes). Example: '2G' for 2 gigabytes.    |
 | datafusion.runtime.memory_limit            | NULL    | Maximum memory limit for query execution. Supports suffixes K (kilobytes), M (megabytes), and G (gigabytes). Example: '2G' for 2 gigabytes. |
 | datafusion.runtime.temp_directory          | NULL    | The path to the temporary file directory.                                                                                                   |
+
+# Tuning Guide
+
+## Short Queries
+
+By default, DataFusion attempts to maximize parallelism and use all cores.
+For example, if you have 32 cores, each plan will split the data into 32
+partitions. However, if your data is small, the overhead of splitting the data
+to enable parallelization can dominate the actual computation.
+
+To find out how many cores are being used, run the [`EXPLAIN`] command and look
+at the number of partitions in the plan.
+
+[`explain`]: sql/explain.md
+
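+As a minimal sketch, the check above might look like the following. The table
+name `example_table` and its column `a` are hypothetical placeholders; substitute
+a table registered in your own session.
+
+```sql
+-- Hypothetical table and column, used only for illustration
+EXPLAIN SELECT a, COUNT(*) FROM example_table GROUP BY a;
+```
+
+In the resulting plan, look for operators that report partition counts, such as `RepartitionExec`.
+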
+The `datafusion.optimizer.repartition_file_min_size` option controls the minimum file size the
+[`ListingTable`] provider will attempt to repartition. However, this
+does not apply to user-defined data sources and only works when DataFusion has accurate statistics.
+
+If you know your data is small, you can set the `datafusion.execution.target_partitions`
+option to a smaller number to reduce the overhead of repartitioning. For very small datasets (e.g. less
+than 1 MB), we recommend setting `target_partitions` to 1 to avoid repartitioning altogether.
+
+```sql
+SET datafusion.execution.target_partitions = '1';
+```
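+
+To confirm that the change took effect, the current value can be inspected with `SHOW`.
+This is a sketch that assumes a client where the `SHOW` statement is available
+(for example `datafusion-cli`, or a session with the information schema enabled).
+
+```sql
+-- Print the current value of the setting
+SHOW datafusion.execution.target_partitions;
+```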
+
+[`listingtable`]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html
diff --git a/index.html b/index.html
index 1659ac2b80..f87f732139 100644
--- a/index.html
+++ b/index.html
@@ -645,6 +645,7 @@ See the <a class="reference external" 
href="https://datafusion.apache.org/contri
 <li class="toctree-l1"><a class="reference internal" href="user-guide/sql/index.html">SQL Reference</a></li>
 <li class="toctree-l1"><a class="reference internal" href="user-guide/configs.html">Configuration Settings</a></li>
 <li class="toctree-l1"><a class="reference internal" href="user-guide/configs.html#runtime-configuration-settings">Runtime Configuration Settings</a></li>
+<li class="toctree-l1"><a class="reference internal" href="user-guide/configs.html#tuning-guide">Tuning Guide</a></li>
 <li class="toctree-l1"><a class="reference internal" href="user-guide/explain-usage.html">Reading Explain Plans</a></li>
 <li class="toctree-l1"><a class="reference internal" href="user-guide/faq.html">Frequently Asked Questions</a></li>
 <li class="toctree-l1"><a class="reference internal" href="user-guide/faq.html#how-does-datafusion-compare-with-xyz">How does DataFusion Compare with <code class="docutils literal notranslate"><span class="pre">XYZ</span></code>?</a></li>
diff --git a/searchindex.js b/searchindex.js
index 5a3649bdf8..d8193cdf49 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles":{"!=":[[56,"op-neq"]],"!~":[[56,"op-re-not-match"]],"!~*":[[56,"op-re-not-match-i"]],"!~~":[[56,"id19"]],"!~~*":[[56,"id20"]],"#":[[56,"op-bit-xor"]],"%":[[56,"op-modulo"]],"&":[[56,"op-bit-and"]],"(relation,
 name) tuples in logical fields and logical columns are 
unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[56,"op-multiply"]],"+":[[56,"op-plus"]],"-":[[56,"op-minus"]],"/":[[56,"op-divide"]],"<":[[56,"op-lt"]],"<
 [...]
\ No newline at end of file
+Search.setIndex({"alltitles":{"!=":[[56,"op-neq"]],"!~":[[56,"op-re-not-match"]],"!~*":[[56,"op-re-not-match-i"]],"!~~":[[56,"id19"]],"!~~*":[[56,"id20"]],"#":[[56,"op-bit-xor"]],"%":[[56,"op-modulo"]],"&":[[56,"op-bit-and"]],"(relation,
 name) tuples in logical fields and logical columns are 
unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[56,"op-multiply"]],"+":[[56,"op-plus"]],"-":[[56,"op-minus"]],"/":[[56,"op-divide"]],"<":[[56,"op-lt"]],"<
 [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 76e2adae5a..34ba535724 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -578,6 +578,18 @@
    Runtime Configuration Settings
   </a>
  </li>
+ <li class="toc-h1 nav-item toc-entry">
+  <a class="reference internal nav-link" href="#tuning-guide">
+   Tuning Guide
+  </a>
+  <ul class="visible nav section-nav flex-column">
+   <li class="toc-h2 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#short-queries">
+     Short Queries
+    </a>
+   </li>
+  </ul>
+ </li>
 </ul>
 
 </nav>
@@ -1139,6 +1151,27 @@ example, to configure <code class="docutils literal notranslate"><span class="pr
 </tr>
 </tbody>
 </table>
+</section>
+<section id="tuning-guide">
+<h1>Tuning Guide<a class="headerlink" href="#tuning-guide" title="Link to this heading">¶</a></h1>
+<section id="short-queries">
+<h2>Short Queries<a class="headerlink" href="#short-queries" title="Link to this heading">¶</a></h2>
+<p>By default, DataFusion attempts to maximize parallelism and use all cores.
+For example, if you have 32 cores, each plan will split the data into 32
+partitions. However, if your data is small, the overhead of splitting the data
+to enable parallelization can dominate the actual computation.</p>
+<p>To find out how many cores are being used, run the <a class="reference internal" href="sql/explain.html"><span class="std std-doc"><code class="docutils literal notranslate"><span class="pre">EXPLAIN</span></code></span></a> command and look
+at the number of partitions in the plan.</p>
+<p>The <code class="docutils literal notranslate"><span class="pre">datafusion.optimizer.repartition_file_min_size</span></code> option controls the minimum file size the
+<a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html"><code class="docutils literal notranslate"><span class="pre">ListingTable</span></code></a> provider will attempt to repartition. However, this
+does not apply to user-defined data sources and only works when DataFusion has accurate statistics.</p>
+<p>If you know your data is small, you can set the <code class="docutils literal notranslate"><span class="pre">datafusion.execution.target_partitions</span></code>
+option to a smaller number to reduce the overhead of repartitioning. For very small datasets (e.g. less
+than 1 MB), we recommend setting <code class="docutils literal notranslate"><span class="pre">target_partitions</span></code> to 1 to avoid repartitioning altogether.</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">SET</span><span class="w"> </span><span class="n">datafusion</span><span class="p">.</span><span class="n">execution</span><span class="p">.</span><span class="n">target_partitions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;1&#39;</span><span class="p">;</span>
+</pre></div>
+</div>
+</section>
 </section>
 
 

