(jena-site) branch asf-staging updated: Staged site from tdb-faqs (c375f3c36c1a85ff3317845e9ae2f87e4c2e7259)

git-site-role Tue, 11 Feb 2025 01:51:20 -0800

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/jena-site.git



The following commit(s) were added to refs/heads/asf-staging by this push:
     new 2d3f4f5f5 Staged site from tdb-faqs (c375f3c36c1a85ff3317845e9ae2f87e4c2e7259)
2d3f4f5f5 is described below

commit 2d3f4f5f58afbe0544d5411378a996ad1acc49b7
Author: jenkins <[email protected]>
AuthorDate: Tue Feb 11 09:51:09 2025 +0000

    Staged site from tdb-faqs (c375f3c36c1a85ff3317845e9ae2f87e4c2e7259)
---
 content/documentation/tdb/faqs.html | 40 ++++++++++++++++++++++++-------------
 content/index.json                  |  2 +-
 content/sitemap.xml                 |  4 ++--
 3 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/content/documentation/tdb/faqs.html b/content/documentation/tdb/faqs.html
index 39fe69def..ab8044615 100644
--- a/content/documentation/tdb/faqs.html
+++ b/content/documentation/tdb/faqs.html
@@ -334,29 +334,41 @@ journal can be flushed by closing any datasets and releasing the TDB resources.<
   }
 </code></pre>
 <h3 id="input-vs-database-size">Why is the database much larger on disk than 
my input data?</h3>
-<p>TDB2 uses copy-on-write data structures.  This means that each new write 
transaction takes copies of any data blocks it
-modifies during the transaction and writes new copies of those blocks with the 
required modifications.  The old blocks
-are not automatically removed as they might still be referenced by ongoing 
read transactions.  Depending on how you&rsquo;ve
-loaded your data into TDB2 - how many transactions were used, how large each 
transaction was, input data characteristics
-etc. - this can lead to much larger database disk size than your original 
input data size.</p>
+<p>Firstly, TDB2 uses copy-on-write data structures.  This means that each new 
write transaction takes copies of any data
+blocks it modifies during the transaction and writes new copies of those 
blocks with the required modifications.  The
+old blocks are not automatically removed as they might still be referenced by 
ongoing read transactions.  Depending on
+how you&rsquo;ve loaded your data into TDB2 - how many transactions were used, 
how large each transaction was, whether named
+graphs are used, input data characteristics etc. - this can lead to much 
larger database disk size than your original
+input data size.</p>
+<p>Secondly it is also worth noting that both TDB and TDB2 use <a 
href="https://en.wikipedia.org/wiki/Sparse_file">sparse files</a>
+for their on disk storage.  Depending on the file system and operating system 
you are using, and the tools you use to
+inspect it, you may see larger sizes reported than are actually being consumed 
e.g.</p>
+<pre tabindex="0"><code>$ ls -lh SPOG.idn
+-rw-r--r--  1 user  group   8.0M 23 Sep 15:23 SPOG.idn
+$ du -h SPOG.idn
+6.1M   SPOG.idn
+</code></pre><p>In the above example, on a small toy dataset, we can see that 
<code>ls</code> reports a file size as <code>8.0M</code> while <code>du</code> 
reports a
+file size of <code>6.1M</code>.  Since a database is comprised of many files 
the total logical size vs total physical size may be
+quite different.</p>
 <p>You can run a <a href="../tdb2/tdb2_admin.md#compaction">Compaction</a> 
operation on your database to have TDB2 prune the data
-structures to only preserve the current data blocks.  Compactions require 
exclusive write access to the database i.e. no
-other read/write transactions may occur while a compaction is running.  Thus, 
compactions should generally be run
+structures to only preserve the current data blocks.  Compactions require 
exclusive write access to the database, i.e.
+no other read/write transactions may occur while a compaction is running.  
Thus, compactions should generally be run
 offline, or at quiet times if exposing your database to multiple applications 
per <a href="#multi-jvm">Can I share a TDB dataset between
 multiple applications?</a>.</p>
+<p><strong>NB</strong> If you loaded your data using one of the TDB bulk 
loaders, e.g. <a href="#tdbloader-vs-tdbloader2"><code>tdbloader2</code></a> and
+<a href="#tdb-xloader"><code>xloader</code></a>, then those already generate a 
(near) maximally compacted database and compaction will offer
+little/no benefit!</p>
 <p>Please note that compaction creates a new <code>Data-NNNN</code> directory 
per <a href="../tdb2/tdb2_admin.md#tdb2-directory-layout">TDB2 Directory
 Layout</a> into which it writes the compacted copy of the database.  The old
 directory won&rsquo;t be automatically removed unless the compaction operation 
was explicitly configured to do so. Therefore,
 the immediate effect of a compaction may actually be more disk space usage 
until the old data directory can be removed.
 If the database was already maximally compacted then there will be no 
difference in size between the old and new data
 directories.</p>
-<p>We would recommend that you consider running a compaction after an initial 
bulk data load, although some bulk loading
-methods may already generate a maximally compacted database e.g. <a 
href="#tdbloader-vs-tdbloader2"><code>tdbloader2</code></a>.  Also, if
-your database has ongoing updates over time we would also recommend that you 
consider running a compaction periodically
-e.g. once a day/week etc. We cannot provide exact recommendations here as to 
the frequency of compactions you should run
-as how much disk size inflation you experience will vary depending on many 
factors - size and frequency of write
-transactions, data characteristics, etc. - and you will need to determine a 
suitable schedule based on your use case for
-database.</p>
+<p>If your database has ongoing updates over time, particularly spread across 
many separate transactions, we would
+recommend that you consider running a compaction periodically e.g. once a 
day/week etc. We cannot provide exact
+recommendations here as to the frequency of compactions you should run as how 
much disk size inflation you experience
+will vary depending on many factors - size and frequency of write 
transactions, data characteristics, etc. - and you
+will need to determine a suitable schedule based on your use case for 
database.</p>
 <p>Note also that if running on Windows then it won&rsquo;t be possible to 
delete the old data directory due a OS limitation, see
 <a href="#windows-dataset-delete">Why can&rsquo;t I delete a dataset (MS 
Windows/64 bit)?</a>.</p>
 <h3 id="ssd">Should I use a SSD?</h3>
diff --git a/content/index.json b/content/index.json
index c93b11beb..83dfdbfa0 100644
--- a/content/index.json
+++ b/content/index.json
@@ -1 +1 @@
-[{"categories":null,"contents":"This page is historical \u0026ldquo;for 
information only\u0026rdquo; - there is no Apache release of Eyeball and the 
code has not been updated for Jena3.\nThe original source code is available. So 
you\u0026rsquo;ve got Eyeball installed and you\u0026rsquo;ve run it on one of 
your files, and Eyeball doesn\u0026rsquo;t like it. You\u0026rsquo;re not sure 
why, or what to do about it. Here\u0026rsquo;s what\u0026rsquo;s going 
on.\nEyeball inspects your model a [...]
\ No newline at end of file
+[{"categories":null,"contents":"This page is historical \u0026ldquo;for 
information only\u0026rdquo; - there is no Apache release of Eyeball and the 
code has not been updated for Jena3.\nThe original source code is available. So 
you\u0026rsquo;ve got Eyeball installed and you\u0026rsquo;ve run it on one of 
your files, and Eyeball doesn\u0026rsquo;t like it. You\u0026rsquo;re not sure 
why, or what to do about it. Here\u0026rsquo;s what\u0026rsquo;s going 
on.\nEyeball inspects your model a [...]
\ No newline at end of file
diff --git a/content/sitemap.xml b/content/sitemap.xml
index 541ea71af..816a856ea 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -209,7 +209,7 @@
     <lastmod>2023-04-09T15:11:22+02:00</lastmod>
   </url><url>
     <loc>https://jena.apache.org/documentation.html</loc>
-    <lastmod>2025-02-10T11:47:30+00:00</lastmod>
+    <lastmod>2025-02-11T09:38:10+00:00</lastmod>
   </url><url>
     <loc>https://jena.apache.org/download.html</loc>
     <lastmod>2025-01-21T15:04:14+00:00</lastmod>
@@ -625,7 +625,7 @@
     <lastmod>2024-03-28T22:35:37+01:00</lastmod>
   </url><url>
     <loc>https://jena.apache.org/documentation/tdb/faqs.html</loc>
-    <lastmod>2025-02-10T11:47:30+00:00</lastmod>
+    <lastmod>2025-02-11T09:38:10+00:00</lastmod>
   </url><url>
     <loc>https://jena.apache.org/documentation/tdb/java_api.html</loc>
     <lastmod>2024-03-28T22:35:37+01:00</lastmod>
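
The `ls` versus `du` comparison in the sparse-files paragraph added to faqs.html can be repeated across a whole database directory rather than a single file. Below is a minimal sketch, assuming a POSIX system where `os.stat` exposes `st_blocks` counted in 512-byte units (it is not available on Windows). The function names and the example path are hypothetical and not part of Jena or TDB.

```python
import os


def file_sizes(path):
    """Return (logical_bytes, physical_bytes) for a single file.

    The logical size is what `ls -l` reports. The physical size is the
    space actually allocated on disk, which is what `du` reports.
    Assumption: POSIX, where st_blocks is counted in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def directory_sizes(root):
    """Sum logical and physical sizes of every regular file under root."""
    logical = 0
    physical = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_logical, file_physical = file_sizes(os.path.join(dirpath, name))
            logical += file_logical
            physical += file_physical
    return logical, physical


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python sizes.py /path/to/DB
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    logical, physical = directory_sizes(root)
    print(f"logical:  {logical / 2**20:.1f} MiB")
    print(f"physical: {physical / 2**20:.1f} MiB")
```

A large gap between the two totals suggests sparse files account for much of the reported size. If both totals are large, copy-on-write growth is the more likely cause, which is the case the FAQ says compaction addresses.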
