Hi Mikhail,
You're right: the use case of many smaller datasets isn't well served by
memory-mapped mode.
The mode of operation currently has to be set very early, on a per-JVM
basis, and ideally as a system property on the JVM itself:
-Dtdb:fileMode=direct
This is because TDB reads the setting early on. There is no fundamental
reason for this, and it could be done on a per-dataset basis; it just isn't.
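To illustrate why a later change is ignored, here is a minimal sketch of a "determine once per JVM" setting. It is not TDB's actual code: the class FileModeSetting and its get/set methods are hypothetical names, and the fallback to "mapped" is an assumption for the example. Only the property name tdb:fileMode comes from the text above.

```java
// Hypothetical illustration only; not taken from the TDB code base.
class FileModeSetting {
    // null means "not yet determined"; once set, it stays fixed for this JVM.
    private static String mode = null;

    // The first call fixes the mode from the system property (assumed default: "mapped").
    static synchronized String get() {
        if (mode == null) {
            mode = System.getProperty("tdb:fileMode", "mapped");
        }
        return mode;
    }

    // Returns true if the mode was changed, false if it had already been determined.
    static synchronized boolean set(String newMode) {
        if (mode != null) {
            System.err.println("File mode already determined - setting it has no effect");
            return false;
        }
        mode = newMode;
        return true;
    }

    public static void main(String[] args) {
        System.out.println("mode = " + get());            // first read fixes the mode
        System.out.println("changed = " + set("direct")); // too late: already determined
        System.out.println("mode = " + get());
    }
}
```

Running this sketch with -Dtdb:fileMode=direct on the java command line is the only way to get "direct" out of it once get() has been called, which mirrors the behaviour described above.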
While the files show as 200MB, they are sparse files. On Linux, "ls -l"
shows 8M files, but "du -sh" on the directory reports 208K. Sparse files
don't allocate all their space. OS/X seems to be different: "du -sh"
reports the sum of the file sizes, but they are still sparse files and
don't consume all their disk space.
In theory, the index segment size is configurable (see
SystemTDB.SegmentSize), but other values aren't covered by the test suite.
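As a rough, standalone illustration of why a mapped file shows up at the segment size even when it holds almost no data, the sketch below maps one segment of an empty file using plain java.nio. It does not use TDB; the class and method names are made up for the example, and the 8 MB constant is simply the file size mentioned in this thread. Mapping a region read-write past the end of a file extends the file to cover it, and on most Unix filesystems the untouched part stays unallocated (sparse).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of TDB.
class MappedSegmentDemo {
    // 8 MB, the index file size discussed in this thread (assumed for the demo).
    static final int SEGMENT_SIZE = 8 * 1024 * 1024;

    // Maps one segment of the file read-write and returns the file length afterwards.
    static long mapOneSegment(Path file, int segmentSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping beyond the current end of file grows the file to fit the region.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
            buf.put(0, (byte) 1); // touch only the first byte
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment-demo", ".idx");
        try {
            System.out.println("length before mapping: " + Files.size(file));
            System.out.println("length after mapping:  " + mapOneSegment(file, SEGMENT_SIZE));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Passing a smaller segmentSize gives a correspondingly smaller file, which is the effect a configurable segment size would have on an empty store.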
Andy
On 12/09/11 18:13, Mikhail Sogrin wrote:
Hi,
With memory mapped TDB storage (default with 64-bit JVM), the initial size
of TDB store without any data at all is 200 MB, because most of index files
are 8 MB, and there's quite a number of them.
It may be a good number when loading big data sets, but is absolutely huge
if a user expects to load only a bit of data.
In comparison, direct file method (with 32-bit JVM) makes only 8 KB index
files resulting in only 200 KB usage for an empty database.
Is there a way to configure initial size of index files?
The only method I could think of was to set 'direct' method, create dataset,
close it, set method to 'mapped' and open dataset again. But it prints a
warning "System file mode already determined - setting it has no effect",
and yes, the second setting does not seem to have any effect.
Kind regards,
Mikhail