Hi Bram,

sounds like Christian is on vacation, so I'll try to chip in. BaseX is definitely 
designed to hold many documents in a database, as this is how most 
people structure their XML databases. So having fewer databases with more 
documents seems like the better approach in general: it is more streamlined, and 
you are therefore quite likely to benefit from BaseX performance tweaks. 
Also, keep in mind that even 40000 documents in a database is not that many. 
There are many people who would like to store all their documents in one 
database and the limit within BaseX is currently 2^29 documents, so quite a lot.
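
As a rough illustration of the "fewer databases, more documents" layout (not a BaseX API, just a sketch in Python): each document name is mapped to one of a fixed number of databases by a stable hash, so the target database can be recomputed later without a lookup table. The count of 25 is taken from the question further down this thread; the names `database_for` and `plan` and the `docs_NN` naming scheme are made up for the example.

```python
import hashlib

NUM_DATABASES = 25  # assumption: the 25-database figure from the question in this thread


def database_for(doc_name: str, num_databases: int = NUM_DATABASES) -> str:
    """Return a hypothetical database name for a document, chosen by a stable hash."""
    digest = hashlib.sha1(doc_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % num_databases
    return f"docs_{bucket:02d}"


def plan(doc_names):
    """Group document names by the database they would be stored in."""
    groups = {}
    for name in doc_names:
        groups.setdefault(database_for(name), []).append(name)
    return groups
```

Because the hash is deterministic, the same file name always lands in the same database. If the documents have a natural grouping (by date, source, or project), that grouping is probably a better key than a hash.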

However, the reason why it is often recommended on this list to split up your 
data into multiple databases is that BaseX locks at database level. So it 
depends quite a bit on what you do with your data. If all your documents are 
basically read-only, or you only add new ones in a nightly transaction, a single 
database may be good enough. However, if you have regular updates, 
it might be better to split up your data in a logical way. An approach I have seen 
often is to have one database for regular updates, where your users update 
the actual data they work with, and one or more archive-like databases, which 
usually hold lots of information that is rarely or never written. This 
way you can keep the database which is actively modified quite small.

Additionally, regarding the performance decrease with millions of databases, I 
could imagine you are also hitting a file system limitation. Keep in mind that 
BaseX creates a directory for each database. Having millions of directories 
within one directory might be a performance problem for some file systems, 
depending on which one you use. So this might not even be a BaseX limitation.
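
One way to check whether the file system is the bottleneck, independent of BaseX, is to time plain directory creation as the parent directory fills up. The following Python sketch does only that. It does not reproduce what BaseX writes per database, and the function name `time_mkdirs`, the `db_` prefix and the batch sizes are arbitrary choices for the example.

```python
import os
import tempfile
import time


def time_mkdirs(parent: str, total: int, batch: int):
    """Create `total` subdirectories in `parent`; return the seconds taken per batch."""
    timings = []
    for start in range(0, total, batch):
        t0 = time.perf_counter()
        for i in range(start, min(start + batch, total)):
            os.mkdir(os.path.join(parent, f"db_{i:08d}"))
        timings.append(time.perf_counter() - t0)
    return timings


if __name__ == "__main__":
    # Small numbers so the sketch finishes quickly; raise them to approach real conditions.
    with tempfile.TemporaryDirectory() as parent:
        for n, secs in enumerate(time_mkdirs(parent, total=10_000, batch=2_500)):
            print(f"batch {n}: {secs:.3f}s")
```

If later batches take clearly longer than earlier ones on the target disk, the slowdown is at least partly a file system effect. A run with millions of directories would be needed to mirror the situation described in this thread.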

Cheers
Dirk

Senacor Technologies Aktiengesellschaft - Registered office: Eschborn - Local 
court (Amtsgericht) Frankfurt am Main - Reg. no.: HRB 105546
Executive board: Matthias Tomann, Marcus Purzer - Chairman of the supervisory 
board: Daniel Grözinger

-----Original message-----
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of 
bram.van...@ugent.be
Sent: Thursday, 24 August 2017 23:45
To: 'Christian Grün' <christian.gr...@gmail.com>
Cc: 'BaseX' <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Could not reserve enough space for object heap

Hi Christian

I am aware that you indicated that BaseX was not intended nor optimised for 
having so many databases, but (out of curiosity and interest) I am asking why a 
server instance has difficulty with so many databases. What is it that the 
server does when creating a new database, that causes it to be so slow when 
dealing with many databases? As I said before, it sounds as if it does some 
sort of sorting or indexing on EACH creation of a new database, which makes the 
import slow? Or am I wrong? If so, what is it exactly that a BaseX server 
instance does to its existing databases when it creates a new database?

In our case, each database only consists of a single document, a very small XML 
file of < 1 MB. There are millions of these databases.

Don't get me wrong, I definitely agree that we should look for a better way to 
organize our data. Just to get an idea, is there a 'limit' on the number of 
documents you'd add to a database to allow for good performance? In other 
words, would it be fruitful if we could split our millions of databases into a 
much smaller number of databases with many documents per database? E.g. let's 
say we now have one million databases (one tiny XML file per db), would it seem 
a better idea to you to instead have only 25 databases, with 40,000 tiny XML 
documents in each? Or is that question hard to answer without real data?


Thanks for your patience

Bram



-----Original message-----
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Thursday, 24 August 2017 10:19
To: Bram Vanroy <bram.van...@ugent.be>
CC: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Could not reserve enough space for object heap

Hi Bram,

As indicated in a previous reply, BaseX was not optimized to organize such a 
great number of databases. The best solution will be to distribute your 
documents to a smaller number of databases.

How many documents are stored on average in your databases, and how large are 
the documents?

Cheers,
C.



On Thu, Aug 24, 2017 at 8:46 AM,  <bram.van...@ugent.be> wrote:
> I can confirm that explicitly using a 64 bit JVM does not solve the issue I 
> mentioned in my first email.
>
> The problem persists; when working with a single server instance the creation 
> of very small databases (~1-5kb) goes smoothly at the beginning but turns 
> into a nightmare when reaching a certain threshold. Let's say around ~1M 
> databases, the time to create subsequent databases increases dramatically. 
> We're talking about a 10x duration increase.
>
> At first I thought I was doing something wrong, but for the life of me I 
> can't figure out what! The java process doesn't utilize any more or less 
> resources than it does when just starting, so that's not it either. Do you 
> have any information that might help here? Does a server do anything to the 
> databases when creating new ones? E.g. when one creates a new database, is a 
> list of all databases re-indexed or something like that? This could explain 
> the issue. But other inspiration is welcome as well, because I don't know 
> where to look.
>
> Thanks in advance
>
> -----Original message-----
> From: Christian Grün [mailto:christian.gr...@gmail.com]
> Sent: Wednesday, 16 August 2017 11:31
> To: Bram Vanroy <bram.van...@ugent.be>
> CC: BaseX <basex-talk@mailman.uni-konstanz.de>
> Subject: Re: [basex-talk] Could not reserve enough space for object heap
>
>> I am running 64 bit, a x86-x64 JVM so I'd thought I could assign a
>> larger amount of memory to it.
>
> Hm. You could try to specify -d64 to ensure that the 64bit JVM is used.
>
>> Why does it need that much memory to just launch? Is it trying to
>> load some thing of all the existing databases in memory?
>
> As the error is returned by Java, not by BaseX, it's most probably returned 
> before any databases are touched. Maybe it helps to consult some other Java 
> forum pages and compare the configuration with yours?
>
> Hope this helps,
> Christian
>
>
>
>> Quoting Christian Grün <christian.gr...@gmail.com>:
>>
>>
>>> Hi Bram,
>>>
>>> As was indicated before, the JVM is restricted to approx. 1.4-1.5,
>>> sometimes 1.6 GB on 32 bit Windows systems.
>>>
>>> Generally (and as you have impressively proven), it is indeed
>>> possible to have millions of databases in a single BaseX instance.
>>> In practice, it is advisable to find a tradeoff between the number
>>> of databases and the number of documents/XML nodes per database. If
>>> there is no obvious way to distribute your documents, an
>>> additional mapping database might be reasonable in order to look up
>>> where a document has gone.
>>>
>>> Hope this helps,
>>> Christian
>>>
>>>
>>> On Tue, Aug 15, 2017 at 10:22 PM, Bram Vanroy <bram.van...@ugent.be>
>>> wrote:
>>>>
>>>> Hi all
>>>>
>>>> I'm running into an issue with many databases. I.e. one server
>>>> instance with millions of databases. When creating all of these, I
>>>> found that the more databases are included on the instance, the
>>>> slower further database generation got. For instance, I could see
>>>> in the logs that for the first < 10,000 databases the creation
>>>> happened smoothly, at around 50ms per file of 1-4kB. However, with
>>>> more and more databases on this server instance, things got very
>>>> slow: for an XML file of 1-4kB the logs show ~600ms. This is
>>>> terribly slow, as you can imagine.
>>>>
>>>> At first I thought something was wrong with my hardware, but I
>>>> checked on another system and the same issue arises. Then I
>>>> thought maybe Java is doing something strange, so I figured I'd
>>>> reboot and see if that cleared some stuff up. But now when I try to
>>>> launch 'basex' or 'basexserver', I get the following message:
>>>>
>>>>     Could not reserve enough space for 1433600KB object heap
>>>>
>>>> I googled the issue, and it was suggested that I added a JAVA
>>>> option to my system's variable (I'm on Windows 10 64 bit, BaseX
>>>> 8.6.4) indicating the memory it could use. I set that to 2048MB.
>>>> But still the same issue persists.
>>>>
>>>> I have contacted the list before, with issues of generating
>>>> millions of databases with the same server instance, and this seems
>>>> another one related to the problem. I am no expert AT ALL, but
>>>> isn't it possible there is some sort of micro memory leak that only
>>>> becomes apparent when creating a number of databases of this
>>>> magnitude? If not, other ideas are welcome as well, at least on
>>>> how to get rid of the Java error mentioned above.
>>>>
>>>>
>>>> Kind regards
>>>>
>>>> Bram Vanroy
>>>>
>>
>>
>>
>
