Hi,
just saying that 16 GB of DDR3 RAM costs about 40 € now.
Gerrit
On 03.10.2019 08:53, first name last name wrote:
I tried again, using SPLITSIZE = 12 in the .basex config file.
The batch (console) script I used is attached (mass-import.xq).
This time I didn't run the optimize or index creation after the import;
instead, I did it as part of the import, similar to what is described in [4].
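Roughly, the idea (this is only a sketch with placeholder names, not the attached script itself) was to pass the index options to db:create so that the indexes are built during the import:

    (: sketch only: create the database and build the value indexes
       during the import instead of optimizing afterwards;
       database name and input path are placeholders :)
    db:create(
      "lq-forum",
      "/data/linuxquestions/",
      (),
      map { 'textindex': true(), 'attrindex': true() }
    )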
This time I got a different error:
"org.basex.core.BaseXException: Out of Main Memory."
So right now I'm a bit out of ideas. Would AUTOOPTIMIZE make any
difference here?
Thanks
[4] http://docs.basex.org/wiki/Indexes#Performance
On Wed, Oct 2, 2019 at 11:06 AM first name last name
<randomcod...@gmail.com> wrote:
Hey Christian,
Thank you for your answer :)
I tried setting SPLITSIZE = 24000 in .basex, but I've seen the same
OOM behavior. It looks like memory consumption stays moderate until
the database reaches about 30 GB (its size before the optimize); then
memory consumption spikes and the OOM occurs. Now I'm trying with
SPLITSIZE = 1000 and will report back if I get the OOM again.
Regarding what you said, it might be that the merge step is where
the OOM occurs (I wonder if there's any way to control how much
memory is used inside the merge step).
To quote the statistics page from the wiki:
Databases <http://docs.basex.org/wiki/Databases> in BaseX are
light-weight. If a database limit is reached, you can distribute
your documents across multiple database instances and access all of
them with a single XQuery expression.
This sounds like sharding to me. I could probably split the documents
into chunks and load them into databases that share a common prefix
but have varying suffixes, which seems a lot like shards. By doing
this I think I can avoid the OOM, but if BaseX provides other, better,
maybe native mechanisms for avoiding OOM, I would try them.
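For example (just a sketch, with made-up database and element names), a single expression could span all the shards like this:

    (: sketch only: run one query across all shard databases that
       share a common name prefix; names are hypothetical :)
    let $shards := ("lq-forum-01", "lq-forum-02", "lq-forum-03")
    for $doc in $shards ! db:open(.)
    return $doc//post[contains(., 'kernel')]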
Best regards,
Stefan
On Tue, Oct 1, 2019 at 4:22 PM Christian Grün
<christian.gr...@gmail.com> wrote:
Hi first name,
If you optimize your database, the indexes will be rebuilt. In this
step, the builder tries to guess how much free memory is still
available. If memory is exhausted, parts of the index will be split
(i.e., partially written to disk) and merged in a final step.
However, you can circumvent the heuristics by manually assigning a
static split value; see [1] for more information. If you use the DBA,
you’ll need to assign this value in your .basex or web.xml file [2].
In order to find the best value for your setup, it may be easier to
play around with the BaseX GUI.
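For example, with something like SPLITSIZE = 1000 in your .basex file, the optimize you trigger in the DBA is essentially the OPTIMIZE command; a rough XQuery equivalent (the database name is just a placeholder) is:

    (: sketch only: rebuild the indexes and statistics of an existing
       database; "lq-forum" is a placeholder name :)
    db:optimize("lq-forum", true())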
As you have already seen in our statistics, an XML document has
various properties that may represent a limit for a single database.
Accordingly, these properties make it difficult for the system to
predict when memory will be exhausted during an import or index
rebuild.
In general, you’ll get the best performance (and your memory
consumption will be lower) if you create your database and specify
the data to be imported in a single run. This is currently not
possible via the DBA; use the GUI (Create Database) or console mode
(CREATE DB command) instead.
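For example, on the console something like "CREATE DB lq-forum /data/linuxquestions/" creates the database and imports the directory in one run; a rough XQuery equivalent (name and path are placeholders) is:

    (: sketch only: create the database and import a whole directory
       in a single run; name and input path are placeholders :)
    db:create("lq-forum", "/data/linuxquestions/")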
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Options#SPLITSIZE
[2] http://docs.basex.org/wiki/Configuration
On Mon, Sep 30, 2019 at 7:09 AM first name last name
<randomcod...@gmail.com> wrote:
>
> Hi,
>
> Let's say there's a 30 GB dataset [3] containing most threads/posts from [1].
> After importing all of it, when I try to run /dba/db-optimize/ on it (which must have some corresponding command), I get the OOM error in the attached stacktrace. I am using -Xmx2g, so BaseX is limited to 2 GB of memory (the machine I'm running this on doesn't have a lot of memory).
> I was looking at [2] for some estimates of peak memory usage for this "db-optimize" operation, but couldn't find any.
> Actually, it would be nice to know peak memory usage because, of course, for any database (including BaseX) a common operation is server sizing, to know what kind of server would be needed.
> In this case, it seems like 2 GB of memory is enough to import 340k documents, weighing in at 30 GB total, but it's not enough to run "db-optimize".
> Is there any info about peak memory usage on [2]? And are there guidelines for large-scale collection imports like the one I'm trying to do?
>
> Thanks,
> Stefan
>
> [1] https://www.linuxquestions.org/
> [2] http://docs.basex.org/wiki/Statistics
> [3] https://drive.google.com/open?id=1lTEGA4JqlhVf1JsMQbloNGC-tfNkeQt2
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsi...@le-tex.de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt