I tried again, using SPLITSIZE = 12 in the .basex config file. The batch (console) script I used is attached as mass-import.xq.

This time I didn't run the optimize or index creation after the import; instead, I did it as part of the import, similar to what is described in [4]. I got a different error this time: "org.basex.core.BaseXException: Out of Main Memory."

So right now I'm a bit out of ideas. Would AUTOOPTIMIZE make any difference here?

Thanks

[4] http://docs.basex.org/wiki/Indexes#Performance

On Wed, Oct 2, 2019 at 11:06 AM first name last name <randomcod...@gmail.com> wrote:

> Hey Christian,
>
> Thank you for your answer :)
> I tried setting in .basex the SPLITSIZE = 24000 but I've seen the same OOM
> behavior. It looks like the memory consumption is moderate until when it
> reaches about 30GB (the size of the db before optimize) and
> then memory consumption spikes, and OOM occurs. Now I'm trying with
> SPLITSIZE = 1000 and will report back if I get OOM again.
> Regarding what you said, it might be that the merge step is where the OOM
> occurs (I wonder if there's any way to control how much memory is being
> used inside the merge step).
>
> To quote the statistics page from the wiki:
> Databases <http://docs.basex.org/wiki/Databases> in BaseX are
> light-weight. If a database limit is reached, you can distribute your
> documents across multiple database instances and access all of them with a
> single XQuery expression.
> This to me sounds like sharding. I would probably be able to split the
> documents into chunks and upload them under a db with the same prefix, but
> varying suffix.. seems a lot like shards. By doing this
> I think I can avoid OOM, but if BaseX provides other, better, maybe native
> mechanisms of avoiding OOM, I would try them.
>
> Best regards,
> Stefan
>
>
> On Tue, Oct 1, 2019 at 4:22 PM Christian Grün <christian.gr...@gmail.com>
> wrote:
>
>> Hi first name,
>>
>> If you optimize your database, the indexes will be rebuilt. In this
>> step, the builder tries to guess how much free memory is still
>> available. If memory is exhausted, parts of the index will be split
>> (i. e., partially written to disk) and merged in a final step.
>> However, you can circumvent the heuristics by manually assigning a
>> static split value; see [1] for more information. If you use the DBA,
>> you’ll need to assign this value to your .basex or the web.xml file
>> [2]. In order to find the best value for your setup, it may be easier
>> to play around with the BaseX GUI.
>>
>> As you have already seen in our statistics, an XML document has
>> various properties that may represent a limit for a single database.
>> Accordingly, these properties make it difficult to decide for the
>> system when the memory will be exhausted during an import or index
>> rebuild.
>>
>> In general, you’ll get best performance (and your memory consumption
>> will be lower) if you create your database and specify the data to be
>> imported in a single run. This is currently not possible via the DBA;
>> use the GUI (Create Database) or console mode (CREATE DB command)
>> instead.
>>
>> Hope this helps,
>> Christian
>>
>> [1] http://docs.basex.org/wiki/Options#SPLITSIZE
>> [2] http://docs.basex.org/wiki/Configuration
>>
>>
>>
>> On Mon, Sep 30, 2019 at 7:09 AM first name last name
>> <randomcod...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > Let's say there's a 30GB dataset [3] containing most threads/posts
>> > from [1].
>> > After importing all of it, when I try to run /dba/db-optimize/ on it
>> > (which must have some corresponding command) I get the OOM error in
>> > the stacktrace attached. I am using -Xmx2g so BaseX is limited to 2GB
>> > of memory (the machine I'm running this on doesn't have a lot of
>> > memory).
>> > I was looking at [2] for some estimates of peak memory usage for this
>> > "db-optimize" operation, but couldn't find any.
>> > Actually it would be nice to know peak memory usage because.. of
>> > course, for any database (including BaseX) a common operation is to do
>> > server sizing, to know what kind of server would be needed.
>> > In this case, it seems like 2GB memory is enough to import 340k
>> > documents, weighing in at 30GB total, but it's not enough to run
>> > "dba-optimize".
>> > Is there any info about peak memory usage on [2] ? And are there
>> > guidelines for large-scale collection imports like I'm trying to do?
>> >
>> > Thanks,
>> > Stefan
>> >
>> > [1] https://www.linuxquestions.org/
>> > [2] http://docs.basex.org/wiki/Statistics
>> > [3] https://drive.google.com/open?id=1lTEGA4JqlhVf1JsMQbloNGC-tfNkeQt2
>> >

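For reference, the sharding idea from the quoted message (split the documents into chunks and store each chunk in a database with the same prefix but a varying suffix) could be planned roughly like this. This is only a sketch, not anything BaseX provides: the function name plan_shards, the prefix "lq" and the size budget are all made up for illustration, and the actual import of each group is left out.

```python
from typing import Dict, Iterable, List, Tuple


def plan_shards(
    docs: Iterable[Tuple[str, int]],
    max_bytes: int,
    prefix: str = "lq",  # hypothetical database name prefix
) -> Dict[str, List[str]]:
    """Group (path, size_in_bytes) pairs into consecutive shards.

    A new shard is started whenever adding the next document would push
    the current shard over max_bytes. A single document larger than
    max_bytes still gets a shard of its own.
    """
    shards: Dict[str, List[str]] = {}
    current: List[str] = []
    used = 0
    index = 0
    for path, size in docs:
        if current and used + size > max_bytes:
            # Close the current shard and start the next one.
            shards[f"{prefix}_{index:03d}"] = current
            index += 1
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        shards[f"{prefix}_{index:03d}"] = current
    return shards


if __name__ == "__main__":
    # Toy sizes; real values would come from os.path.getsize().
    sample = [("a.xml", 40), ("b.xml", 50), ("c.xml", 30),
              ("d.xml", 100), ("e.xml", 10)]
    for name, paths in plan_shards(sample, max_bytes=100).items():
        print(name, paths)
```

Each resulting group could then be imported into its own database, so that every index build only sees a fraction of the 30GB; whether that actually keeps the full-text merge step below the 2GB heap is something I have not verified.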
[...] 1.659934928E7 ms (657 MB)
Indexing Text... [...]
160.09 M operations, 642947.17 ms (438 MB).
Indexing Attribute Values... [...]
585.55 M operations, 2768524.64 ms (430 MB).
Indexing Tokens... [...]
757.99 M operations, 4394525.69 ms (647 MB).
Indexing Full-Text... [...]
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at org.basex.util.list.ByteList.add(ByteList.java:67)
    at org.basex.util.list.ByteList.add(ByteList.java:55)
    at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:236)
    at org.basex.index.ft.FTBuilder.write(FTBuilder.java:147)
    at org.basex.index.ft.FTBuilder.build(FTBuilder.java:86)
    at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
    at org.basex.data.DiskData.createIndex(DiskData.java:198)
    at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:100)
    at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:88)
    at org.basex.core.cmd.CreateDB$1.run(CreateDB.java:116)
    at org.basex.core.cmd.ACreate.update(ACreate.java:90)
    at org.basex.core.cmd.CreateDB.run(CreateDB.java:113)
    at org.basex.core.Command.run(Command.java:257)
    at org.basex.core.Command.execute(Command.java:93)
    at org.basex.server.ClientListener.run(ClientListener.java:140)
org.basex.core.BaseXException: Out of Main Memory.
You can increase Java's heap size with the flag -Xmx<size>.
    at org.basex.core.Command.execute(Command.java:94)
    at org.basex.server.ClientListener.run(ClientListener.java:140)

mass-import.xq
Description: Binary data