Ok, so I tried to split the file into 256MB chunks. Now I'm getting:

"chunk_3.xml" (Line 7988121): Too many distinct element names (limit: 32768).

Which is actually true :-( The document has weird element names like <i1>, <i2>, and so on, up to <i50000>. This may be related to the out-of-memory error, too.

Is there a way to raise this limit?

Thanks
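To see how far a given chunk is over the limit before attempting another import, the distinct element names can be counted with a streaming parser. The following is a minimal sketch using only Python's standard library; the function name `count_distinct_element_names` and the inline sample document are made up for illustration, and 32768 is simply the value quoted in the error message above, not something read from BaseX.

```python
import io
import xml.etree.ElementTree as ET

# Limit as quoted in the error message above (assumption: unchanged in your version).
LIMIT = 32768


def count_distinct_element_names(source):
    """Stream through an XML source and return the set of distinct element names.

    `source` may be a file path or a binary file-like object.
    """
    names = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        names.add(elem.tag)
        # Drop the text and children of finished elements to keep memory low.
        # Emptied elements stay attached to their parent until it ends too.
        elem.clear()
    return names


# Tiny in-memory sample standing in for a real chunk such as chunk_3.xml.
sample = io.BytesIO(b"<root><i1>a</i1><i2>b</i2><i3>c</i3><i1>d</i1></root>")
names = count_distinct_element_names(sample)
print(len(names), "distinct element names; over limit:", len(names) > LIMIT)
```

For a real file, pass the path instead of the in-memory sample, e.g. `count_distinct_element_names("chunk_3.xml")`. Note that namespaced elements are reported by ElementTree in `{uri}local` form, so the count may not match the one BaseX computes exactly.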
On Thu, 27 Feb 2025 at 18:25, Csaba Fekete <feketecs...@gmail.com> wrote:

> Yeah, I get the same error using this command, too.
> Thanks
>
> On Thu, 27 Feb 2025 at 17:43, Christian Grün <christian.gr...@gmail.com>
> wrote:
>
>> Just some quick feedback: Does it work if you specify the input along
>> with CREATE DB?
>>
>> basex -c"CREATE DB taurus SPANYOLORSZÁG.xml"
>>
>> You can also specify a directory as input.
>>
>> Thanks,
>> Christian
>>
>> On Thu, 27 Feb 2025 at 17:36, Csaba Fekete <feketecs...@gmail.com>
>> wrote:
>>
>>> Hi Christian
>>> Sorry, I thought I was sending this to the mailing list. Thanks for
>>> answering anyway!
>>> Now I'm trying with a smaller dataset and I am adding the documents one
>>> by one. I also upgraded BaseX to the latest version.
>>> The largest document is 1151M in size and it can't be imported, even if
>>> I use attrindex and textindex.
>>> The file is actually publicly available:
>>> http://taurusreisen.hu/partner/v2/SPANYOLORSZAG.zip
>>> Here is my command and the output:
>>> /opt/basex/bin/basex -Oattrindex=true -Otextindex=true -v -V -c"OPEN
>>> taurus; ADD ./SPANYOLORSZÁG.xml"
>>> Database 'taurus' was opened in 18.21 ms.
>>> Out of Main Memory.
>>> I am thinking of solving the problem by splitting the file into several
>>> chunks, which will be CPU-demanding but could make it work.
>>> Any ideas are welcome.
>>> Thank you again, and a million thanks for BaseX! It is a fantastic tool.
>>> Regards,
>>> Csaba
>>>
>>> On Thu, 27 Feb 2025 at 15:52, Christian Grün <christian.gr...@gmail.com>
>>> wrote:
>>>
>>>> Hi Csaba,
>>>>
>>>> It’s difficult to give general advice; XML documents are just too
>>>> different. In principle, a few GB or even MB can be sufficient to create
>>>> databases for very large collections (10 GB and more), but sometimes
>>>> namespaces are a showstopper. See [1] for some statistics.
>>>>
>>>> What’s the total size of your XML documents? Can you create the
>>>> database if you enable the text and attribute index?
>>>>
>>>> Best,
>>>> Christian
>>>>
>>>> [1] https://docs.basex.org/main/Statistics
>>>>
>>>> On Tue, Feb 25, 2025 at 2:10 PM Csaba Fekete <feketecs...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>> I have a web server that runs BaseX 11.1. The server is a VPS with 18G
>>>>> of RAM.
>>>>> I have a directory of documents in various sizes, ranging from a few
>>>>> kilobytes up to 2G.
>>>>> I am trying to import these documents with the command
>>>>> CREATE DB mydb /path/to/docs
>>>>> With the default JVM max heap size (2GB) I get the error: Out of main
>>>>> memory
>>>>> If I raise the max heap size to 4GB, I get the same error.
>>>>> If I raise it to 8GB, the system becomes unresponsive.
>>>>> How can I determine how much system memory I need to be able to carry
>>>>> out this task?
>>>>> Thanks