OK, so I tried splitting the file into 256 MB chunks. Now I'm getting:
"chunk_3.xml" (Line 7988121): Too many distinct element names (limit: 32768).
Which is actually true :-( The document has weird element names like <i1>,
<i2>, ... and so on, up to <i50000>.
This may be related to the out-of-memory error, too.
Is there a way to raise this limit?
Thanks
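
To see how far over the limit a given chunk actually is, a quick streaming count of the distinct element names may help. This is only a minimal sketch in Python (it does not touch BaseX at all); it uses the standard-library SAX parser, so the file is never held in memory as a whole. The names `NameCollector`, `distinct_element_names` and `NAME_LIMIT` are made up for this illustration, and the value 32768 is simply copied from the error message above.

```python
import io
import xml.sax

# Limit as reported in the BaseX error message quoted above (assumption:
# the count is per database, so one chunk alone may already exceed it).
NAME_LIMIT = 32768


class NameCollector(xml.sax.ContentHandler):
    """SAX handler that records every distinct element name it sees."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def startElement(self, name, attrs):
        # Called once per opening tag; only the name is kept.
        self.names.add(name)


def distinct_element_names(source):
    """Return the set of element names in `source` (a path or a file object)."""
    handler = NameCollector()
    xml.sax.parse(source, handler)
    return handler.names


# Tiny in-memory stand-in for a real file such as chunk_3.xml.
sample = io.BytesIO(b"<root><i1/><i2/><i2/><item n='3'/></root>")
names = distinct_element_names(sample)
print(len(names), "distinct element names; over limit:", len(names) > NAME_LIMIT)
```

For a real chunk, pass the file path instead of the `BytesIO` object, e.g. `distinct_element_names("chunk_3.xml")`.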


On Thu, 27 Feb 2025 at 18:25, Csaba Fekete <feketecs...@gmail.com> wrote:

> Yeah, I get the same error using this command, too.
> Thanks
>
> On Thu, 27 Feb 2025 at 17:43, Christian Grün <christian.gr...@gmail.com>
> wrote:
>
>> Just some quick feedback: Does it work if you specify the input along
>> with CREATE DB?
>>
>> basex -c"CREATE DB taurus SPANYOLORSZÁG.xml"
>>
>> You can also specify a directory as input.
>>
>> Thanks,
>> Christian
>>
>>
>>
>> Csaba Fekete <feketecs...@gmail.com> wrote on Thu, 27 Feb 2025,
>> 17:36:
>>
>>> Hi Christian
>>> Sorry, I thought I was sending this to the mailing list. Thanks for
>>> answering anyway!
>>> Now I'm trying with a smaller dataset and I am adding the documents one
>>> by one. I also upgraded BaseX to the latest version.
>>> The largest document is 1151M in size and it can't be imported, even if
>>> I use attrindex and textindex.
>>> The file is actually publicly available:
>>> http://taurusreisen.hu/partner/v2/SPANYOLORSZAG.zip
>>> Here is my command and the output:
>>> /opt/basex/bin/basex -Oattrindex=true -Otextindex=true -v -V -c"OPEN
>>> taurus; ADD ./SPANYOLORSZÁG.xml"
>>> Database 'taurus' was opened in 18.21 ms.
>>> Out of Main Memory.
>>> I am thinking of solving the problem by splitting the file into several
>>> chunks, which will be CPU-demanding but could make it work.
>>> Any ideas are welcome.
>>> Thank you again, and a million thanks for BaseX! It is a fantastic tool.
>>> Regards,
>>> Csaba
>>>
>>> On Thu, 27 Feb 2025 at 15:52, Christian Grün <christian.gr...@gmail.com>
>>> wrote:
>>>
>>>> Hi Csaba,
>>>>
>>>> It’s difficult to give general advice; XML documents are just too
>>>> different. In principle, a few GB or even MB can be sufficient to create
>>>> databases for very large collections (10 GB and more), but sometimes
>>>> namespaces are a showstopper. See [1] for some statistics.
>>>>
>>>> What’s the total size of your XML documents? Can you create the
>>>> database if you enable the text and attribute index?
>>>>
>>>> Best,
>>>> Christian
>>>>
>>>>  [1] https://docs.basex.org/main/Statistics
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 25, 2025 at 2:10 PM Csaba Fekete <feketecs...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>> I have a web server that runs BaseX 11.1. The server is a VPS with 18 GB
>>>>> of RAM.
>>>>> I have a directory of documents of various sizes, ranging from a few
>>>>> kilobytes up to 2 GB.
>>>>> I am trying to import these documents with the command
>>>>> CREATE DB mydb /path/to/docs
>>>>> With the default JVM max heap size (2 GB) I get the error: Out of main
>>>>> memory
>>>>> If I raise the max heap size to 4 GB, I get the same error.
>>>>> If I raise it to 8 GB, the system becomes unresponsive.
>>>>> How can I determine how much system memory I need to be able to carry
>>>>> out this task?
>>>>> Thanks
>>>>>
>>>>
