Hello Fabrice,

That’s brilliant, thank you very much. I will keep it in mind for future reference.
No, I did not set DEBUG, and yes, it was directory content. Once I find some time, I am going to run the “offending” import again with DEBUG and send some more information in case this is indeed a bug. But I have to say, it may be that the DB was hitting one of its natural limits, which is fine.

All the best

From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice ETANCHAUD
Sent: 14 September 2017 09:26
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Possible Bug in BaseX 8.2.3 when importing XML (Was RE: A few general questions about BaseX)

Hi Athanasios,

Did you set the DEBUG option to get detailed information? Could you confirm that you are creating a db from directory content?

If this is the case, as suggested, you should generate a command script that forces the loading order, and use this script to load the data in that order to detect where it fails. You can easily create such a bxs file in XQuery with a for loop over file:list().

This should look like:

  <commands>
    <set option="ADDCACHE"/>
    <set option="DEBUG"/>
    <create-db name="mydb"/>
    <add>myphysicalpath</add>
    <add>myphysicalpath</add>
    ..
    <close/>
  </commands>

Best regards,
Fabrice Etanchaud

From: Anastasiou A. [mailto:a.anastas...@swansea.ac.uk]
Sent: Wednesday 13 September 2017 11:23
To: basex-talk@mailman.uni-konstanz.de
Cc: 'Alexander Holupirek'; 'Michael Seiferle'; Fabrice ETANCHAUD; 'Bridger Dyson-Smith'
Subject: Possible Bug in BaseX 8.2.3 when importing XML (Was RE: [basex-talk] A few general questions about BaseX)

Hello everyone

Many thanks to Alexander, Bridger, Fabrice, Michael for getting back to me with very detailed responses; these have been really helpful.

A few notes:

1) The name is Athanasios :D. Sorry, I just couldn’t help it; it seemed incredibly formal to be addressed by surname in our communications. Our mail server advertises the “Surname.
Initial” pattern, so I can see where the confusion came from.

2) I think that there is scope for adding some sort of “logging” to all actions of the server in general, because I think I may have hit a bug but I cannot provide any more illuminating comments. Here is what is happening:

   a. During import, I get an error that file somethingsomething140.xml has an incredibly long element that cannot be imported at line (blahblah). The whole process just dies there.

   b. This is a bug, because if I import JUST the offending file itself, a single-file database is created without any problems and I can query it and all. So maybe the error is caused by the previous file OR by the way the files are loaded. But I have absolutely no way of knowing the “load history” of the files or the exception that was caught or anything else. In fact, once you press “OK” in the error dialog box, any database files that have been created are lost. In addition to this, the XML files to import are enumerated in a random order. So I had to run the import again and stay there watching each one of the files load, to witness that the system “breaks” after 254 files (which is suspiciously close to 256). None of the files in the vicinity of the offending file caused any problems, so this may be a bug that is more difficult to catch (but it is thrown with both the internal and external parsers). Following this, I created smaller databases with 250 XML files and then got “predictable” errors on running out of memory and not creating indexes, which I can solve more easily.

3) It’s good to know that I don’t need the original files, because that’s a lot of space I can get rid of. Thank you.

4) It seems that ADDCACHE would have saved me some trouble here, many thanks for that, but of course, if you don’t know the file enumeration order, you are still stuck not knowing which files have already been imported.
5) Michael, logging won’t help with the internal import procedure, unless of course you were implying writing a quick script to do the import “manually”?

6) Michael, the fork-join and “client connect” are really interesting and worth a try before I start connecting things together via Hadoop. Are these modules already available in BaseX? Do I simply import their namespace, or is even that not needed?

Many thanks again.

All the best

From: Bridger Dyson-Smith [mailto:bdysonsm...@gmail.com]
Sent: 12 September 2017 16:53
To: Anastasiou A.
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] A few general questions about BaseX

Hi Anastasiou,

Hopefully some of these answers are somewhat helpful.

On Tue, Sep 12, 2017 at 4:54 AM, Anastasiou A. <a.anastas...@swansea.ac.uk> wrote:

Hello everyone

I am trying to load BaseX with a large number of XML files (~500), each one a few hundred MB in size. BaseX fails with a message along the lines of “This is too big for one database”. Can I please ask:

1) Are there any logs, beyond the DB logs? If yes, where can I find them?

   a. The reason I am asking is that once basexgui gives the message, there is no indication of what caused the error. Ideally, I would like to know if this is a limitation on memory amount or number of items (?).

I'm not sure how to enable more verbose logging with the GUI -- hopefully one of the devs or power users can weigh in on this.

2) The parser options include reading XML files from archives, which is very convenient, but once the file has been parsed, does BaseX require the “originals” for queries / returning results?

AFAIK, no it does not. BaseX will query and return results from the internal database(s).

3) Is it possible to do federation with BaseX?
In other words, let’s say I split a database into two large parts (as per #1), is it possible to launch two BaseX servers and then have them talk to each other so that ultimately I just query one of them and get back unified results?

AFAIK, the preferred method is to split your files across many databases, then query multiple databases from a single expression[1]. Others will be able to speak to this better, but I don't think there's a straightforward way to run multiple BaseX servers in a single JVM.

All the best

Best,
Bridger

[1] http://docs.basex.org/wiki/Databases
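
Editor's sketch, relating to Fabrice's suggestion earlier in the thread of forcing the load order with a command script: the thread proposes generating the bxs file in XQuery with file:list(); the version below swaps in Python purely for illustration. It is a minimal sketch, not anyone's actual script. The function name, the default database name "mydb", and the "*.xml" filter are assumptions made for the example. The set, create-db, add and close elements mirror the script quoted in the thread, wrapped in the <commands> root element that the BaseX command documentation uses for multi-command scripts.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_command_script(xml_dir, db_name="mydb"):
    """Build a BaseX command script that adds each *.xml file in xml_dir
    to db_name one at a time, in sorted (reproducible) order.

    The function name and defaults are hypothetical; the options mirror
    the script quoted in the thread.
    """
    lines = [
        "<commands>",
        '  <set option="ADDCACHE"/>',
        '  <set option="DEBUG"/>',
        f"  <create-db name={quoteattr(db_name)}/>",
    ]
    # Sorting by path fixes the load order, so a failure can be traced
    # to a specific file and to the files that were loaded before it.
    for path in sorted(Path(xml_dir).glob("*.xml")):
        # Escape &, < and > so unusual file names keep the script well-formed.
        lines.append(f"  <add>{escape(str(path.resolve()))}</add>")
    lines += ["  <close/>", "</commands>"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as load.bxs and running it with the BaseX standalone client (for example via its -c option; check the command-line help for your version) should then load the files in a known order, so the position of any failure can be compared against the sorted file list.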