Hello,

For the past few days I have been trying to import a recent dump 
(enwiki-20120902-pages-articles.xml) into MySQL, following the README.txt at 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/efc0afb0faa3/abstractExtraction/README.txt.
However, after successfully importing 8,800,000 pages, the import fails with 
an ArrayIndexOutOfBoundsException:

8 800 000 pages (66,855/sec), 8 800 000 revs (66,855/sec)
[WARNING]
java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
 at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2048
 at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
 at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
 at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
 at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
 at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
 at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
 at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
 at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
 at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
 at org.mediawiki.dumper.Dumper.main(Dumper.java:143)
 ... 6 more



[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2199 minutes 41 seconds
[INFO] Finished at: Sun Sep 16 00:06:39 EDT 2012
[INFO] Final Memory: 54M/156M
[INFO] ------------------------------------------------------------------------


Would anyone be so kind as to offer some advice?
Thanks!

Olivier Sollier