Hello Michael,

I am trying to use your TurtleLoader to import YAGO (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/) into Neo4j, but I ran into a strange problem during the import. I can import YagoFacts.ttl and YagoTypes.ttl without trouble when I load each one separately, but when I try to import both of them I get the error below. I'm not sure what the cause is. Is there some limit in the TurtleLoader or the BatchImporter?

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.RuntimeException: Panic called, so exiting
    at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
Caused by: java.lang.IllegalArgumentException
    at sun.misc.Unsafe.allocateMemory(Native Method)
    at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
    at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
    at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
    at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
    at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
    at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
    at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
    at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
    at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

Best,
Qi Song

On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>
> I checked that out in my batch importer (have a look at it on GitHub).
> MapDB performs pretty well, but in the end, the index look-ups aren't
> the big bottleneck. If you need to perform normal index operations at any
> point (to make sure you're not importing duplicates) or iterate over
> relationships of nodes to create unique relationships, everything
> becomes way slower.
>
> As far as batch imports go, I think an in-memory MapDB is the best
> option. You might want to include some kind of function to create an
> in-memory index on specific labels/keys to allow for fast access to
> whatever's desired for batch loads.
>
> Here's what I did for batch loads:
>
> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>
> The import went fine, pretty fast I'd say. The bigger problem is
> overall performance on all the node operations...
>
> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
> > Actually, I want to update the CSV batch inserter to support index
> > lookups and use real "csv". That means I'll put MapDB in there; we'll
> > see how it goes.
> >
> > You can also see if just a standard HashMap is good enough for you, or
> > a Trove primitive map. Otherwise there is still that trick with the
> > array of unique values, which you can sort and then use the array index
> > as the node id: inserter.createNode(index, props), and then the id lookup
> > for rels is just Arrays.binarySearch(array, value).
> >
> > I also have to update the batch-importer to 2.0, but that's a bigger
> > piece of work, as lots of the internals changed in between.
> >
> > Michael
> >
> >
> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B.
<[email protected]> wrote:
> > >
> > > Michael Hunger has actually written a blog entry on this. Check
> > > out his blog: http://jexp.de/blog/
> > >
> > > Standard Lucene performs poorly in many cases. The only thing it's
> > > good at is full-text search with n-grams. If you don't need that,
> > > any key-value store performs better, e.g. MapDB or Voldemort.
> > >
> > >
> > > On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > Yes, I was considering using MapDB. We actually do use the standard
> > > > Lucene indexes during our existing 1.9x batch insertion. We also do
> > > > a pre-existing data check when inserting nodes and entities that
> > > > uses the index. So far it's been fast enough - by that I mean
> > > > taking 2/3 hours for about 50 million nodes, 90 million
> > > > relationships! But when we need more performance, I am happy to
> > > > explore MapDB as an option at import time. I would also probably be
> > > > interested in using this as a permanent index too, rather than just
> > > > at import time.
> > > >
> > > > Thanks
> > > >
> > > > Jen
> > > >
> > > > On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
> > > > >
> > > > > Check out my blog entry on batch imports:
> > > > >
> > > > > http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
> > > > >
> > > > > Labels are a bit complicated. You shouldn't /commit/ to indices
> > > > > during batch imports (but you can add stuff to them) - they'll
> > > > > make everything incredibly slow. Michael Hunger suggested using
> > > > > MapDB as a temporary index. That's what I'd do in your place.
> > > > > Either do it like I did (for small data sets a HashMap is more
> > > > > than enough) and use a java.util.Map implementation, with the
> > > > > index as a fallback for the nodes that are in the DB but haven't
> > > > > been imported by your application, or use a MapDB instead.
> > > > >
> > > > > Regards,
> > > > > Michael
> > > > >
> > > > > On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
> > > > > >
> > > > > > Hi there,
> > > > > >
> > > > > > I have been looking at the docs for 2.0, particularly around
> > > > > > support for labels during batch import.
> > > > > >
> > > > > > I see there is support for adding labels to nodes during batch
> > > > > > import, directly querying labels for nodes and so on. However,
> > > > > > unless I am missing something, I don't see that there is
> > > > > > support for locating a node by label and ID. I have found I
> > > > > > need to do this when I import a large dataset where the
> > > > > > relationships come separately from the nodes (say a dump from
> > > > > > a relational database) and I need to use an external ID to
> > > > > > find the nodes for the relationship.
> > > > > >
> > > > > > I wondered what the intended approach for looking up a node
> > > > > > by label and ID is during batch import. I can see the
> > > > > > following choices:
> > > > > >
> > > > > > - Use the standard EmbeddedGraphDatabase (making sure to have
> > > > > >   shut down the batch inserter, of course) to look up the nodes
> > > > > >   for a bunch of relationship inserts before going into
> > > > > >   insert mode.
> > > > > > - Use the BatchInserterIndexProvider to somehow hack into the
> > > > > >   underlying index that I believe is created for labels.
> > > > > > - Be patient and wait for support to appear in the batch API
> > > > > >   for querying nodes by label and ID :)
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Jen
> > >
> > > --
> > > You received this message because you are subscribed to a topic in the
> > > Google Groups "Neo4j" group.
> > > To unsubscribe from this topic, visit
> > > https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
> > > To unsubscribe from this group and all its topics, send an email to
> > > neo4j+unsubscribe@googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.
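
For readers following the quoted thread: the sorted-array trick Michael Hunger describes (sort the unique external keys, use the array index as the node id, then resolve relationship endpoints with Arrays.binarySearch) might look roughly like the sketch below. This is only an illustration under assumptions, not code from the batch importer: the class name SortedKeyNodeIds, its methods, and the sample keys are made up, and the actual inserter.createNode(index, props) call is left as a comment so the sketch runs without Neo4j on the classpath.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical helper: maps external string keys to dense node ids via a sorted array. */
final class SortedKeyNodeIds {
    private final String[] sortedKeys;

    private SortedKeyNodeIds(String[] sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    /** Deduplicates and sorts the keys; the array index then serves as the node id. */
    static SortedKeyNodeIds of(Collection<String> keys) {
        String[] unique = keys.stream().distinct().sorted().toArray(String[]::new);
        return new SortedKeyNodeIds(unique);
    }

    /** Returns the node id for a key, or -1 if the key is not in the array. */
    long nodeId(String key) {
        int index = Arrays.binarySearch(sortedKeys, key);
        return index >= 0 ? index : -1;
    }

    int size() {
        return sortedKeys.length;
    }

    String keyAt(int nodeId) {
        return sortedKeys[nodeId];
    }

    public static void main(String[] args) {
        // Made-up sample keys; the duplicate is dropped before sorting.
        SortedKeyNodeIds ids = SortedKeyNodeIds.of(
                Arrays.asList("yago:Berlin", "yago:Albert_Einstein", "yago:Berlin"));

        // Node pass: a real import would call inserter.createNode(id, props) here.
        for (int id = 0; id < ids.size(); id++) {
            System.out.println("node " + id + " <- " + ids.keyAt(id));
        }

        // Relationship pass: both endpoints are resolved by binary search, no index needed.
        long from = ids.nodeId("yago:Albert_Einstein");
        long to = ids.nodeId("yago:Berlin");
        System.out.println("relationship " + from + " -> " + to);
    }
}
```

The trade-off is that all keys have to be known and sorted before the first node is created, so this fits a two-pass import better than a streaming one.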

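
The other suggestion in the quoted thread, a java.util.Map as the primary lookup with the index as a fallback for nodes that are already in the DB, could be sketched as follows. Again this is an assumption-laden illustration: CachedNodeLookup and its methods are hypothetical names, and the fallback is modelled as a plain Function so that a Lucene or MapDB lookup could be plugged in; none of this is taken from the TurtleLoader source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: in-memory key-to-node-id map with a slower fallback lookup. */
final class CachedNodeLookup {
    private final Map<String, Long> cache = new HashMap<>();
    private final Function<String, Long> fallback;

    /** The fallback stands in for an index query; it returns null when nothing is found. */
    CachedNodeLookup(Function<String, Long> fallback) {
        this.fallback = fallback;
    }

    /** Records a node created during this import run. */
    void remember(String key, long nodeId) {
        cache.put(key, nodeId);
    }

    /** Returns the node id for the key, or null if neither the map nor the fallback knows it. */
    Long find(String key) {
        Long cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Long fromFallback = fallback.apply(key);
        if (fromFallback != null) {
            // Keep the result so the slow lookup runs at most once per existing key.
            cache.put(key, fromFallback);
        }
        return fromFallback;
    }
}
```

Misses are not cached in this sketch, so a key that exists nowhere hits the fallback on every call; whether that matters depends on how often the import looks up keys it has not seen.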