Hi Michael,

Thanks for your reply :) I noticed that the code is old and uses some old APIs. However, labels are a bottleneck when loading RDF files, and in my work labels are very important. I'll try to find a way to handle labels more efficiently.
Bests~
Qi Song

On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>
> Hi!
>
> My best guess would be that the algorithm Neo4j uses just can't cope
> with the vast number of labels this sort of use case would produce. Anyhow,
> the code is very, very old...
> The better approach would be to model RDF-like
> relationships with nodes and introduce only a few labels for class,
> individual, and maybe a couple of data types.
>
> Sent from my iPad
>
> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>
> Hello Michael,
> I tried to use your TurtleLoader to import Yago
> (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
> into Neo4j, but I ran into some odd problems during the import. I can import
> YagoFacts.ttl and YagoTypes.ttl fine separately, but when I try to import
> both of them I get this error. I'm not sure what the reason is. Is there
> some limit in TurtleLoader or BatchImporter?
>
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
> Caused by: java.lang.RuntimeException: Panic called, so exiting
>     at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
> Caused by: java.lang.IllegalArgumentException
>     at sun.misc.Unsafe.allocateMemory(Native Method)
>     at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>     at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>     at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>
> Bests~
> Qi Song
>
> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>
>> I checked that out in my batch importer (have a look at it on GitHub).
>> MapDB performs pretty well, but in the end, the index look-ups aren't
>> the big bottleneck. If you need to perform normal index operations at any
>> point (to make sure you're not importing duplicates) or iterate over
>> the relationships of nodes to create unique relationships, everything
>> becomes way slower.
>>
>> As far as batch imports go, I think an in-memory MapDB is the best
>> option. You might want to include some kind of function to create an
>> in-memory index on specific labels/keys to allow fast access to
>> whatever's desired for batch loads.
>>
>> Here's what I did for batch loads:
>>
>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>
>> The import went fine, pretty fast I'd say. The bigger problem is
>> overall performance on all the node operations...
>>
>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>> > Actually I want to update the CSV batch inserter to support index
>> > lookups and use real "csv"; that means I'll put MapDB in there. We'll
>> > see how it goes.
>> >
>> > You can also see if just a standard HashMap is good enough for you, or
>> > a Trove primitive map. Otherwise there is still that trick with the
>> > array of unique values, which you can sort and then use the array index
>> > as the node ID: inserter.createNode(index, props), and then the ID lookup
>> > for rels is just Arrays.binarySearch(array, value).
>> >
>> > I also have to update the batch-importer to 2.0, but that's a bigger
>> > piece of work, as lots of the internals changed in between.
>> >
>> > Michael
>> >
>> >
>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]> wrote:
>> >
>> >     Michael Hunger has actually written a blog entry on this. Check
>> >     his blog out: http://jexp.de/blog/
>> >
>> >     Standard Lucene performs poorly in many cases. The only thing it's
>> >     good at is full-text search with n-grams. If you don't need that,
>> >     any key-value store performs better, e.g. MapDB or Voldemort.
>> >
>> >
>> >     On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>> >
>> >         Hi Michael,
>> >
>> >         Yes, I was considering using MapDB. We actually do use the standard
>> >         Lucene indexes during our existing 1.9x batch insertion. We also do a
>> >         pre-existing data check when inserting nodes and entities that uses
>> >         the index. So far it's been fast enough - by that I mean taking 2/3
>> >         hours for about 50 million nodes, 90 million relationships! But when
>> >         we need more performance, I am happy to explore MapDB as an option at
>> >         import time. I would also probably be interested in using this as a
>> >         permanent index too, rather than just at import time.
>> >
>> >         Thanks
>> >
>> >         Jen
>> >
>> >         On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>> >
>> >             Check out my blog entry on batch imports:
>> >
>> >             http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>> >
>> >             Labels are a bit complicated. You shouldn't /commit/ to indices
>> >             during batch imports (but you can add stuff to them) - they'll
>> >             make everything incredibly slow.
>> >             Michael Hunger suggested using
>> >             MapDB as a temporary index. That's what I'd do in your place.
>> >             Either do it like I did (for small data sets a HashMap is more
>> >             than enough) and use a java.util.Map implementation, with the
>> >             index as a fallback for nodes that are in the DB but haven't been
>> >             imported by your application, or use a MapDB instead.
>> >
>> >             Regards,
>> >             Michael
>> >
>> >             On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>> >
>> >                 Hi there,
>> >
>> >                 I have been looking at the docs for 2.0, particularly around
>> >                 support for labels during batch import.
>> >
>> >                 I see there is support for adding labels to nodes during batch
>> >                 import, directly querying labels for nodes and so on. However,
>> >                 unless I am missing something, I don't see any support for
>> >                 locating a node by label and ID. I have found I need to do
>> >                 this when I import a large dataset where the relationships
>> >                 come separately from the nodes (say a dump from a relational
>> >                 database) and I need to use an external ID to find the nodes
>> >                 for the relationship.
>> >
>> >                 I wondered what the intended approach for looking up a node
>> >                 by label and ID is during batch import. I can see the
>> >                 following choices:
>> >
>> >                 - Use the standard EmbeddedGraphDatabase (making sure to have
>> >                   shut down the batch inserter, of course) to look up the nodes
>> >                   for a bunch of relationship inserts before going into
>> >                   insert mode.
>> >                 - Use the BatchInserterIndexProvider to somehow hack into the
>> >                   underlying index that I believe is created for labels.
>> >                 - Be patient and wait for support to appear in the batch API
>> >                   for querying nodes by label and ID :)
>> >
>> >                 Thanks
>> >
>> >                 Jen
>> >
>> > --
>> > You received this message because you are subscribed to a
>> > topic in the Google Groups "Neo4j" group.
>> > To unsubscribe from this topic, visit
>> > https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>> > To unsubscribe from this group and all its topics, send an email to
>> > neo4j+unsubscribe@googlegroups.com.
>> > For more options, visit https://groups.google.com/groups/opt_out.
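
For readers landing on this thread: the sorted-array trick Michael Hunger mentions (sort the unique external values, use the array index as the node ID, then resolve relationship endpoints with Arrays.binarySearch) could look roughly like the sketch below. It is only an illustration under assumptions, not code from the thread: the class SortedIdLookup, its method names, and the sample "yago:..." identifiers are hypothetical, external IDs are assumed to be strings, and the actual Neo4j batch inserter calls are left out so the example runs on a plain JDK.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper: maps external IDs to dense node IDs without any index.
final class SortedIdLookup {
    private final String[] sortedKeys;

    SortedIdLookup(Collection<String> externalIds) {
        // TreeSet removes duplicates and sorts by natural String order,
        // the same order Arrays.binarySearch relies on below.
        this.sortedKeys = new TreeSet<>(externalIds).toArray(new String[0]);
    }

    // Number of unique external IDs, i.e. number of nodes to create.
    int size() {
        return sortedKeys.length;
    }

    // External ID that belongs to a given node ID (the array index).
    String keyAt(int nodeId) {
        return sortedKeys[nodeId];
    }

    // Node ID for an external ID, or -1 if the ID was never registered.
    long nodeIdOf(String externalId) {
        int index = Arrays.binarySearch(sortedKeys, externalId);
        return index >= 0 ? index : -1L;
    }

    public static void main(String[] args) {
        // Made-up sample identifiers, including one duplicate.
        SortedIdLookup lookup = new SortedIdLookup(Arrays.asList(
                "yago:Berlin", "yago:Albert_Einstein", "yago:Ulm", "yago:Berlin"));

        // Pass 1: one node per unique key; the array index is the node ID.
        for (int nodeId = 0; nodeId < lookup.size(); nodeId++) {
            System.out.println("node " + nodeId + " <- " + lookup.keyAt(nodeId));
        }

        // Pass 2: resolve relationship endpoints by binary search.
        long from = lookup.nodeIdOf("yago:Albert_Einstein");
        long to = lookup.nodeIdOf("yago:Ulm");
        System.out.println("relationship " + from + " -> " + to);
    }
}
```

In a real import, the first loop is where something like the inserter.createNode(index, props) call quoted in the thread would go, and the second pass is where relationships would be created from the resolved IDs. The trade-off is that all unique keys must be known and held in memory before any node is written, but each lookup costs only O(log n) and needs no Lucene or MapDB index.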
