I did a couple of experiments today. For what it's worth: labels are a means of indexing different document sets, since property indexes are built on a per-label basis. I wouldn't try to introduce a label for each class in yago. As mentioned before, I'd rather model is-a relationships with nodes than with labels.
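To make the modelling idea a bit more concrete, here is a minimal sketch in plain Java. It does not use the neo4j API at all, and every name in it (IsAGraph, Label, addIsA, classesOf, labelCount) is made up for illustration. The point is only that classes become ordinary nodes connected by is-a edges, while the set of labels stays tiny no matter how many classes get loaded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative in-memory model: classes are nodes, is-a is an edge. */
final class IsAGraph {
    /** Only a handful of labels exist, not one per ontology class. */
    enum Label { CLASS, INDIVIDUAL }

    private final Map<String, Label> labelOf = new HashMap<>();
    private final Map<String, List<String>> isA = new HashMap<>();

    /** Registers a node under one of the few labels. */
    void addNode(String name, Label label) {
        labelOf.put(name, label);
    }

    /** Records "child is-a parent" as an edge between two nodes. */
    void addIsA(String child, String parent) {
        isA.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    /** Returns every class reachable from the node via is-a edges. */
    Set<String> classesOf(String name) {
        Set<String> seen = new HashSet<>();
        collect(name, seen);
        return seen;
    }

    private void collect(String name, Set<String> seen) {
        for (String parent : isA.getOrDefault(name, Collections.<String>emptyList())) {
            if (seen.add(parent)) {
                collect(parent, seen); // follow the chain upwards
            }
        }
    }

    /** Number of distinct labels in use, independent of the class count. */
    int labelCount() {
        return new HashSet<>(labelOf.values()).size();
    }

    public static void main(String[] args) {
        IsAGraph g = new IsAGraph();
        g.addNode("Albert_Einstein", Label.INDIVIDUAL);
        g.addNode("physicist", Label.CLASS);
        g.addNode("scientist", Label.CLASS);
        g.addIsA("Albert_Einstein", "physicist");
        g.addIsA("physicist", "scientist");
        System.out.println(g.classesOf("Albert_Einstein"));
        System.out.println(g.labelCount());
    }
}
```

In a real store the two labels would be what the property indexes hang off, and the class hierarchy would be traversed as relationships instead of being encoded in label names.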
Is there a particular reason why you're trying your luck with neo4j instead of virtuoso or jena?

Sent from my iPad

> On 15.10.2015 at 23:12, Qi Song <[email protected]> wrote:
>
> Hi Michael,
> Thanks for your reply :) I noticed that the code is old and uses some old
> APIs. However, the labels are a bottleneck for loading RDF files. In my work,
> the labels are very important. I'll try to find some way to handle labels more
> effectively.
>
> Bests~
> Qi Song
>
>> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>> Hi!
>>
>> My best guess would be that the algorithm neo4j uses just can't cope with
>> the vast amount of labels this sort of use case would produce. Anyhow, the
>> code is very, very old...
>> The better approach to this would be to actually model RDF-like
>> relationships with nodes and introduce only a few labels for class,
>> individual, maybe a couple of data types.
>>
>> Sent from my iPad
>>
>>> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>>>
>>> Hello Michael,
>>> I tried to use your Turtleloader to import Yago
>>> (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>>> into neo4j, but I ran into some weird problems when importing. I can import
>>> YagoFacts.ttl and YagoTypes.ttl fine separately, but when I tried to import
>>> both of them I got this error. I'm not sure what the reason is. Is there
>>> some limit in TurtleLoader or BatchImporter?
>>>
>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>     at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>>>     at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>>> Caused by: java.lang.IllegalArgumentException
>>>     at sun.misc.Unsafe.allocateMemory(Native Method)
>>>     at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>>>     at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>>>     at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>>
>>> Bests~
>>> Qi Song
>>>
>>>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>> I checked that out in my batch importer (have a look at it on github).
>>>> MapDB performs pretty well, but in the end, the index look-ups aren't
>>>> the big bottleneck. If you need to perform normal index operations at any
>>>> point (to make sure you're not importing duplicates) or iterate over
>>>> relationships of nodes to create unique relationships, everything
>>>> becomes way slower.
>>>>
>>>> As far as batch imports go, I think an in-memory MapDB is the best
>>>> option. You might want to include some kind of function to create an
>>>> in-memory index on specific labels/keys to allow for fast access to
>>>> whatever's desired for batch loads.
>>>>
>>>> Here's what I did for batch loads:
>>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>>>
>>>> The import went fine, pretty fast I'd say. The bigger problem is
>>>> overall performance on all the node operations...
>>>>
>>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>>>> > Actually I want to update the CSV batch inserter to support index
>>>> > lookups and use real "csv"; that means I'll put MapDB in there, and
>>>> > we'll see how it goes.
>>>> >
>>>> > You can also see if just a standard HashMap is good enough for you or
>>>> > a Trove primitive map. Otherwise there is still that trick with the
>>>> > array of unique values which you can sort and then use the array index
>>>> > as node-id.
>>>> > inserter.createNode(index, props) and then the id-lookup
>>>> > for rels is just Arrays.binarySearch(array, value)
>>>> >
>>>> > I also have to update the batch-importer to 2.0, but that's a bigger
>>>> > piece of work, as lots of the internals changed in between.
>>>> >
>>>> > Michael
>>>> >
>>>> >
>>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]
>>>> > <mailto:[email protected]>> wrote:
>>>> >
>>>> >   Michael Hunger has actually written a blog entry on this. Check
>>>> >   his blog out: http://jexp.de/blog/
>>>> >
>>>> >   Standard Lucene performs poorly in many cases. The only thing it's
>>>> >   good at is full text search with N-Gram. If you don't need that,
>>>> >   any key-value store performs better, e.g. MapDB or Voldemort.
>>>> >
>>>> >
>>>> >   On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>>>> >
>>>> >     Hi Michael,
>>>> >
>>>> >     Yes, I was considering using MapDB. We actually do use the standard
>>>> >     lucene indexes during our existing 1.9x batch insertion. We also do a
>>>> >     pre-existing data check when inserting nodes and entities that uses
>>>> >     the index. So far it's been fast enough - by that I mean taking 2/3
>>>> >     hours for about 50 million nodes, 90 million relationships! But when
>>>> >     we need more performance, I am happy to explore mapdb as an option at
>>>> >     import time. I would also probably be interested in using this as a
>>>> >     permanent index too, rather than just at import time.
>>>> >
>>>> >     Thanks
>>>> >
>>>> >     Jen
>>>> >
>>>> >     On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>>>> >
>>>> >       Check out my blog entry on batch imports:
>>>> >       http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>>> >
>>>> >       Labels are a bit complicated. You shouldn't /commit/ to indices
>>>> >       during batch imports (but you can add stuff to them) - they'll
>>>> >       make everything incredibly slow. Michael Hunger suggested using
>>>> >       MapDB as a temporary index. That's what I'd do in your place.
>>>> >       Either do it like I did (for small data sets a HashMap is more
>>>> >       than enough) and use a java.util.Map implementation + index as
>>>> >       fallback for the nodes that are in the DB but haven't been
>>>> >       imported by your application, or use a MapDB instead.
>>>> >
>>>> >       Regards,
>>>> >       Michael
>>>> >
>>>> >       On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>>>> >
>>>> >         Hi there,
>>>> >
>>>> >         I have been looking at the docs for 2.0, particularly around
>>>> >         support for labels during batch import.
>>>> >
>>>> >         I see there is support for adding labels to nodes during batch
>>>> >         import, directly querying labels for nodes and so on. However,
>>>> >         unless I am missing something, I don't see that there is
>>>> >         support for locating a node by label and ID. I have found I
>>>> >         have needed to do this when I import a large dataset where the
>>>> >         relationships come separately from the nodes (say a dump from
>>>> >         a relational database) and I need to use an external ID to
>>>> >         find the nodes for the relationship.
>>>> >
>>>> >         I wondered what the intended approach for looking up a node
>>>> >         by label and ID is during batch import. I can see the
>>>> >         following choices:
>>>> >
>>>> >         - Use the standard EmbeddedGraphDatabase (making sure to have
>>>> >           shut down the batch inserter of course) to look up the nodes
>>>> >           for a bunch of relationship inserts before going into
>>>> >           insert mode.
>>>> >         - Use the BatchInserterIndexProvider to somehow hack into the
>>>> >           underlying index that I believe is created for labels
>>>> >         - Be patient and wait for support to appear in the batch API
>>>> >           for querying nodes by label and ID :)
>>>> >
>>>> >         Thanks
>>>> >
>>>> >         Jen
>>>> >
>>>> >   --
>>>> >   You received this message because you are subscribed to a topic in the
>>>> >   Google Groups "Neo4j" group.
>>>> >   To unsubscribe from this topic, visit
>>>> >   https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>>>> >   To unsubscribe from this group and all its topics, send an email to
>>>> >   neo4j+unsubscribe@googlegroups.com.
>>>> >   For more options, visit https://groups.google.com/groups/opt_out.
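
The sorted-array trick from the quoted thread above (sort the unique values, use the array index as node id, resolve ids for relationships with Arrays.binarySearch) can be sketched roughly as follows. This is only an illustration in plain Java under the assumption that external keys are strings; the class SortedKeyIndex and its methods are hypothetical names, and the actual call into the batch inserter is left out and only hinted at in a comment.

```java
import java.util.Arrays;

/** Illustrative key-to-id index backed by a sorted array of unique keys. */
final class SortedKeyIndex {
    private final String[] sortedKeys;

    SortedKeyIndex(String[] externalKeys) {
        // Drop duplicates and sort so that every key owns exactly one slot.
        this.sortedKeys = Arrays.stream(externalKeys)
                .distinct()
                .sorted()
                .toArray(String[]::new);
    }

    /** Returns the array position used as node id, or -1 for unknown keys. */
    long nodeIdOf(String externalKey) {
        int pos = Arrays.binarySearch(sortedKeys, externalKey);
        return pos >= 0 ? pos : -1L;
    }

    /** Number of unique keys, i.e. the number of nodes to create. */
    int size() {
        return sortedKeys.length;
    }

    public static void main(String[] args) {
        SortedKeyIndex index = new SortedKeyIndex(new String[] {
            "yago:Berlin", "yago:Albert_Einstein", "yago:Berlin"
        });
        // Node pass: one node per slot; in a real import the id would be
        // handed to the batch inserter when the node is created.
        for (long id = 0; id < index.size(); id++) {
            System.out.println("create node with id " + id);
        }
        // Relationship pass: resolve both ends by binary search.
        System.out.println(index.nodeIdOf("yago:Albert_Einstein")
                + " -> " + index.nodeIdOf("yago:Berlin"));
    }
}
```

The appeal of this layout is that no separate index structure is kept around: the array is the index, and a lookup costs one binary search. It does require knowing all unique keys before the first node is created.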

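
The other suggestion in the quoted thread, a java.util.Map as temporary index with the regular index as fallback for nodes that are already in the DB, might look something like the sketch below. It is an assumption-laden illustration and not the code from the linked loader: TemporaryNodeIndex, NOT_FOUND and the fallback function are hypothetical, and the fallback here is just any function from key to node id standing in for the slower persistent lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative temporary index: in-memory map first, slow lookup second. */
final class TemporaryNodeIndex {
    static final long NOT_FOUND = -1L;

    private final Map<String, Long> imported = new HashMap<>();
    private final Function<String, Long> fallbackLookup;

    /** The fallback stands in for a persistent index; it may return null. */
    TemporaryNodeIndex(Function<String, Long> fallbackLookup) {
        this.fallbackLookup = fallbackLookup;
    }

    /** Remembers a node created during the current import run. */
    void put(String key, long nodeId) {
        imported.put(key, nodeId);
    }

    /** Checks the in-memory map first and only then asks the fallback. */
    long find(String key) {
        Long id = imported.get(key);
        if (id != null) {
            return id;
        }
        Long fromDb = fallbackLookup.apply(key);
        return fromDb != null ? fromDb : NOT_FOUND;
    }

    public static void main(String[] args) {
        Map<String, Long> alreadyInDb = new HashMap<>();
        alreadyInDb.put("yago:Berlin", 7L);

        TemporaryNodeIndex index = new TemporaryNodeIndex(alreadyInDb::get);
        index.put("yago:Albert_Einstein", 42L);

        System.out.println(index.find("yago:Albert_Einstein"));
        System.out.println(index.find("yago:Berlin"));
        System.out.println(index.find("yago:Unknown"));
    }
}
```

For larger data sets the HashMap could be swapped for an off-heap or disk-backed map such as MapDB, as suggested in the thread, without changing the lookup order.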