This looks scarily like denormalization: wikicat_Songwriters_from_Louisiana
Shouldn't that be 3 nodes linked to it rather than a type node?

Sent from my iPhone

> On 17.10.2015 at 11:04, Michael B. <[email protected]> wrote:
>
> Yago has roughly 350,000 different classes, 10 million entities and 120
> million facts (which would be either relationships or properties).
>
> As mentioned previously, I'd rather go with few labels and model entity types
> as their own nodes (which is the case in RDF). You could query for it with
> something like this:
>
> match
> (x:Individual)-[t:is_a]->(c:Class{type:'wikicat_Songwriters_from_Louisiana'})
> return x
>
>> On 17 October 2015 at 10:13, Michael Hunger
>> <[email protected]> wrote:
>> How many different types?
>>
>> Sent from my iPhone
>>
>>> On 17.10.2015 at 06:38, Qi Song <[email protected]> wrote:
>>>
>>> Each instance in Yago has a type, and there are millions of instances.
>>>
>>>> On Fri, Oct 16, 2015 at 3:26 PM, Michael Hunger
>>>> <[email protected]> wrote:
>>>> Labels are roles or tags on nodes.
>>>>
>>>> Which can be used to represent types as well.
>>>>
>>>> That you can attach metadata like indexes is just a benefit.
>>>>
>>>> The is-a relationships might be fine on a theoretical model, but will not
>>>> perform that well if you have many millions or billions of them and query
>>>> across them.
>>>>
>>>> How many types are there in yago?
>>>>
>>>> Michael
>>>>
>>>>> On 16.10.2015 at 23:40, Michael Bach <[email protected]> wrote:
>>>>>
>>>>> I did a couple of experiments today. For what it's worth: the labels are a
>>>>> means to index different document sets, since property indexes are built
>>>>> on a per-label basis. I wouldn't try to introduce a label for each class
>>>>> in yago. As mentioned before, I'd rather try to model is-a relationships
>>>>> with nodes rather than labels.
>>>>>
>>>>> Is there a particular reason why you're trying your luck with neo4j
>>>>> instead of virtuoso or jena?
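A rough sketch of the modelling idea discussed above, in plain Java with no Neo4j dependency: every type statement becomes an is_a edge from an Individual node to a shared Class node, so only two labels exist however many classes there are. The class TypeGraphSketch, its methods and the sample names are hypothetical and only illustrate the shape of the model; they say nothing about how Neo4j stores or queries it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the graph; this is not the Neo4j API.
final class TypeGraphSketch {
    static final String INDIVIDUAL = "Individual";
    static final String CLASS = "Class";

    // Node name -> label. Only the two labels above are ever assigned.
    private final Map<String, String> labelByNode = new LinkedHashMap<>();
    // Class node name -> individuals that point at it through an is_a edge.
    private final Map<String, Set<String>> instancesByClass = new LinkedHashMap<>();

    // Record "individual is_a className", creating the Class node on first use.
    void addType(String individual, String className) {
        labelByNode.put(individual, INDIVIDUAL);
        labelByNode.put(className, CLASS);
        instancesByClass.computeIfAbsent(className, k -> new LinkedHashSet<>()).add(individual);
    }

    // Same question as the Cypher pattern (x:Individual)-[:is_a]->(c:Class {type: ...}).
    List<String> individualsOf(String className) {
        return new ArrayList<>(instancesByClass.getOrDefault(className, Collections.emptySet()));
    }

    // Number of distinct labels in use; stays at 2 as classes are added.
    int distinctLabels() {
        return (int) labelByNode.values().stream().distinct().count();
    }

    // Number of Class nodes; this is what grows with the ontology.
    int classCount() {
        return instancesByClass.size();
    }
}
```

With this shape, adding another Yago class adds one more Class node and leaves the set of labels unchanged.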
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On 15.10.2015 at 23:12, Qi Song <[email protected]> wrote:
>>>>>>
>>>>>> Hi Michael,
>>>>>> Thanks for your reply :) I noticed that the code is old and uses some old
>>>>>> APIs. However, the label is a bottleneck for loading RDF files. In my
>>>>>> work, the label is very important. I'll try to find some way to handle
>>>>>> labels more effectively.
>>>>>>
>>>>>> Bests~
>>>>>> Qi Song
>>>>>>
>>>>>>> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> My best guess would be that the algorithm neo4j uses just can't cope
>>>>>>> with the vast number of labels this sort of use case would produce.
>>>>>>> Anyhow, the code is very, very old...
>>>>>>> The better approach would be to actually model RDF-like
>>>>>>> relationships with nodes and introduce only a few labels for class,
>>>>>>> individual, maybe a couple of data types.
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello Michael,
>>>>>>>> I tried to use your TurtleLoader to import Yago
>>>>>>>> (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>>>>>>>> into neo4j, but I ran into some weird problems when importing. I can
>>>>>>>> import YagoFacts.ttl and YagoTypes.ttl fine separately, but when I
>>>>>>>> tried to import both of them I got this error. I'm not sure what the
>>>>>>>> reason is. Is there some limit in TurtleLoader or BatchImporter?
>>>>>>>>
>>>>>>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>>>     at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>>>>>>>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>>>>>>>> Caused by: java.lang.IllegalArgumentException
>>>>>>>>     at sun.misc.Unsafe.allocateMemory(Native Method)
>>>>>>>>     at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>>>>>>>>     at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>>>>>>>
>>>>>>>> Bests~
>>>>>>>> Qi Song
>>>>>>>>
>>>>>>>>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>>>>>>> I checked that out in my batch importer (have a look at it on
>>>>>>>>> github).
>>>>>>>>> MapDB performs pretty well, but in the end, the index look-ups aren't
>>>>>>>>> the big bottleneck. If you need to perform normal index operations at
>>>>>>>>> any point (to make sure you're not importing duplicates) or iterate
>>>>>>>>> over relationships of nodes to create unique relationships,
>>>>>>>>> everything becomes way slower.
>>>>>>>>>
>>>>>>>>> As far as batch imports go, I think an in-memory MapDB is the best
>>>>>>>>> option. You might want to include some kind of function to create an
>>>>>>>>> in-memory index on specific labels/keys to allow for fast access to
>>>>>>>>> whatever's desired for batch loads.
>>>>>>>>>
>>>>>>>>> Here's what I did for batch loads:
>>>>>>>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>>>>>>>>
>>>>>>>>> The import went fine, pretty fast I'd say. The bigger problem is
>>>>>>>>> overall performance on all the node operations...
>>>>>>>>>
>>>>>>>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>>>>>>>>> > Actually I want to update the CSV batch inserter to support index
>>>>>>>>> > lookups and use real "csv", which means I'll put MapDB in there;
>>>>>>>>> > we'll see how it goes.
>>>>>>>>> >
>>>>>>>>> > You can also see if just a standard HashMap is good enough for you,
>>>>>>>>> > or a Trove primitive map. Otherwise there is still that trick with
>>>>>>>>> > the array of unique values, which you can sort and then use the
>>>>>>>>> > array index as node-id: inserter.createNode(index, props), and then
>>>>>>>>> > the id-lookup for rels is just Arrays.binarySearch(array, value).
>>>>>>>>> >
>>>>>>>>> > I also have to update the batch-importer to 2.0, but that's a bigger
>>>>>>>>> > piece of work, as lots of the internals changed in between.
>>>>>>>>> >
>>>>>>>>> > Michael
>>>>>>>>> >
>>>>>>>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > Michael Hunger has actually written a blog entry on this. Check
>>>>>>>>> > his blog out: http://jexp.de/blog/
>>>>>>>>> >
>>>>>>>>> > Standard Lucene performs poorly in many cases. The only thing
>>>>>>>>> > it's good at is full-text search with N-grams. If you don't need
>>>>>>>>> > that, any key-value store performs better, e.g. MapDB or Voldemort.
>>>>>>>>> >
>>>>>>>>> > On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>>>>>>>>> >
>>>>>>>>> > Hi Michael,
>>>>>>>>> >
>>>>>>>>> > Yes, I was considering using MapDB. We actually do use the
>>>>>>>>> > standard Lucene indexes during our existing 1.9x batch insertion.
>>>>>>>>> > We also do a pre-existing data check when inserting nodes and
>>>>>>>>> > entities that uses the index.
>>>>>>>>> > So far it's been fast enough - by that I mean taking 2/3
>>>>>>>>> > hours for about 50 million nodes, 90 million relationships!
>>>>>>>>> > But when we need more performance, I am happy to explore MapDB
>>>>>>>>> > as an option at import time. I would also probably be interested
>>>>>>>>> > in using this as a permanent index too, rather than just at
>>>>>>>>> > import time.
>>>>>>>>> >
>>>>>>>>> > Thanks
>>>>>>>>> >
>>>>>>>>> > Jen
>>>>>>>>> >
>>>>>>>>> > On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>>>>>>>>> >
>>>>>>>>> > Check out my blog entry on batch imports:
>>>>>>>>> >
>>>>>>>>> > http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>>>>>>>> >
>>>>>>>>> > Labels are a bit complicated. You shouldn't /commit/ to indices
>>>>>>>>> > during batch imports (but you can add stuff to them) - they'll
>>>>>>>>> > make everything incredibly slow. Michael Hunger suggested using
>>>>>>>>> > MapDB as a temporary index. That's what I'd do in your place.
>>>>>>>>> > Either do it like I did (for small data sets a HashMap is more
>>>>>>>>> > than enough) and use a java.util.Map implementation plus the
>>>>>>>>> > index as a fallback for the nodes that are in the DB but haven't
>>>>>>>>> > been imported by your application, or use a MapDB instead.
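The "java.util.Map plus index as fallback" idea above might look roughly like the following. This is a sketch in plain Java, not code taken from the TurtleLoader: the class ImportNodeCache, its methods, and the fallback function (standing in for a lookup against whatever index the existing database offers) are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical temporary index for a single batch run: (label, external id) -> node id.
final class ImportNodeCache {
    private final Map<String, Long> nodeIds = new HashMap<>();
    // Assumed stand-in for a lookup in the existing database; returns null when unknown.
    private final BiFunction<String, String, Long> fallback;

    ImportNodeCache(BiFunction<String, String, Long> fallback) {
        this.fallback = fallback;
    }

    // Combine label and id into one map key; assumes labels never contain a tab.
    private static String key(String label, String externalId) {
        return label + "\t" + externalId;
    }

    // Remember a node that was created during this import.
    void put(String label, String externalId, long nodeId) {
        nodeIds.put(key(label, externalId), nodeId);
    }

    // Look in the in-memory map first and ask the fallback only on a miss.
    Long find(String label, String externalId) {
        String k = key(label, externalId);
        Long id = nodeIds.get(k);
        if (id == null) {
            id = fallback.apply(label, externalId);
            if (id != null) {
                nodeIds.put(k, id); // cache the hit for later relationship rows
            }
        }
        return id;
    }
}
```

The map answers for everything created in the current run, so the slower fallback is only consulted for nodes that were already in the database, and at most once per node.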
>>>>>>>>> >
>>>>>>>>> > Regards,
>>>>>>>>> > Michael
>>>>>>>>> >
>>>>>>>>> > On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>>>>>>>>> >
>>>>>>>>> > Hi there,
>>>>>>>>> >
>>>>>>>>> > I have been looking at the docs for 2.0, particularly around
>>>>>>>>> > support for labels during batch import.
>>>>>>>>> >
>>>>>>>>> > I see there is support for adding labels to nodes during batch
>>>>>>>>> > import, directly querying labels for nodes and so on. However,
>>>>>>>>> > unless I am missing something, I don't see that there is
>>>>>>>>> > support for locating a node by label and ID. I have found I
>>>>>>>>> > need to do this when I import a large dataset where the
>>>>>>>>> > relationships come separately from the nodes (say a dump from
>>>>>>>>> > a relational database) and I need to use an external ID to
>>>>>>>>> > find the nodes for the relationship.
>>>>>>>>> >
>>>>>>>>> > I wondered what the intended approach for looking up a node
>>>>>>>>> > by label and ID is during batch import. I can see the
>>>>>>>>> > following choices:
>>>>>>>>> >
>>>>>>>>> > - Use the standard EmbeddedGraphDatabase (making sure to have
>>>>>>>>> >   shut down the batch inserter of course) to look up the nodes
>>>>>>>>> >   for a bunch of relationship inserts before going into
>>>>>>>>> >   insert mode.
>>>>>>>>> > - Use the BatchInserterIndexProvider to somehow hack into the
>>>>>>>>> >   underlying index that I believe is created for labels
>>>>>>>>> > - Be patient and wait for support to appear in the batch API
>>>>>>>>> >   for querying nodes by label and ID :)
>>>>>>>>> >
>>>>>>>>> > Thanks
>>>>>>>>> >
>>>>>>>>> > Jen
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > You received this message because you are subscribed to a
>>>>>>>>> > topic in the Google Groups "Neo4j" group.
>>>>>>>>> > To unsubscribe from this topic, visit
>>>>>>>>> > https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>>>>>>>>> > To unsubscribe from this group and all its topics, send an
>>>>>>>>> > email to neo4j+unsubscribe@googlegroups.com.
>>>>>>>>> > For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>> --
>>> Qi Song
>>> Machine learning and Knowledge Discovery Group
>>> EECS Washington State University
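For the lookup problem in the original question (finding nodes by an external ID when the relationships arrive separately), the sorted-array trick suggested earlier in the thread can be sketched as follows. This is plain Java under stated assumptions: the class SortedIdIndex and its methods are hypothetical, the external IDs are taken to be strings, and the batch inserter is left out; only Arrays.binarySearch comes from the thread.

```java
import java.util.Arrays;

// Hypothetical helper: the position of an external id in the sorted array is its node id.
final class SortedIdIndex {
    private final String[] sortedIds;

    SortedIdIndex(String[] externalIds) {
        // De-duplicate and sort so that every external id maps to exactly one position.
        this.sortedIds = Arrays.stream(externalIds).distinct().sorted().toArray(String[]::new);
    }

    // Number of nodes to create, with node ids 0 .. size() - 1.
    int size() {
        return sortedIds.length;
    }

    // External id that belongs to a given node id.
    String externalId(long nodeId) {
        return sortedIds[(int) nodeId];
    }

    // Node id for an external id, or -1 if that id was never registered.
    long nodeId(String externalId) {
        int pos = Arrays.binarySearch(sortedIds, externalId);
        return pos >= 0 ? pos : -1;
    }
}
```

In an actual import one would presumably create the node for position i with that same i as its id, as in the inserter.createNode(index, props) call mentioned in the thread, and then resolve both ends of each relationship row through nodeId. To look up by label and ID, one such index per label would be one option.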
