How many different types?

Sent from my iPhone
> On 17.10.2015 at 06:38, Qi Song <[email protected]> wrote:
>
> Each instance in Yago has a type, and there are millions of instances.
>
>> On Fri, Oct 16, 2015 at 3:26 PM, Michael Hunger
>> <[email protected]> wrote:
>> Labels are roles or tags on nodes.
>>
>> They can be used to represent types as well.
>>
>> That you can attach metadata like indexes is just a benefit.
>>
>> The is-a relationships might be fine in a theoretical model, but will not
>> perform that well if you have many millions or billions of them and query
>> across them.
>>
>> How many types are there in yago?
>>
>> Michael
>>
>>> On 16.10.2015 at 23:40, Michael Bach <[email protected]> wrote:
>>>
>>> I did a couple of experiments today. For what it's worth: the labels are a
>>> means to index different document sets, since property indexes are built
>>> per node label. I wouldn't try and introduce a label for each class in
>>> yago. As mentioned before, I'd rather try and model is-a relationships with
>>> nodes rather than labels.
>>>
>>> Is there a particular reason why you're trying your luck with neo4j instead
>>> of virtuoso or jena?
>>>
>>> Sent from my iPad
>>>
>>>> On 15.10.2015 at 23:12, Qi Song <[email protected]> wrote:
>>>>
>>>> Hi Michael,
>>>> Thanks for your reply :) I noticed that the code is old and uses some old
>>>> APIs. However, the label is a bottleneck for loading RDF files. In my
>>>> work, the label is very important. I'll try to find some way to handle
>>>> labels more effectively.
>>>>
>>>> Bests~
>>>> Qi Song
>>>>
>>>>> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>>>>> Hi!
>>>>>
>>>>> My best guess would be that the algorithm neo4j uses just can't cope
>>>>> with the vast number of labels this sort of use case would produce.
>>>>> Anyhow, the code is very, very old...
>>>>> The better approach to this would be to actually model RDF-like
>>>>> relationships with nodes and introduce only a few labels for class,
>>>>> individual, and maybe a couple of data types.
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>>>>>>
>>>>>> Hello Michael,
>>>>>> I am trying to use your TurtleLoader to import Yago
>>>>>> (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>>>>>> into neo4j, but I ran into some weird problems when importing. I can
>>>>>> import YagoFacts.ttl and YagoTypes.ttl fine separately, but when I tried
>>>>>> to import both of them I got this error. I'm not sure what the reason is.
>>>>>> Is there some limit for TurtleLoader or BatchImporter?
>>>>>>
>>>>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>     at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>>>>>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>>>>>> Caused by: java.lang.IllegalArgumentException
>>>>>>     at sun.misc.Unsafe.allocateMemory(Native Method)
>>>>>>     at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>>>>>
>>>>>> Bests~
>>>>>> Qi Song
>>>>>>
>>>>>>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>>>>> I checked that out in my batch importer (have a look at it on github).
>>>>>>> MapDB performs pretty well, but in the end, the index look-ups aren't
>>>>>>> the big bottleneck. If you need to perform normal index operations at
>>>>>>> any point (to make sure you're not importing duplicates) or iterate over
>>>>>>> relationships of nodes to create unique relationships, everything
>>>>>>> becomes way slower.
>>>>>>>
>>>>>>> As far as batch imports go, I think an in-memory MapDB is the best
>>>>>>> option.
>>>>>>> You might want to include some kind of function to create an
>>>>>>> in-memory index on specific labels/keys to allow for fast access to
>>>>>>> whatever's desired for batch loads.
>>>>>>>
>>>>>>> Here's what I did for batch loads:
>>>>>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>>>>>>
>>>>>>> The import went fine, pretty fast I'd say. The bigger problem is
>>>>>>> overall performance on all the node operations...
>>>>>>>
>>>>>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>>>>>>> > Actually I want to update the CSV batch inserter to support index
>>>>>>> > lookups and use real "csv", which means I'll put MapDB in there;
>>>>>>> > we'll see how it goes.
>>>>>>> >
>>>>>>> > You can also see if just a standard HashMap is good enough for you,
>>>>>>> > or a Trove primitive map. Otherwise there is still that trick with
>>>>>>> > the array of unique values, which you can sort and then use the
>>>>>>> > array index as node-id: inserter.createNode(index, props), and then
>>>>>>> > the id-lookup for rels is just Arrays.binarySearch(array, value).
>>>>>>> >
>>>>>>> > I also have to update the batch-importer to 2.0, but that's a bigger
>>>>>>> > piece of work, as lots of the internals changed in between.
>>>>>>> >
>>>>>>> > Michael
>>>>>>> >
>>>>>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]> wrote:
>>>>>>> >
>>>>>>> > Michael Hunger has actually written a blog entry on this. Check
>>>>>>> > his blog out: http://jexp.de/blog/
>>>>>>> >
>>>>>>> > Standard Lucene performs poorly in many cases. The only thing it's
>>>>>>> > good at is full text search with N-Gram. If you don't need that,
>>>>>>> > any key-value store performs better, e.g. MapDB or Voldemort.
>>>>>>> >
>>>>>>> > On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>>>>>>> >
>>>>>>> > Hi Michael,
>>>>>>> >
>>>>>>> > Yes I was considering using MapDB. We actually do use the standard
>>>>>>> > lucene indexes during our existing 1.9x batch insertion. We also do
>>>>>>> > a pre-existing data check when inserting nodes and entities that
>>>>>>> > uses the index. So far it's been fast enough - by that I mean
>>>>>>> > taking 2/3 hours for about 50 million nodes, 90 million
>>>>>>> > relationships! But when we need more performance, I am happy to
>>>>>>> > explore mapdb as an option at import time. I would also probably be
>>>>>>> > interested in using this as a permanent index too, rather than just
>>>>>>> > at import time.
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Jen
>>>>>>> >
>>>>>>> > On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>>>>>>> >
>>>>>>> > Check out my blog entry on batch imports:
>>>>>>> >
>>>>>>> > http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>>>>>> >
>>>>>>> > Labels are a bit complicated. You shouldn't /commit/ to indices
>>>>>>> > during batch imports (but you can add stuff to them) - they'll
>>>>>>> > make everything incredibly slow. Michael Hunger suggested using
>>>>>>> > MapDB as a temporary index. That's what I'd do in your place.
>>>>>>> > Either do it like I did (for small data sets a HashMap is more
>>>>>>> > than enough) and use a java.util.Map implementation + index as
>>>>>>> > fallback for the nodes that are in the DB but haven't been
>>>>>>> > imported by your application, or use a MapDB instead.
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > Michael
>>>>>>> >
>>>>>>> > On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>>>>>>> >
>>>>>>> > Hi there,
>>>>>>> >
>>>>>>> > I have been looking at the docs for 2.0, particularly around
>>>>>>> > support for labels during batch import.
>>>>>>> >
>>>>>>> > I see there is support for adding labels to nodes during batch
>>>>>>> > import, directly querying labels for nodes and so on. However,
>>>>>>> > unless I am missing something I don't see that there is
>>>>>>> > support for locating a node by label and ID. I have found I
>>>>>>> > have needed to do this when I import a large dataset where the
>>>>>>> > relationships come separately from the nodes (say a dump from
>>>>>>> > a relational database) and I need to use an external ID to
>>>>>>> > find the nodes for the relationship.
>>>>>>> >
>>>>>>> > I wondered what the intended approach for looking up a node
>>>>>>> > by label and ID is during batch import. I can see the
>>>>>>> > following choices:
>>>>>>> >
>>>>>>> > - Use the standard EmbeddedGraphDatabase (making sure to have
>>>>>>> >   shut down the batch inserter of course) to look up the nodes
>>>>>>> >   for a bunch of relationship inserts before going into
>>>>>>> >   insert mode.
>>>>>>> > - Use the BatchInserterIndexProvider to somehow hack into the
>>>>>>> >   underlying index that I believe is created for labels
>>>>>>> > - Be patient and wait for support to appear in the batch API
>>>>>>> >   for querying nodes by label and ID :)
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Jen
>>>>>>> >
>>>>>>> > --
>>>>>>> > You received this message because you are subscribed to a
>>>>>>> > topic in the Google Groups "Neo4j" group.
>>>>>>> > To unsubscribe from this topic, visit
>>>>>>> > https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>>>>>>> > To unsubscribe from this group and all its topics, send an
>>>>>>> > email to neo4j+unsubscribe@googlegroups.com.
>>>>>>> > For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Qi Song
> Machine learning and Knowledge Discovery Group
> EECS Washington State University
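For anyone reading this thread later: the sorted-array trick mentioned in the quoted 2013 message (sort the unique values, use the array index as node id, resolve ids for relationships with Arrays.binarySearch) could look roughly like the sketch below. The class and method names are made up for illustration and are not part of Neo4j or the TurtleLoader. The sketch leaves out the actual batch inserter call and only shows the id bookkeeping, assuming all external keys fit in memory as strings.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Neo4j or the TurtleLoader.
class SortedArrayNodeIds {

    // Sort the external keys and drop duplicates. The position of a key in
    // the returned array then serves as its node id.
    static String[] buildIndex(String[] externalKeys) {
        return Arrays.stream(externalKeys).distinct().sorted().toArray(String[]::new);
    }

    // Returns the node id (array index) for a key, or -1 if the key is unknown.
    static long nodeIdFor(String[] sortedKeys, String key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        String[] keys = buildIndex(new String[] {
            "yago:Berlin", "yago:Albert_Einstein", "yago:Berlin"
        });

        // In a real import, each index i would be handed to the batch inserter
        // as the node id, along the lines of inserter.createNode(i, props)
        // as described in the thread.
        for (int i = 0; i < keys.length; i++) {
            System.out.println(i + " -> " + keys[i]);
        }

        // Resolving a relationship endpoint needs only a binary search.
        System.out.println("node id for yago:Berlin: " + nodeIdFor(keys, "yago:Berlin"));
    }
}
```

The appeal of this approach is that it needs no index structure beyond the array itself; the cost is that all keys must be known and sorted before the first node is created.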
