We are using Neo4j as a database and are building some graph mining algorithms on top of it. I think I can try using an is-a or has-type relationship to represent a label, rather than putting labels on the node itself.
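A minimal sketch of the is-a idea, assuming the Turtle triples have already been parsed into (subject, predicate, object) strings and that the type predicate is spelled `rdf:type` (an assumption about the YAGO files). The class name `TypeAsRelationshipSketch`, the two labels `Class` and `Individual`, and the `IS_A` relationship name are all hypothetical. The code only builds an in-memory node and edge list and does not call any Neo4j API; it illustrates that the number of distinct labels stays at two no matter how many classes the data contains.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Neo4j API code): rdf:type statements become IS_A
// edges to Class nodes instead of one label per class on the subject node.
class TypeAsRelationshipSketch {

    static final String RDF_TYPE = "rdf:type"; // assumed predicate spelling

    static final class Node {
        final long id;
        final String iri;
        String label; // only ever "Individual" or "Class"
        Node(long id, String iri, String label) { this.id = id; this.iri = iri; this.label = label; }
    }

    static final class Edge {
        final long from, to;
        final String type;
        Edge(long from, long to, String type) { this.from = from; this.to = to; this.type = type; }
    }

    final Map<String, Node> nodes = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    // Returns the node for an IRI, creating it (with a sequential id) on first sight.
    Node node(String iri, String label) {
        Node n = nodes.get(iri);
        if (n == null) {
            n = new Node(nodes.size(), iri, label);
            nodes.put(iri, n);
        } else if (label.equals("Class")) {
            n.label = "Class"; // promote an IRI that later turns out to be a class
        }
        return n;
    }

    void addTriple(String s, String p, String o) {
        Node subject = node(s, "Individual");
        if (RDF_TYPE.equals(p)) {
            // The type becomes a Class node plus an IS_A edge, not a label on the subject.
            Node cls = node(o, "Class");
            edges.add(new Edge(subject.id, cls.id, "IS_A"));
        } else {
            Node object = node(o, "Individual");
            edges.add(new Edge(subject.id, object.id, p));
        }
    }
}
```

With made-up triples such as `("ex:Einstein", "rdf:type", "ex:Physicist")` and `("ex:Einstein", "ex:bornIn", "ex:Ulm")`, the sketch yields three nodes and two edges, one of them `IS_A`, and only the labels `Individual` and `Class` are used.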
On Fri, Oct 16, 2015 at 2:40 PM, Michael Bach wrote:
> I did a couple of experiments today. For what it's worth: labels are a
> means to index different document sets, since property indexes are built
> per node label. I wouldn't try to introduce a label for each class in
> yago. As mentioned before, I'd rather model is-a relationships with
> nodes than with labels.
>
> Is there a particular reason why you're trying your luck with neo4j
> instead of Virtuoso or Jena?
>
> Sent from my iPad
>
> On 15.10.2015 at 23:12, Qi Song wrote:
>
> Hi Michael,
> Thanks for your reply :) I noticed that the code is old and uses some old
> APIs. However, labels are a bottleneck when loading RDF files, and in my
> work labels are very important. I'll try to find a way to handle labels
> more effectively.
>
> Bests~
> Qi Song
>
> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>>
>> Hi!
>>
>> My best guess would be that the algorithm neo4j uses just can't cope
>> with the vast number of labels this sort of use case would produce. Anyhow,
>> the code is very, very old...
>> The better approach would be to actually model RDF-like
>> relationships with nodes and introduce only a few labels for classes,
>> individuals, and maybe a couple of data types.
>>
>> On 15.10.2015 at 11:00, Qi Song wrote:
>>
>> Hello Michael,
>> I am trying to use your TurtleLoader to import YAGO
>> (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>> into neo4j, but I ran into some weird problems while importing. I can import
>> YagoFacts.ttl and YagoTypes.ttl separately without problems, but when I try to
>> import both of them I get the following error. I'm not sure what the reason is.
>> Is there some limit in TurtleLoader or BatchImporter?
>>
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>     at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>>     at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>>     at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>> Caused by: java.lang.IllegalArgumentException
>>     at sun.misc.Unsafe.allocateMemory(Native Method)
>>     at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>>     at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>>     at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>>     at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>>     at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>>     at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>>     at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>
>> Bests~
>> Qi Song
>>
>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>
>>> I checked that out in my batch importer (have a look at it on GitHub).
>>> MapDB performs pretty well, but in the end, the index lookups aren't
>>> the big bottleneck. If you need to do a normal index operation at any
>>> point (to make sure you're not importing duplicates) or iterate over
>>> the relationships of nodes to create unique relationships, everything
>>> becomes way slower.
>>>
>>> As far as batch imports go, I think an in-memory MapDB is the best
>>> option. You might want to include some kind of function to create an
>>> in-memory index on specific labels/keys to allow fast access to
>>> whatever you need during batch loads.
>>>
>>> Here's what I did for batch loads:
>>>
>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>> The import went fine, pretty fast I'd say. The bigger problem is
>>> overall performance on all the node operations...
>>>
>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>>> > Actually, I want to update the CSV batch inserter to support index
>>> > lookups and use real "csv", which means I'll put MapDB in there. We'll
>>> > see how it goes.
>>> >
>>> > You can also see whether a standard HashMap is good enough for you, or
>>> > a Trove primitive map. Otherwise there is still the trick with the
>>> > array of unique values, which you can sort and then use the array index
>>> > as the node id.
>>> > i.e. inserter.createNode(index, props), and then the id lookup for
>>> > rels is just Arrays.binarySearch(array, value).
>>> >
>>> > I also have to update the batch-importer to 2.0, but that's a bigger
>>> > piece of work, as lots of the internals changed in between.
>>> >
>>> > Michael
>>> >
>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. wrote:
>>> >
>>> >     Michael Hunger has actually written a blog entry on this. Check
>>> >     out his blog: http://jexp.de/blog/
>>> >
>>> >     Standard Lucene performs poorly in many cases. The only thing it's
>>> >     good at is full-text search with N-grams. If you don't need that,
>>> >     any key-value store performs better, e.g. MapDB or Voldemort.
>>> >
>>> >     On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>>> >
>>> >         Hi Michael,
>>> >
>>> >         Yes, I was considering using MapDB. We actually do use the
>>> >         standard Lucene indexes during our existing 1.9x batch
>>> >         insertion. We also do a pre-existing data check, using the
>>> >         index, when inserting nodes and entities. So far it's been
>>> >         fast enough; by that I mean taking 2/3 hours for about
>>> >         50 million nodes and 90 million relationships! But when we
>>> >         need more performance, I am happy to explore MapDB as an
>>> >         option at import time. I would probably also be interested in
>>> >         using it as a permanent index, rather than just at import time.
>>> >
>>> >         Thanks
>>> >
>>> >         Jen
>>> >
>>> >         On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>>> >
>>> >             Check out my blog entry on batch imports:
>>> >
>>> >             http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>> >
>>> >             Labels are a bit complicated. You shouldn't /commit/ to
>>> >             indices during batch imports (but you can add stuff to
>>> >             them); that will make everything incredibly slow. Michael
>>> >             Hunger suggested using MapDB as a temporary index. That's
>>> >             what I'd do in your place. Either do it like I did (for
>>> >             small data sets a HashMap is more than enough), using a
>>> >             java.util.Map implementation plus the index as a fallback
>>> >             for nodes that are in the DB but weren't imported by your
>>> >             application, or use MapDB instead.
>>> >
>>> >             Regards,
>>> >             Michael
>>> >
>>> >             On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>>> >
>>> >                 Hi there,
>>> >
>>> >                 I have been looking at the docs for 2.0, particularly
>>> >                 around support for labels during batch import.
>>> >
>>> >                 I see there is support for adding labels to nodes
>>> >                 during batch import, directly querying labels for
>>> >                 nodes, and so on. However, unless I am missing
>>> >                 something, I don't see support for locating a node by
>>> >                 label and ID. I have needed to do this when importing
>>> >                 a large dataset where the relationships come
>>> >                 separately from the nodes (say, a dump from a
>>> >                 relational database) and I need to use an external ID
>>> >                 to find the nodes for a relationship.
>>> >
>>> >                 I wondered what the intended approach is for looking
>>> >                 up a node by label and ID during batch import. I can
>>> >                 see the following choices:
>>> >
>>> >                 - Use the standard EmbeddedGraphDatabase (making sure
>>> >                   to have shut down the batch inserter, of course) to
>>> >                   look up the nodes for a bunch of relationship inserts
>>> >                   before going into insert mode.
>>> >                 - Use the BatchInserterIndexProvider to somehow hack
>>> >                   into the underlying index that I believe is created
>>> >                   for labels.
>>> >                 - Be patient and wait for support to appear in the
>>> >                   batch API for querying nodes by label and ID :)
>>> >
>>> >                 Thanks
>>> >
>>> >                 Jen
>>> >
>>> >         --
>>> >         You received this message because you are subscribed to a topic in the
>>> >         Google Groups "Neo4j" group.
>>> >         To unsubscribe from this topic, visit
>>> >         https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>>> >         To unsubscribe from this group and all its topics, send an email to
>>> >         neo4j+unsubscribe@googlegroups.com.
>>> >         For more options, visit https://groups.google.com/groups/opt_out.

--
Qi Song
Machine learning and Knowledge Discovery Group
EECS
Washington State University
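Michael Hunger's suggestion in the quoted thread (sort an array of unique external IDs, use each array index as the node id, and resolve relationship endpoints with `Arrays.binarySearch(array, value)`) can be sketched in plain Java. This is a minimal illustration under stated assumptions: the external IDs are `long`s, all of them are known before any node is created, and `SortedIdIndex` is a hypothetical name. No Neo4j call is made; a comment marks where the thread suggests calling `inserter.createNode(index, props)`.

```java
import java.util.Arrays;

// Hypothetical sketch of the "sorted array of unique values" trick: the
// position of an external ID in a sorted, de-duplicated array doubles as its
// node id. IDs are assumed to be longs; String[] with
// Arrays.binarySearch(Object[], Object) would work the same way.
class SortedIdIndex {

    private final long[] sortedIds;

    SortedIdIndex(long[] externalIds) {
        long[] copy = externalIds.clone();
        Arrays.sort(copy);
        // De-duplicate in place so every external ID owns exactly one slot.
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (n == 0 || copy[i] != copy[n - 1]) {
                copy[n++] = copy[i];
            }
        }
        sortedIds = Arrays.copyOf(copy, n);
    }

    int nodeCount() {
        return sortedIds.length;
    }

    // External ID stored at a given node id. While creating nodes, this is
    // where the thread suggests inserter.createNode(index, props).
    long externalIdAt(int nodeId) {
        return sortedIds[nodeId];
    }

    // Node id for an external ID, or -1 if it was never registered.
    long nodeIdOf(long externalId) {
        int pos = Arrays.binarySearch(sortedIds, externalId);
        return pos >= 0 ? pos : -1;
    }
}
```

Building the index costs one sort, and each relationship lookup is a binary search over a primitive `long[]`, which avoids boxing every ID into a `HashMap<Long, Long>`. For small data sets, or when the IDs are strings, the `HashMap` (or Trove primitive map) also mentioned in the thread is the simpler choice.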
