This looks scarily like denormalization

wikicat_Songwriters_from_Louisiana

Shouldn't that be 3 nodes linked to it rather than a type node?

Sent from my iPhone

> On 17.10.2015 at 11:04, Michael B. <[email protected]> wrote:
> 
> Yago has roughly 350,000 different classes, 10 million entities and 120 
> million facts (which would be either relationships or properties).
> 
> As mentioned previously, I'd rather go with few labels and model entity types 
> as their own nodes (which is the case in RDF). You could query for it with 
> something like this:
> match 
> (x:Individual)-[t:is_a]->(c:Class{type:'wikicat_Songwriters_from_Louisiana'}) 
> return x
> 
>> On 17 October 2015 at 10:13, Michael Hunger 
>> <[email protected]> wrote:
>> How many different types?
>> 
>> Sent from my iPhone
>> 
>>> On 17.10.2015 at 06:38, Qi Song <[email protected]> wrote:
>>> 
>>> Each instance in Yago has a type, and there are millions of instances.
>>> 
>>>> On Fri, Oct 16, 2015 at 3:26 PM, Michael Hunger 
>>>> <[email protected]> wrote:
>>>> Labels are roles or tags on nodes.
>>>> 
>>>> Which can be used to represent types as well.
>>>> 
>>>> That you can attach metadata like indexes is just a benefit.
>>>> 
>>>> The is-a relationships might be fine in a theoretical model, but will not 
>>>> perform that well if you have many millions or billions of them and query 
>>>> across them.
>>>> 
>>>> How many types are there in yago?
>>>> 
>>>> Michael
>>>> 
>>>>> On 16.10.2015 at 23:40, Michael Bach <[email protected]> wrote:
>>>>> 
>>>>> I did a couple of experiments today. For what it's worth: the labels are a 
>>>>> means to index different document sets, since property indexes are built 
>>>>> per node label. I wouldn't try to introduce a label for each class 
>>>>> in yago. As mentioned before, I'd sooner model is-a relationships 
>>>>> with nodes than with labels.
>>>>> 
>>>>> Is there a particular reason why you're trying your luck with neo4j 
>>>>> instead of virtuoso or jena?
>>>>> 
>>>>> Sent from my iPad
>>>>> 
>>>>>> On 15.10.2015 at 23:12, Qi Song <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Michael,
>>>>>> Thanks for your reply :) I noticed that the code is old and uses some old 
>>>>>> APIs. However, the label is a bottleneck for loading RDF files. In my 
>>>>>> work, the label is very important. I'll try to find some way to handle 
>>>>>> labels more effectively. 
>>>>>> 
>>>>>> Bests~
>>>>>> Qi Song
>>>>>> 
>>>>>>> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>>>>>>> Hi!
>>>>>>> 
>>>>>>> My best guess would be that the algorithm neo4j uses just can't cope 
>>>>>>> with the vast number of labels this sort of use case would produce. 
>>>>>>> Anyhow, the code is very, very old...
>>>>>>> The better approach to this would be to actually model RDF-like 
>>>>>>> relationships with nodes and introduce only a few labels for class, 
>>>>>>> individual, maybe a couple data types.
>>>>>>> 
>>>>>>> Sent from my iPad
>>>>>>> 
>>>>>>>> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hello Michael,
>>>>>>>> I am trying to use your TurtleLoader to import 
>>>>>>>> Yago (https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>>>>>>>> into neo4j, but I ran into a strange problem when importing. I can 
>>>>>>>> import YagoFacts.ttl and YagoTypes.ttl fine separately, but when I 
>>>>>>>> try to import both of them I get this error. I'm not sure what the 
>>>>>>>> reason is. Is there some limit in TurtleLoader or BatchImporter?
>>>>>>>> 
>>>>>>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>>>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>        at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>>>        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>>>>>>>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>>>>>>>> Caused by: java.lang.IllegalArgumentException
>>>>>>>>        at sun.misc.Unsafe.allocateMemory(Native Method)
>>>>>>>>        at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>>>>>>>>        at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>>>>>>> 
>>>>>>>> Bests~
>>>>>>>> Qi Song
>>>>>>>> 
>>>>>>>>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>>>>>>> I checked that out in my batch importer (have a look at it on 
>>>>>>>>> github). 
>>>>>>>>> MapDB performs pretty well, but in the end, the index look-ups aren't 
>>>>>>>>> the big bottleneck. If you need to make normal index operations at any 
>>>>>>>>> point (to make sure you're not importing duplicates) or iterate over 
>>>>>>>>> relationships of nodes to create unique relationships, everything 
>>>>>>>>> becomes way slower. 
>>>>>>>>> 
>>>>>>>>> As far as batch imports go, I think an in-memory MapDB is the best 
>>>>>>>>> option. You might want to include some kind of function to create an 
>>>>>>>>> in-memory index on specific Labels/keys to allow for fast access to 
>>>>>>>>> whatever's desired for batch loads. 
>>>>>>>>> 
>>>>>>>>> Here's what I did for Batch loads: 
>>>>>>>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>>>>>>>>  
>>>>>>>>> The import went fine, pretty fast I'd say. The bigger problem is 
>>>>>>>>> overall performance on all the node operations... 
>>>>>>>>> 
>>>>>>>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote: 
>>>>>>>>> > Actually I want to update the CSV batch inserter to support index 
>>>>>>>>> > lookups and use real "csv"; that means I'll put MapDB in there, and 
>>>>>>>>> > we'll see how it goes. 
>>>>>>>>> > 
>>>>>>>>> > You can also see if just a standard HashMap is good enough for you, 
>>>>>>>>> > or a Trove primitive map. Otherwise there is still that trick with the 
>>>>>>>>> > array of unique values, which you can sort and then use the array 
>>>>>>>>> > index as node-id: inserter.createNode(index, props), and then the 
>>>>>>>>> > id-lookup for rels is just Arrays.binarySearch(array, value) 
>>>>>>>>> > 
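
Inline note: a minimal sketch of the sorted-array trick described above, assuming the external keys are strings and all fit in memory. The class SortedIdLookup and its methods are hypothetical names made up for illustration; the batch inserter call appears only in a comment, as quoted in the mail.

```java
import java.util.Arrays;

/** Sketch of the sorted-array trick: the position in the sorted array doubles as the node id. */
final class SortedIdLookup {
    private final String[] sortedKeys;

    SortedIdLookup(String[] externalKeys) {
        // Deduplicate and sort once, before any node is created.
        this.sortedKeys = Arrays.stream(externalKeys).distinct().sorted().toArray(String[]::new);
    }

    /** Node id for an external key, or -1 if the key is unknown. */
    long nodeId(String externalKey) {
        int pos = Arrays.binarySearch(sortedKeys, externalKey);
        return pos >= 0 ? pos : -1;
    }

    int size() {
        return sortedKeys.length;
    }

    public static void main(String[] args) {
        SortedIdLookup ids = new SortedIdLookup(new String[] {"carol", "alice", "bob", "alice"});
        // Nodes would be created with ids 0..size()-1, e.g. inserter.createNode(id, props).
        // A relationship row ("alice", "carol") then resolves without any index:
        System.out.println(ids.nodeId("alice") + " -> " + ids.nodeId("carol")); // prints "0 -> 2"
    }
}
```

The trade-off is that every key must be known up front, and each lookup costs O(log n) instead of the O(1) of a hash map, in exchange for a much smaller memory footprint.
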
>>>>>>>>> > I also have to update the batch-importer to 2.0, but that's a bigger 
>>>>>>>>> > piece of work, as lots of the internals changed in between. 
>>>>>>>>> > 
>>>>>>>>> > Michael 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]> wrote: 
>>>>>>>>> > 
>>>>>>>>> >     Michael Hunger has actually written a blog entry on this. Check 
>>>>>>>>> >     his blog out: http://jexp.de/blog/ 
>>>>>>>>> > 
>>>>>>>>> >     Standard Lucene performs poorly in many cases. The only thing 
>>>>>>>>> >     it's good at is full text search with N-Gram. If you don't need 
>>>>>>>>> >     that, any key-value store performs better, e.g. MapDB or Voldemort. 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> >     On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote: 
>>>>>>>>> > 
>>>>>>>>> >         Hi Michael, 
>>>>>>>>> > 
>>>>>>>>> >         Yes I was considering using MapDB. We actually do use the 
>>>>>>>>> >         standard lucene indexes during our existing 1.9x batch 
>>>>>>>>> >         insertion. We also do a pre-existing data check when 
>>>>>>>>> >         inserting nodes and entities that uses the index. So far 
>>>>>>>>> >         it's been fast enough - by that I mean taking 2/3 hours for 
>>>>>>>>> >         about 50 million nodes, 90 million relationships! But when 
>>>>>>>>> >         we need more performance, I am happy to explore mapdb as an 
>>>>>>>>> >         option at import time. I would also probably be interested 
>>>>>>>>> >         in using this as a permanent index too, rather than just at 
>>>>>>>>> >         import time. 
>>>>>>>>> > 
>>>>>>>>> >         Thanks 
>>>>>>>>> > 
>>>>>>>>> >         Jen 
>>>>>>>>> > 
>>>>>>>>> >         On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote: 
>>>>>>>>> > 
>>>>>>>>> >             Check out my blog entry on batch imports: 
>>>>>>>>> >             http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>>>>>>>> > 
>>>>>>>>> >             Labels are a bit complicated. You shouldn't /commit/ to 
>>>>>>>>> >             indices during batch imports (but you can add stuff to 
>>>>>>>>> >             them) - they'll make everything incredibly slow. Michael 
>>>>>>>>> >             Hunger suggested to use MapDB as a temporary index. 
>>>>>>>>> >             That's what I'd do in your place. Either do it like I 
>>>>>>>>> >             did (for small data sets a HashMap is more than enough) 
>>>>>>>>> >             and use a java.util.Map implementation + index as 
>>>>>>>>> >             fallback for the nodes that are in the DB, but haven't 
>>>>>>>>> >             been imported by your application, or use a MapDB 
>>>>>>>>> >             instead. 
>>>>>>>>> > 
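
Inline note: a rough sketch of the "java.util.Map + index as fallback" idea from the paragraph above, not the code from the linked loader. TemporaryNodeIndex is a hypothetical name, and the fallback is modelled as a plain function from key to node id (returning null when the key is absent); in a real import it would presumably wrap a lookup against the existing on-disk index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: in-memory key -> node id map, with a slower fallback for nodes already in the store. */
final class TemporaryNodeIndex {
    private final Map<String, Long> imported = new HashMap<>();
    // Assumption: stands in for a lookup in the existing index; returns null if the key is absent.
    private final Function<String, Long> fallback;

    TemporaryNodeIndex(Function<String, Long> fallback) {
        this.fallback = fallback;
    }

    /** Remember a node created during this import run. */
    void put(String key, long nodeId) {
        imported.put(key, nodeId);
    }

    /** Node id for the key, or null when neither the map nor the fallback knows it. */
    Long lookup(String key) {
        Long id = imported.get(key);
        if (id != null) {
            return id;
        }
        Long existing = fallback.apply(key);
        if (existing != null) {
            imported.put(key, existing); // cache so the slow path runs only once per key
        }
        return existing;
    }

    public static void main(String[] args) {
        Map<String, Long> alreadyInStore = new HashMap<>();
        alreadyInStore.put("old", 7L);
        TemporaryNodeIndex index = new TemporaryNodeIndex(alreadyInStore::get);
        index.put("new", 42L);
        // prints "42 7 null"
        System.out.println(index.lookup("new") + " " + index.lookup("old") + " " + index.lookup("missing"));
    }
}
```

For data sets too large for the heap, the HashMap could be swapped for a MapDB-backed map, as suggested in the mail.
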
>>>>>>>>> >             Regards, 
>>>>>>>>> >             Michael 
>>>>>>>>> > 
>>>>>>>>> >             On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote: 
>>>>>>>>> > 
>>>>>>>>> >                 Hi there, 
>>>>>>>>> > 
>>>>>>>>> >                 I have been looking at the docs for 2.0, particularly 
>>>>>>>>> >                 around support for labels during batch import. 
>>>>>>>>> > 
>>>>>>>>> >                 I see there is support for adding labels to nodes 
>>>>>>>>> >                 during batch import, directly querying labels for 
>>>>>>>>> >                 nodes and so on. However, unless I am missing 
>>>>>>>>> >                 something I don't see that there is support for 
>>>>>>>>> >                 locating a node by label and ID. I have found I have 
>>>>>>>>> >                 needed to do this when I import a large dataset where 
>>>>>>>>> >                 the relationships come separately from the nodes (say 
>>>>>>>>> >                 a dump from a relational database) and I need to use 
>>>>>>>>> >                 an external ID to find the nodes for the relationship. 
>>>>>>>>> > 
>>>>>>>>> >                 I wondered what the intended approach for looking up 
>>>>>>>>> >                 a node by label and ID is during batch import. I can 
>>>>>>>>> >                 see the following choices: 
>>>>>>>>> > 
>>>>>>>>> >                 - Use the standard EmbeddedGraphDatabase (making sure 
>>>>>>>>> >                   to have shut down the batch inserter of course) to 
>>>>>>>>> >                   look up the nodes for a bunch of relationship 
>>>>>>>>> >                   inserts before going into insert mode. 
>>>>>>>>> >                 - Use the BatchInserterIndexProvider to somehow hack 
>>>>>>>>> >                   into the underlying index that I believe is created 
>>>>>>>>> >                   for labels 
>>>>>>>>> >                 - Be patient and wait for support to appear in the 
>>>>>>>>> >                   batch API for querying nodes by label and ID :) 
>>>>>>>>> > 
>>>>>>>>> >                 Thanks 
>>>>>>>>> > 
>>>>>>>>> >                 Jen 
>>>>>>>>> > 
>>>>>>>>> >         -- 
>>>>>>>>> >         You received this message because you are subscribed to a 
>>>>>>>>> >         topic in the Google Groups "Neo4j" group. 
>>>>>>>>> >         To unsubscribe from this topic, visit 
>>>>>>>>> >         https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en. 
>>>>>>>>> >         To unsubscribe from this group and all its topics, send an 
>>>>>>>>> >         email to [email protected]. 
>>>>>>>>> >         For more options, visit 
>>>>>>>>> >         https://groups.google.com/groups/opt_out. 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> >     -- 
>>>>>>>>> >     You received this message because you are subscribed to the 
>>>>>>>>> >     Google Groups "Neo4j" group. 
>>>>>>>>> >     To unsubscribe from this group and stop receiving emails from 
>>>>>>>>> >     it, send an email to [email protected]. 
>>>>>>>>> >     For more options, visit https://groups.google.com/groups/opt_out. 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>>> > 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Qi Song
>>> Machine learning and Knowledge Discovery Group
>>> EECS Washington State University
>> 
> 

