Each instance in Yago has a type, and there are millions of instances.

On Fri, Oct 16, 2015 at 3:26 PM, Michael Hunger <
[email protected]> wrote:

> Labels are roles or tags on nodes.
>
> Which can be used to represent types as well.
>
> That you can attach metadata like indexes is just a benefit.
>
> The is-a relationships might be fine in a theoretical model, but they will
> not perform well if you have many millions or billions of them and query
> across them.
>
> How many types are there in yago?
>
> Michael
>
> On 16.10.2015 at 23:40, Michael Bach <[email protected]> wrote:
>
> I did a couple of experiments today. For what it's worth: labels are a
> means to index different document sets, since property indexes are built on
> a per-label basis. I wouldn't try to introduce a label for each class in
> yago. As mentioned before, I'd prefer to model is-a relationships with
> nodes instead of labels.
>
> Is there a particular reason why you're trying your luck with neo4j
> instead of virtuoso or jena?
>
> Sent from my iPad
>
> On 15.10.2015 at 23:12, Qi Song <[email protected]> wrote:
>
> Hi Michael,
> Thanks for your reply :) I noticed that the code is old and uses some old
> APIs. However, labels are a bottleneck when loading RDF files. In my work,
> labels are very important. I'll try to find some way to handle labels
> more efficiently.
>
> Bests~
> Qi Song
>
> On Thursday, October 15, 2015 at 2:07:08 PM UTC-7, Michael B. wrote:
>>
>> Hi!
>>
>> My best guess would be that the algorithm neo4j uses just can't cope
>> with the vast number of labels this sort of use case would produce. Anyhow,
>> the code is very, very old...
>> The better approach to this would be to actually model RDF-like
>> relationships with nodes and introduce only a few labels for class,
>> individual, maybe a couple of data types.
>>
>> Sent from my iPad
>>
>> On 15.10.2015 at 11:00, Qi Song <[email protected]> wrote:
>>
>> Hello Michael,
>> I am trying to use your Turtleloader to import Yago (
>> https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
>> into neo4j, but I ran into some weird problems when importing. I can import
>> YagoFacts.ttl and YagoTypes.ttl fine separately, but when I tried to import
>> both of them I got this error. I'm not sure what the reason is. Is there
>> some limit in TurtleLoader or BatchImporter?
>>
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
>> Caused by: java.lang.RuntimeException: Panic called, so exiting
>> at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
>> at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
>> at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
>> Caused by: java.lang.IllegalArgumentException
>> at sun.misc.Unsafe.allocateMemory(Native Method)
>> at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
>> at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
>> at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
>> at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
>> at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
>> at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
>> at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
>> at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
>> at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
>> at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
>> at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
>> at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)
>>
>> Bests~
>> Qi Song
>>
>> On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>>>
>>> I checked that out in my batch importer (have a look at it on github).
>>> MapDB performs pretty well, but in the end, the index look-ups aren't
>>> the big bottleneck. If you need to do normal index operations at any
>>> point (to make sure you're not importing duplicates) or iterate over
>>> relationships of nodes to create unique relationships, everything
>>> becomes way slower.
>>>
>>> As far as batch imports go, I think an in-memory MapDB is the best
>>> option. You might want to include some kind of function to create an
>>> in-memory index on specific labels/keys to allow for fast access to
>>> whatever's desired for batch loads.
>>>
>>> Here's what I did for Batch loads:
>>>
>>> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>>>
>>> The import went fine, pretty fast I'd say. The bigger problem is
>>> overall performance on all the node operations...
>>>
>>> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote:
>>> > Actually I want to update the CSV batch inserter to support index
>>> > lookups and use real "csv"; that means I'll put MapDB in there. We'll
>>> > see how it goes.
>>> >
>>> > You can also see if just a standard HashMap is good enough for you, or
>>> > a Trove primitive map. Otherwise there is still that trick with the
>>> > array of unique values, which you can sort and then use the array index
>>> > as node-id: inserter.createNode(index, props), and then the id-lookup
>>> > for rels is just Arrays.binarySearch(array, value).
>>> >
>>> > I also have to update the batch-importer to 2.0, but that's a bigger
>>> > piece of work, as lots of the internals changed in between.
>>> >
>>> > Michael
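
A minimal sketch of the sorted-array trick described in the message above, in plain Java with no Neo4j dependency. The class name `SortedKeyIndex` and the sample keys are hypothetical; only `Arrays.binarySearch` and the `inserter.createNode(index, props)` call mentioned in a comment come from the thread itself.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Neo4j: maps external keys to node ids
// by their position in a sorted, de-duplicated array.
final class SortedKeyIndex {
    private final String[] keys;

    SortedKeyIndex(String[] rawKeys) {
        // Sort and drop duplicates so every key has exactly one position.
        this.keys = Arrays.stream(rawKeys).distinct().sorted().toArray(String[]::new);
    }

    // Returns the node id (array position) for a key, or -1 if the key is unknown.
    long nodeId(String key) {
        int pos = Arrays.binarySearch(keys, key);
        return pos >= 0 ? pos : -1;
    }

    int size() {
        return keys.length;
    }

    public static void main(String[] args) {
        SortedKeyIndex index = new SortedKeyIndex(
                new String[] {"yago:Paris", "yago:Berlin", "yago:Paris"});
        // During a real import, each position would be passed as the node id,
        // e.g. inserter.createNode(index, props) as described in the thread.
        System.out.println(index.nodeId("yago:Berlin"));
        System.out.println(index.nodeId("yago:Rome"));
    }
}
```

The array costs far less memory than a hash map of boxed values, at the price of an O(log n) lookup per relationship endpoint; it only works if all keys are known before the nodes are created.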
>>> >
>>> >
>>> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. <[email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> >     Michael Hunger has actually written a blog entry on this. Check
>>> >     his blog out: http://jexp.de/blog/
>>> >
>>> >     Standard Lucene performs poorly in many cases. The only thing it's
>>> >     good at is full text search with N-Gram. If you don't need that,
>>> >     any key-value store performs better, e.g. MapDB or Voldemort.
>>> >
>>> >
>>> >     On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote:
>>> >
>>> >         Hi Michael,
>>> >
>>> >         Yes, I was considering using MapDB. We actually do use the
>>> >         standard lucene indexes during our existing 1.9x batch
>>> >         insertion. We also do a pre-existing data check when inserting
>>> >         nodes and entities that uses the index. So far it's been fast
>>> >         enough - by that I mean taking 2/3 hours for about 50 million
>>> >         nodes, 90 million relationships! But when we need more
>>> >         performance, I am happy to explore mapdb as an option at import
>>> >         time. I would also probably be interested in using this as a
>>> >         permanent index too, rather than just at import time.
>>> >
>>> >         Thanks
>>> >
>>> >         Jen
>>> >
>>> >         On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote:
>>> >
>>> >             Check out my blog entry on batch imports:
>>> >             http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
>>> >
>>> >             Labels are a bit complicated. You shouldn't /commit/ to
>>> >             indices during batch imports (but you can add stuff to
>>> >             them) - they'll make everything incredibly slow. Michael
>>> >             Hunger suggested using MapDB as a temporary index. That's
>>> >             what I'd do in your place. Either do it like I did (for
>>> >             small data sets a HashMap is more than enough) and use a
>>> >             java.util.Map implementation, with the index as fallback
>>> >             for nodes that are in the DB but haven't been imported by
>>> >             your application, or use a MapDB instead.
>>> >
>>> >             Regards,
>>> >             Michael
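
A rough sketch of the Map-plus-fallback idea from the message above, assuming external IDs are strings and node IDs are longs. `ImportIdCache` and its `fallback` function are hypothetical stand-ins: the fallback represents whatever persistent index lookup is available for nodes already in the database, and is not a Neo4j API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical temporary index for a batch import: an in-memory map that is
// consulted first, with a slower lookup used only for keys it has not seen.
final class ImportIdCache {
    private final Map<String, Long> seen = new HashMap<>();
    private final Function<String, Long> fallback;

    ImportIdCache(Function<String, Long> fallback) {
        this.fallback = fallback;
    }

    // Record a node created during this import run.
    void remember(String externalId, long nodeId) {
        seen.put(externalId, nodeId);
    }

    // Returns the node id, or null if neither the map nor the fallback knows the key.
    Long lookup(String externalId) {
        Long id = seen.get(externalId);
        if (id != null) {
            return id;
        }
        Long fromDb = fallback.apply(externalId);
        if (fromDb != null) {
            seen.put(externalId, fromDb); // cache so the slow path runs once per key
        }
        return fromDb;
    }

    public static void main(String[] args) {
        Map<String, Long> alreadyInDb = new HashMap<>();
        alreadyInDb.put("ext-1", 10L);
        ImportIdCache cache = new ImportIdCache(alreadyInDb::get);
        cache.remember("ext-2", 11L);
        System.out.println(cache.lookup("ext-1"));
        System.out.println(cache.lookup("ext-2"));
        System.out.println(cache.lookup("ext-3"));
    }
}
```

For data sets that do not fit on the heap, the `HashMap` could be swapped for a MapDB-backed map, as the thread suggests; that variant is not shown here.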
>>> >
>>> >             On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote:
>>> >
>>> >                 Hi there,
>>> >
>>> >                 I have been looking at the docs for 2.0, particularly
>>> >                 around support for labels during batch import.
>>> >
>>> >                 I see there is support for adding labels to nodes during
>>> >                 batch import, directly querying labels for nodes and so
>>> >                 on. However, unless I am missing something, I don't see
>>> >                 that there is support for locating a node by label and
>>> >                 ID. I have found I have needed to do this when I import
>>> >                 a large dataset where the relationships come separately
>>> >                 from the nodes (say a dump from a relational database)
>>> >                 and I need to use an external ID to find the nodes for
>>> >                 the relationship.
>>> >
>>> >                 I wondered what the intended approach for looking up a
>>> >                 node by label and ID is during batch import. I can see
>>> >                 the following choices:
>>> >
>>> >                 - Use the standard EmbeddedGraphDatabase (making sure to
>>> >                   have shut down the batch inserter of course) to look up
>>> >                   the nodes for a bunch of relationship inserts before
>>> >                   going into insert mode.
>>> >                 - Use the BatchInserterIndexProvider to somehow hack into
>>> >                   the underlying index that I believe is created for
>>> >                   labels.
>>> >                 - Be patient and wait for support to appear in the batch
>>> >                   API for querying nodes by label and ID :)
>>> >
>>> >                 Thanks
>>> >
>>> >                 Jen
>>> >
>>> >         --
>>> >         You received this message because you are subscribed to a
>>> >         topic in the Google Groups "Neo4j" group.
>>> >         To unsubscribe from this topic, visit
>>> >         https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en.
>>> >         To unsubscribe from this group and all its topics, send an
>>> >         email to neo4j+unsubscribe@googlegroups.com.
>>> >         For more options, visit
>>> >         https://groups.google.com/groups/opt_out.



-- 
Qi Song
Machine learning and Knowledge Discovery Group
EECS Washington State University

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
