Re: [Neo4j] Re: Approach for using labels during batch import

Qi Song Thu, 15 Oct 2015 03:09:57 -0700

Hello Michael,
I am trying to use your TurtleLoader to import Yago
(https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
into Neo4j, but I ran into a strange problem. I can import YagoFacts.ttl
and YagoTypes.ttl separately without any trouble, but when I try to import
both of them together I get the error below. I'm not sure what the reason is.
Is there some limit in TurtleLoader or the BatchImporter?


Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.RuntimeException: Panic called, so exiting
    at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:200)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:78)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
Caused by: java.lang.IllegalArgumentException
    at sun.misc.Unsafe.allocateMemory(Native Method)
    at org.neo4j.unsafe.impl.internal.dragons.UnsafeUtil.malloc(UnsafeUtil.java:324)
    at org.neo4j.unsafe.impl.batchimport.cache.OffHeapNumberArray.<init>(OffHeapNumberArray.java:41)
    at org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray.<init>(OffHeapLongArray.java:34)
    at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$2.newLongArray(NumberArrayFactory.java:122)
    at org.neo4j.unsafe.impl.batchimport.cache.NumberArrayFactory$Auto.newLongArray(NumberArrayFactory.java:154)
    at org.neo4j.unsafe.impl.batchimport.RelationshipCountsProcessor.<init>(RelationshipCountsProcessor.java:60)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.processor(ProcessRelationshipCountsDataStep.java:73)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:60)
    at org.neo4j.unsafe.impl.batchimport.ProcessRelationshipCountsDataStep.process(ProcessRelationshipCountsDataStep.java:36)
    at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:120)
    at org.neo4j.unsafe.impl.batchimport.staging.ProcessorStep$4.run(ProcessorStep.java:102)
    at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

Bests~
Qi Song

On Friday, June 7, 2013 at 1:35:26 AM UTC-7, Michael B. wrote:
>
> I checked that out in my batch importer (have a look at it on GitHub). 
> MapDB performs pretty well, but in the end the index lookups aren't 
> the big bottleneck. If you need to do normal index operations at any 
> point (to make sure you're not importing duplicates) or iterate over 
> the relationships of nodes to create unique relationships, everything 
> becomes much slower. 
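Michael B. doesn't show how he keeps relationships unique. One common alternative to iterating over each node's relationships, sketched here as an assumption rather than taken from his loader, is to remember every (start, type, end) triple already written in an in-memory set, so the duplicate check never reads the store. `UniqueRelTracker` and `markIfNew` are hypothetical names, and the code uses Java 16+ records:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper: remembers which relationships were already created during the import. */
final class UniqueRelTracker {
    // Value type for a (start node id, relationship type, end node id) triple.
    private record RelKey(long start, String type, long end) {}

    private final Set<RelKey> seen = new HashSet<>();

    /**
     * Returns true the first time a triple is offered and false for every repeat,
     * so the caller only asks the batch inserter to create new relationships.
     */
    boolean markIfNew(long start, String type, long end) {
        return seen.add(new RelKey(start, Objects.requireNonNull(type), end));
    }

    /** Number of distinct relationships seen so far. */
    int size() {
        return seen.size();
    }
}
```

A HashSet of boxed keys costs far more memory than the triples themselves, so at tens of millions of relationships an off-heap structure such as MapDB or a primitive-keyed set would likely be the more realistic choice.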
>
> As far as batch imports go, I think an in-memory MapDB is the best 
> option. You might want to include some kind of function to create an 
> in-memory index on specific labels/keys to allow fast access to 
> whatever is needed for batch loads. 
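A minimal sketch of such an in-memory index keyed by label and property key, assuming plain HashMaps rather than MapDB (MapDB could back the inner maps once the data outgrows the heap). `LabelKeyIndex` and its methods are hypothetical; the node ids stored in it would be the ones the batch inserter returns when each node is created:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index: (label, property key, property value) -> node id. */
final class LabelKeyIndex {
    // One inner map per (label, property key) pair, mapping property value -> node id.
    private record Slot(String label, String key) {}

    private final Map<Slot, Map<Object, Long>> indexes = new HashMap<>();

    /** Remember the node id that was assigned when this node was inserted. */
    void put(String label, String key, Object value, long nodeId) {
        indexes.computeIfAbsent(new Slot(label, key), s -> new HashMap<>()).put(value, nodeId);
    }

    /** Return the node id, or -1 if no node with this label/key/value was imported. */
    long get(String label, String key, Object value) {
        Map<Object, Long> byValue = indexes.get(new Slot(label, key));
        if (byValue == null) {
            return -1L;
        }
        return byValue.getOrDefault(value, -1L);
    }
}
```

Only the label/key combinations you actually need for relationship lookups have to be registered, which keeps the memory footprint proportional to what the batch load uses.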
>
> Here's what I did for batch loads: 
>
> https://github.com/mybyte/tools/blob/master/Turtle%20loader/src/de/miba/neo4j/loader/turtle/Neo4jMapDBBatchHandler.java
>
> The import went fine, and pretty fast I'd say. The bigger problem is 
> the overall performance of all the node operations... 
>
> On Friday, 7 June 2013 10:26:47, Michael Hunger wrote: 
> > Actually, I want to update the CSV batch inserter to support index 
> > lookups and to use real CSV, which means I'll put MapDB in there. 
> > We'll see how it goes. 
> > 
> > You can also check whether a standard HashMap or a Trove primitive 
> > map is good enough for you. Otherwise there is still the trick with 
> > an array of the unique values: sort it and use the array index as the 
> > node id, i.e. inserter.createNode(index, props). The id lookup for 
> > rels is then just Arrays.binarySearch(array, value). 
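The sorted-array trick can be sketched as follows, assuming numeric external ids and an empty store, so that array position i can be passed as the node id to inserter.createNode(index, props). `SortedIdMapper` is a hypothetical name; only Arrays.sort and Arrays.binarySearch come from the JDK:

```java
import java.util.Arrays;

/** Sketch of the "sorted array of unique values" trick: array index == node id. */
final class SortedIdMapper {
    private final long[] sortedIds;

    /** Copies, sorts and de-duplicates the external ids. */
    SortedIdMapper(long[] externalIds) {
        long[] copy = externalIds.clone();
        Arrays.sort(copy);
        int unique = 0;
        for (int i = 0; i < copy.length; i++) {
            // Keep the first occurrence of each value; duplicates are adjacent after sorting.
            if (i == 0 || copy[i] != copy[i - 1]) {
                copy[unique++] = copy[i];
            }
        }
        this.sortedIds = Arrays.copyOf(copy, unique);
    }

    /** Number of nodes to create; node ids run from 0 to nodeCount() - 1. */
    int nodeCount() {
        return sortedIds.length;
    }

    /** External id stored at a given node id (e.g. to write it as a property). */
    long externalIdOf(int nodeId) {
        return sortedIds[nodeId];
    }

    /** Node id for an external id, or a negative number if it was never imported. */
    long nodeIdOf(long externalId) {
        return Arrays.binarySearch(sortedIds, externalId);
    }
}
```

For string keys the same approach works with a String[] and the Object overloads of Arrays.sort and Arrays.binarySearch. The array costs 8 bytes per node and each lookup is O(log n), with no hashing overhead.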
> > 
> > I also have to update the batch-importer to 2.0, but that's a bigger 
> > piece of work, as lots of the internals changed in between. 
> > 
> > Michael 
> > 
> > 
> > On Fri, Jun 7, 2013 at 10:10 AM, Michael B. wrote: 
> > 
> >     Michael Hunger has actually written a blog entry on this. Check 
> >     his blog out: http://jexp.de/blog/ 
> > 
> >     Standard Lucene performs poorly in many cases. The only thing it's 
> >     good at is full-text search with n-grams. If you don't need that, 
> >     any key-value store performs better, e.g. MapDB or Voldemort. 
> > 
> > 
> >     On Friday, 7 June 2013 07:41:34, Jennifer Smith wrote: 
> > 
> >         Hi Michael, 
> > 
> >         Yes, I was considering using MapDB. We actually do use the 
> >         standard Lucene indexes during our existing 1.9.x batch 
> >         insertion. We also do a pre-existing data check that uses the 
> >         index when inserting nodes and entities. So far it's been fast 
> >         enough, by which I mean 2-3 hours for about 50 million nodes 
> >         and 90 million relationships! But when we need more 
> >         performance, I am happy to explore MapDB as an option at 
> >         import time. I would probably also be interested in using it 
> >         as a permanent index, rather than just at import time. 
> > 
> >         Thanks 
> > 
> >         Jen 
> > 
> >         On Tuesday, 4 June 2013 14:31:59 UTC+1, Michael B. wrote: 
> > 
> >             Check out my blog entry on batch imports: 
> >             http://michaelbloggs.blogspot.com/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
> > 
> >             Labels are a bit complicated. You shouldn't /commit/ to 
> >             indices during batch imports (though you can add entries to 
> >             them); commits make everything incredibly slow. Michael 
> >             Hunger suggested using MapDB as a temporary index, and that's 
> >             what I'd do in your place. Either do it like I did, with a 
> >             java.util.Map implementation (for small data sets a HashMap 
> >             is more than enough) plus the index as a fallback for nodes 
> >             that are already in the DB but weren't imported by your 
> >             application, or use MapDB instead. 
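The "Map plus index as fallback" approach might look like the sketch below. The fallback is modelled as a hypothetical ToLongFunction that returns -1 for unknown nodes; in a real import it would wrap whatever index lookup you already have. Caching fallback hits in the map is an added assumption, so each pre-existing node costs at most one index query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/**
 * Sketch: resolve external ids through an in-memory map first and fall back to a
 * slower lookup (e.g. an existing index) only for nodes this import didn't create.
 */
final class FallbackNodeLookup {
    private final Map<String, Long> imported = new HashMap<>();
    // Hypothetical fallback: returns a node id, or -1 if the node doesn't exist.
    private final ToLongFunction<String> fallback;

    FallbackNodeLookup(ToLongFunction<String> fallback) {
        this.fallback = fallback;
    }

    /** Call this right after creating a node during the import. */
    void remember(String externalId, long nodeId) {
        imported.put(externalId, nodeId);
    }

    /** Returns the node id, or -1 if neither the map nor the fallback knows the node. */
    long resolve(String externalId) {
        Long id = imported.get(externalId);
        if (id != null) {
            return id;
        }
        long found = fallback.applyAsLong(externalId);
        if (found >= 0) {
            imported.put(externalId, found); // cache so the index is queried once per node
        }
        return found;
    }
}
```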
> > 
> >             Regards, 
> >             Michael 
> > 
> >             On Tuesday, 4 June 2013 11:47:25 UTC+2, Jennifer Smith wrote: 
> > 
> >                 Hi there, 
> > 
> >                 I have been looking at the docs for 2.0, particularly 
> >                 around support for labels during batch import. 
> > 
> >                 I see there is support for adding labels to nodes 
> >                 during batch import, directly querying labels for nodes 
> >                 and so on. However, unless I am missing something, I 
> >                 don't see any support for locating a node by label and 
> >                 ID. I have found I need to do this when I import a large 
> >                 dataset where the relationships come separately from the 
> >                 nodes (say, a dump from a relational database) and I need 
> >                 to use an external ID to find the nodes for each 
> >                 relationship. 
> > 
> >                 I wondered what the intended approach is for looking up 
> >                 a node by label and ID during batch import. I can see 
> >                 the following choices: 
> > 
> >                 - Use the standard EmbeddedGraphDatabase (making sure to 
> >                   have shut down the batch inserter, of course) to look 
> >                   up the nodes for a bunch of relationship inserts before 
> >                   going into insert mode. 
> >                 - Use the BatchInserterIndexProvider to somehow hack into 
> >                   the underlying index that I believe is created for 
> >                   labels. 
> >                 - Be patient and wait for support to appear in the batch 
> >                   API for querying nodes by label and ID :) 
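For the scenario Jen describes (relationships arriving separately from the nodes), a common pattern is a two-pass import: create all nodes first while recording (label, external ID) -> node id in memory, then resolve each relationship row against that map. The sketch below uses hypothetical row types and simulates node ids with a counter where a real import would use the id returned by the batch inserter; it requires Java 16+ for records:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Two-pass sketch: create nodes first, then resolve relationship endpoints by (label, external id). */
final class TwoPassImport {
    record NodeRow(String label, String externalId) {}
    record RelRow(String fromLabel, String fromId, String type, String toLabel, String toId) {}
    record ResolvedRel(long startNode, String type, long endNode) {}

    // The lookup key combines label and external id, since ids may repeat across labels.
    private record LabelId(String label, String externalId) {}

    private final Map<LabelId, Long> nodeIds = new HashMap<>();
    private long nextNodeId = 0; // stand-in for the id the batch inserter would return

    /** Pass 1: "create" each node and remember its id under (label, external id). */
    void importNodes(List<NodeRow> rows) {
        for (NodeRow row : rows) {
            long id = nextNodeId++; // real import: the id returned when the node is created
            nodeIds.put(new LabelId(row.label(), row.externalId()), id);
        }
    }

    /** Pass 2: translate relationship rows into node-id pairs, skipping unknown endpoints. */
    List<ResolvedRel> resolveRelationships(List<RelRow> rows) {
        List<ResolvedRel> out = new ArrayList<>();
        for (RelRow row : rows) {
            Long start = nodeIds.get(new LabelId(row.fromLabel(), row.fromId()));
            Long end = nodeIds.get(new LabelId(row.toLabel(), row.toId()));
            if (start != null && end != null) {
                out.add(new ResolvedRel(start, row.type(), end));
            }
        }
        return out;
    }
}
```

Rows whose endpoints are unknown are silently skipped here; a real import would probably log or collect them.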
> > 
> >                 Thanks 
> > 
> >                 Jen 
> > 
> >         -- 
> >         You received this message because you are subscribed to a 
> >         topic in the Google Groups "Neo4j" group. 
> >         To unsubscribe from this topic, visit 
> >         https://groups.google.com/d/topic/neo4j/eq_2fD2BlQU/unsubscribe?hl=en. 
> >         To unsubscribe from this group and all its topics, send an 
> >         email to neo4j+unsubscribe@googlegroups.com. 
> >         For more options, visit https://groups.google.com/groups/opt_out. 
> > 
> > 
> > 
> > 
