On 17 December 2013 19:32, Michael Hunger <[email protected]> wrote:
> The batch-inserter slowdown is a regression in 2.0.0 and is currently being
> worked on.
>
> How often do you have to run the import?

I have to import DBpedia, which has 20-30 files, but I am not able to complete the import for the following reasons:

1. Some files are malformed. The script at https://github.com/oleiade/dbpedia4neo/blob/master/cleanup.sh helps to clean up most of the files, but doesn't handle every case. Because of this, I am not able to use batch insert on some of the files. Figuring out which files aren't well formed is another lengthy job (basically importing into an empty graph).
2. Some files end up getting stuck at the db.shutdown() call for hours. I tried splitting them into smaller parts, but that still takes a lot of time. Currently, I am using an indexCache of 500,000 entries and a timeout of 60 seconds.

A few possible solutions:

1. If it is possible to merge two databases, I can create a database from each file and then merge them.
2. If I can run the solution by Oleiade on the graph generated by the batch inserter, that may help too, as I could then switch between the two methods at my convenience. When I try this, it says: "Store version [NeoStore v0.A.1]. Please make sure you are not running old Neo4j kernel on a store that has been created by newer version of Neo4j." I tried recompiling with the latest version, but that didn't work either.
3. If I can run multiple instances of Oleiade's solution on the same graph without significantly affecting the speed of an individual process/thread.

Thanks
Abhishek

> Michael
>
> On 17.12.2013 at 14:49, Abhishek Gupta <[email protected]> wrote:
>
> > On 16 December 2013 06:21, Michael Hunger <[email protected]> wrote:
> >> Can you show the output from the import run?
> >
> > I used the code available here:
> > https://github.com/mybyte/tools/tree/master/Turtle%20loader. I run
> > BatchExecutable, which uses Neo4jDBBatchHandler as the handler. I add the
> > parameter -Xmx3200m to the JVM arguments.
> > It takes about a minute and a half
> > to add three million triples to an empty database at 30000 triples per
> > second, but takes more than 5 minutes to execute db.shutdown(). This
> > shutdown becomes the bottleneck when I am not using an empty graph.
> >
> > The other approach I found is available at
> > https://github.com/oleiade/dbpedia4neo, which is much
> > slower as it doesn't use BatchInserter.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Neo4j" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Neo4j" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/neo4j/y7amc5GewrM/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
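
For the malformed-file problem in point 1, one way to find the bad files without running an import on an empty graph might be a quick syntax pre-check over each dump. The following is only a minimal sketch, assuming the dumps are N-Triples-style (one triple per line). The regular expression is a rough approximation rather than a full N-Triples grammar, and `find_bad_lines` and `report` are hypothetical names that are not part of either loader mentioned in this thread.

```python
import re

# Rough approximation of one N-Triples statement (assumption, not a full grammar):
# subject, predicate, object, terminated by a dot.
IRI = r"<[^<>\"{}|^`\\\s]*>"
BNODE = r"_:\S+"
LITERAL = r"\"(?:[^\"\\]|\\.)*\"(?:@[A-Za-z0-9-]+|\^\^" + IRI + r")?"
TRIPLE = re.compile(
    rf"^\s*(?:{IRI}|{BNODE})\s+{IRI}\s+(?:{IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def find_bad_lines(lines):
    """Return (line_number, text) for every line that does not look like a triple.

    Blank lines and lines starting with '#' (comments) are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not TRIPLE.match(line):
            bad.append((lineno, stripped))
    return bad


def report(paths):
    """Print a per-file count of suspicious lines plus the first few offenders."""
    for path in paths:
        # errors="replace" keeps the scan going when a file has broken encoding.
        with open(path, encoding="utf-8", errors="replace") as handle:
            bad = find_bad_lines(handle)
        print(f"{path}: {len(bad)} suspicious line(s)")
        for lineno, text in bad[:5]:
            print(f"  line {lineno}: {text[:100]}")
```

Calling `report` with a list of dump file paths would flag the files that need cleanup before they are handed to the batch inserter. Because the pattern is approximate, a flagged line is a candidate for inspection, not a definite error.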
