On 17 December 2013 19:32, Michael Hunger <[email protected]
> wrote:

> The batch-inserter slowdown is a regression in 2.0.0 and is currently being
> worked on.
>
> How often do you have to run the import?
>

I have to import DBpedia, which has 20-30 files, but I am not able to
complete the import for the following reasons:

1. Some files are malformed. The script at
https://github.com/oleiade/dbpedia4neo/blob/master/cleanup.sh cleans up
most of the files, but it doesn't handle every case, so I am not able to
use the batch inserter on some of them. Figuring out which files aren't
well formed is another lengthy job (it basically means importing each one
into an empty graph).

2. Some files end up stuck at the db.shutdown() call for hours. I tried
splitting them into smaller parts, but that still takes a lot of time.
Currently, I am using an indexCache of 500,000 entries and a timeout of
60 seconds.
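
For point 1, a rough pre-check along these lines might locate the bad files without a trial import. This is only a sketch, assuming the dumps are line-oriented N-Triples (one triple per line); the class name NTriplesPrecheck and the regular expressions are my own illustration, not part of cleanup.sh or the Turtle loader, and the patterns are deliberately looser than the real N-Triples grammar, so they can both miss problems and flag valid lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: flags lines that do not look like "subject predicate object ."
public class NTriplesPrecheck {

    // Approximate token shapes (assumption: not a full N-Triples grammar).
    private static final String IRI = "<[^<>\"\\s]+>";
    private static final String BLANK = "_:[A-Za-z0-9][A-Za-z0-9_-]*";
    // Quoted string with backslash escapes, optionally followed by @lang or ^^<datatype>.
    private static final String LITERAL =
            "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""
            + "(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\\^\\^" + IRI + ")?";

    private static final Pattern TRIPLE = Pattern.compile(
            "\\s*(?:" + IRI + "|" + BLANK + ")\\s+" + IRI
            + "\\s+(?:" + IRI + "|" + BLANK + "|" + LITERAL + ")\\s*\\.\\s*");

    // Blank lines and comment lines are accepted; everything else must match TRIPLE.
    public static boolean isWellFormed(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return true;
        }
        return TRIPLE.matcher(line).matches();
    }

    // Returns the 1-based numbers of all lines that fail the check.
    // Invalid UTF-8 in the file surfaces as an IOException from the reader.
    public static List<Long> badLineNumbers(Path file) throws IOException {
        List<Long> bad = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            long lineNumber = 0;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                if (!isWellFormed(line)) {
                    bad.add(lineNumber);
                }
            }
        }
        return bad;
    }

    // Prints one summary line per file given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            List<Long> bad = badLineNumbers(Paths.get(arg));
            System.out.println(arg + ": " + bad.size() + " suspicious line(s)"
                    + (bad.isEmpty() ? "" : ", first at line " + bad.get(0)));
        }
    }
}
```

Running it over all the dump files at once (java NTriplesPrecheck file1.nt file2.nt ...) only reads each file sequentially, so it should be much cheaper than importing each one into an empty graph just to see whether it fails.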

A few possible solutions:

1. If it is possible to merge two databases, I can create a database from
each file and then merge them.

2. If I could run Oleiade's solution on the graph generated by the batch
inserter, that might help too, as I could then switch between the two
methods at my convenience. When I try this, it fails with: "Store version
[NeoStore v0.A.1]. Please make sure you are not running old Neo4j kernel on
a store that has been created by newer version of Neo4j." I tried
recompiling with the latest version, but that didn't work either.

3. If I could run multiple instances of Oleiade's importer on the same graph
without significantly affecting the speed of each individual process/thread.

Thanks
Abhishek


> Michael
>
> Am 17.12.2013 um 14:49 schrieb Abhishek Gupta <[email protected]>:
>
>
> On 16 December 2013 06:21, Michael Hunger <
> [email protected]> wrote:
>
>> Can you show the output from the import run?
>>
>
> I used the code available here:
> https://github.com/mybyte/tools/tree/master/Turtle%20loader. I run
> BatchExecutable, which uses Neo4jDBBatchHandler as the handler, and I add
> -Xmx3200m to the JVM arguments. It takes about a minute and a half
> to add three million triples to an empty database at 30,000 triples per
> second, but more than 5 minutes to execute db.shutdown(). This
> shutdown becomes the bottleneck when I am not using an empty graph.
>
> The other approach I found is available at
> https://github.com/oleiade/dbpedia4neo. It is much slower because it
> doesn't use BatchInserter.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
