Jason, you might also try some of the tuning options described here http://structr.org/blog/neo4j-performance-on-ext4.

Best
Axel

Am 29.12.2013 10:55, schrieb Jason W:
Just realized I didn't have the index set on the right property. Doh!

After adding the index, I was able to insert a batch of 1,000 in 3.2 seconds which feels much better. When trying a larger batch though, the performance does not scale linearly - 25,000 batch took almost 15 minutes. I can clearly see disk writes and garbage collection playing a role now, so i'm playing around with batch sizes now. I'm on a linux server with 64GB available memory, 64 cores, and software RAID 10 over 4 x 7,200 RPM disks. I'm using default settings on neo4j.

Any tuning advice would be greatly appreciated!

On Sunday, December 29, 2013 12:58:16 AM UTC-6, Jason W wrote:

    Michael,
    I tried out your tool and I love the ease at which I was able to
    get going. Unfortunately, it hasn't really helped my performance
    issue.

    Here's my command:
    import-cypher -i input.csv -i output.csv MERGE (a:Attribute
    {coordinate: {coordinate}}) WITH a match (u:User {name: 'jason'})
    CREATE UNIQUE (u)-[r:HAS_ATTRIBUTE]->(a)

    input.csv looks like this:
    coordinate
    1:1
    1:2
    1:3
    etc..

    Running a test with 1000 attributes in input.csv took 230 seconds,
    which is a measley 4.3 inserts per second.




    On Saturday, December 28, 2013 11:29:29 PM UTC-6, Jason W wrote:

        Michael,
        Thanks for the reply. Your tool looks pretty interesting!
        Looks like it allows me to user parameters by providing a CSV
        file of values. I'll give it a try.

        To answer your questions..
        I have created a unique index on :Attribute(coordinate). The
        attributes are simply nodes that need to be connected to the
        user. Different users will share some of these attributes, and
        I need to be able to query which ones are shared (or not
        shared) between various users. I was running by piping the
        cyper queries to just "neo4j-shell" with a running server.
        Should I be using the "-file" option?

        On Saturday, December 28, 2013 6:51:37 PM UTC-6, Michael
        Hunger wrote:

            Jason,

            usually you would use parameters to speed it up. The shell
            also supports parameters, you can use "export param=value"

            e.g.
            export key="#{key}"
            export user="#{user}"
            MERGE (a:Attribute {coordinate: {key}}) WITH a
            MATCH (u:User {name: {user}}) CREATE UNIQUE
            (u)-[r:HAS_ATTR]->(a);

            did you create a unique index for your merge command ? (or
            at least a normal index on :Attribute(coordinate)

            What are the attributes for?

            Also combining around  20-50k elements in a single tx
            would speed it up.

            begin

            export key="#{key}"
            export user="#{user}"
            MERGE (a:Attribute {coordinate: {key}}) WITH a
            MATCH (u:User {name: {user}}) CREATE UNIQUE
            (u)-[r:HAS_ATTR]->(a);
            ...
            ...
            ...
            commit

            did you try to run bin/neo4j-shell -file file

            are your running against a running server? or the shell
            with -path ?
            You probably want to do the former, so that it can use the
            memory config of the running server.
            Otherwise it might make sense to configure the neo4j-shell
            script (if you edit it there is a line like this, add some
            sensible memory config to it):
            EXTRA_JVM_ARGUMENTS="-Xmx8G -Xms8G -Xmn1G"


            For fast imports of csv files with a single cypher
            statement like yours perhaps my neo4j-shell import tools
            would be helpful :)
            Check it out here:
            https://github.com/jexp/neo4j-shell-tools/tree/20#cypher-import
            <https://github.com/jexp/neo4j-shell-tools/tree/20#cypher-import>


            HTH

            Michael

            Am 29.12.2013 um 00:30 schrieb Jason W <[email protected]>:

            Hi Everyone,
            I'm relatively new to neo4j and i'm running into slowness
            when trying to insert a batch of data. My strategy has
            been to write the batch of Cypher queries to a text file,
            and then pipe that into the neo4j-shell. Here is a
            description of my data.

            Start with a single "user" node.
            Create (if not exists) many "attribute" nodes.
            Create relationships between "user" node and "attribute"
            nodes.

            In my benchmarking, I'm creating 10,000 attribute nodes
            and relationships from the user to the attributes. The
            caveat is that the attribute nodes may already exist, and
            if it does I want to use the existing one instead of
            creating a new one. My current approach uses the MERGE
            command to create the attribute nodes (or return the node
            if it doesn't exist). My Cypher queries look something
            like this:

            MERGE (a:Attribute {coordinate: '#{key}'}) WITH a
            MATCH (u:User {name: '#{user}'}) CREATE UNIQUE
            (u)-[r:HAS_ATTR]->(a)

            Running 10,000 sequential queries like this to insert my
            data is quite slow. I'm getting somewhere around 20
            inserts per second. Here are some things I've tried to
            optimize:

            -Batch these into a large transaction in a text file, and
            pipe it into the neo4j-shell
            -Batch these into a large single command in a text file,
            and pipe into neo4j-shell
            -Break into parallel jobs and insert multi threaded. Each
            query must be a single transaction otherwise it locks.
            -Separate the MERGE commands into a batch, and the CREATE
            relationship commands into a separate batch

            I've done the tooling benchmark to test file system
            performance
            
(http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html
            
<http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html>)
            and my results are great. I should be able to get upwards
            of 70k records/sec based on the benchmark.

            Can anyone advise what is the best strategy to import
            this type of data quickly?




            This message contains confidential information and is
            intended only for the individual named. If you are not
            the named addressee you should not disseminate,
            distribute or copy this e-mail. Please notify the sender
            immediately by e-mail if you have received this e-mail by
            mistake and delete this e-mail from your system. If you
            are not the intended recipient you are notified that
            disclosing, copying, distributing or taking any action in
            reliance on the contents of this information is strictly
            prohibited.

-- You received this message because you are subscribed to
            the Google Groups "Neo4j" group.
            To unsubscribe from this group and stop receiving emails
            from it, send an email to [email protected].
            For more options, visit
            https://groups.google.com/groups/opt_out
            <https://groups.google.com/groups/opt_out>.




        This message contains confidential information and is intended
        only for the individual named. If you are not the named
        addressee you should not disseminate, distribute or copy this
        e-mail. Please notify the sender immediately by e-mail if you
        have received this e-mail by mistake and delete this e-mail
        from your system. If you are not the intended recipient you
        are notified that disclosing, copying, distributing or taking
        any action in reliance on the contents of this information is
        strictly prohibited.




    This message contains confidential information and is intended
    only for the individual named. If you are not the named addressee
    you should not disseminate, distribute or copy this e-mail. Please
    notify the sender immediately by e-mail if you have received this
    e-mail by mistake and delete this e-mail from your system. If you
    are not the intended recipient you are notified that disclosing,
    copying, distributing or taking any action in reliance on the
    contents of this information is strictly prohibited.




This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.


--

Axel Morgner
CEO Structr (c/o Morgner UG) · Hanauer Landstr. 291a · 60314 Frankfurt · Germany
Twitter: @amorgner <https://twitter.com/amorgner>
Phone: +49 151 40522060
Skype: axel.morgner

Structr <http://structr.org> - Award-Winning Open Source CMS and Web Framework based on Neo4j Structr Mailing List and Forum <https://groups.google.com/forum/#%21forum/structr> Graph Database Usergroup "graphdb-frankfurt" <http://www.meetup.com/graphdb-frankfurt>

--
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to