It was a GC issue. After taking a closer look with a debugger, I observed 
that the purge operation was nowhere near reaching BATCH_SIZE (and thus the 
modulo condition that commits the batch) before aggressive GC kicked in as 
it approached the heap limit.

I've refactored the code a bit (I had to reduce BATCH_SIZE to 10 to 
actually make it run with reasonable memory usage).

public class Purger {
    private static final int BATCH_SIZE = 10;

    public static void purgeDictionary(GraphDatabaseService graphDb, int dictionary) {
        for (String label : new String[]{"station", "arrival", "departure",
                "change", "dictionary", "estimate-station"}) {
            // First pass: collect the node ids in a short read transaction,
            // so we don't delete while iterating the index result.
            List<Long> nodeIds = new ArrayList<>();
            try (Transaction tx = graphDb.beginTx()) {
                ResourceIterable<Node> nodes = graphDb.findNodesByLabelAndProperty(
                        DynamicLabel.label(label), "dictionary-id", dictionary);
                nodes.forEach(node -> nodeIds.add(node.getId()));
                tx.success();
            }

            // Second pass: delete in batches, committing every BATCH_SIZE nodes
            // so the open transaction's state stays small.
            Transaction tx = graphDb.beginTx();
            int count = 0;
            try {
                for (Long nodeId : nodeIds) {
                    Node node = graphDb.getNodeById(nodeId);
                    for (Relationship rel : node.getRelationships()) {
                        rel.delete();
                    }

                    node.delete();

                    if (++count % BATCH_SIZE == 0) {
                        tx.success();
                        tx.close();
                        tx = graphDb.beginTx();
                    }
                }
                tx.success();
            } finally {
                tx.close();
            }
        }
    }

    ...
}
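As an aside, the commit-every-N pattern used above can be isolated into a small 
generic helper. This is just a sketch of the batching idea, not tied to the 
Neo4j API; the `BatchCommitter` class and its callback names are my own 
invention, and the `commit` callback stands in for the real 
`tx.success(); tx.close(); tx = graphDb.beginTx();` sequence:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the commit-every-N pattern: run per-item work, flushing
// (committing) after every full batch and once more for the tail.
public class BatchCommitter<T> {
    private final int batchSize;
    private final Consumer<T> work;   // per-item work (e.g. delete a node)
    private final Runnable commit;    // flush the current batch

    public BatchCommitter(int batchSize, Consumer<T> work, Runnable commit) {
        this.batchSize = batchSize;
        this.work = work;
        this.commit = commit;
    }

    public void run(List<T> items) {
        int count = 0;
        for (T item : items) {
            work.accept(item);
            // Commit once per full batch so the state held by the open
            // "transaction" stays bounded.
            if (++count % batchSize == 0) {
                commit.run();
            }
        }
        // Flush the final, possibly partial batch.
        if (count % batchSize != 0) {
            commit.run();
        }
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 25; i++) ids.add(i);

        int[] commits = {0};
        new BatchCommitter<Long>(10, id -> {}, () -> commits[0]++).run(ids);

        // 25 items, batch size 10 -> commits after 10, 20, and the tail of 5.
        System.out.println(commits[0]); // prints 3
    }
}
```

The key point is that the final partial batch still gets committed; in the 
Purger code that role is played by the `tx.success()` after the loop.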


It runs slowly but it completes. However, I've observed a couple of issues. 
First, during the deletion, queries (for example the overall node count - 
`match (n) return count(n)`) take a very long time - almost a minute in my 
case. After the deletion, the query stays just as slow - even after a 
restart. Second, the database size on disk almost doubles during the 
deletion (and that's with only one "dictionary" imported).

The "slowness" problem - during the deletion as well as after it - goes 
away after switching to 2.2.0-RC01.

However, the size of the database on disk does not get reduced.

From what I've read, it seems that the space left by deleted nodes/rels is 
reused by new nodes/rels but isn't subject to any kind of shrinking 
operation. Shrinking itself doesn't seem to be possible without (a) bringing 
down the db or (b) using a third-party tool like store-utils. Is all of 
this true? If it is, I may have to adjust my strategy for reducing the size 
of the db and plan for some scheduled downtime and/or maybe even reimport 
the relevant data into a fresh instance of the db.

Regards,
Adrian

On Friday, March 13, 2015 at 7:20:57 AM UTC+1, Michael Hunger wrote:
>
> Do you actually run into GC issues, or is it just the Neo4j cache that's 
> filled?
>
> Would it be possible to run this (with 512M or 1G heap) and create a 
> heap-dump and look at the occupation?
>
> Thx Michael
>
> On 12.03.2015 at 17:41, Adrian Gruntkowski <[email protected]> wrote:
>
> Hello,
>
> I'm having problems deleting nodes and relationships from a 
> server-side plugin. 
>
> I have the following code that is supposed to accomplish this: 
>
>
>    public class Purger {
>        private static final int BATCH_SIZE = 1000;
>
>        ...
>
>        public static void purgeDictionary(GraphDatabaseService graphDb, int dictionary) {
>            Transaction tx = graphDb.beginTx();
>            int count = 0;
>            try {
>                for (String label : new String[]{"station", "arrival",
>                        "departure", "change", "dictionary", "estimate-station"}) {
>                    ResourceIterable<Node> nodes = graphDb.findNodesByLabelAndProperty(
>                            DynamicLabel.label(label), "dictionary-id", dictionary);
>
>                    for (Node node : nodes) {
>                        for (Relationship rel : node.getRelationships()) {
>                            rel.delete();
>                        }
>
>                        node.delete();
>
>                        if (++count % BATCH_SIZE == 0) {
>                            tx.success();
>                            tx.close();
>                            tx = graphDb.beginTx();
>                        }
>                    }
>                }
>                tx.success();
>            } finally {
>                tx.close();
>            }
>        }
>
>        ...
>    }
>
>
>
>
> The problem is, the operation quickly fills up the heap, no matter if it's 
> 2GB or 8GB. The graph has about 1.7M nodes. What am I doing wrong here? 
>
> I'm currently running neo4j 2.1.7 on JDK 8 on a 64-bit Debian-derived 
> Linux.
>
> Regards,
> Adrian
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>
