It was a GC issue. After a closer look with a debugger, I observed that
the purge operation was nowhere near reaching BATCH_SIZE (so it never hit
the modulo condition that commits the transaction) when aggressive GC
kicked in at the heap limit.
I've refactored the code a bit (I had to reduce BATCH_SIZE to 10 to
make it run with reasonable memory usage):
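import java.util.ArrayList;
import java.util.List;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.ResourceIterable;
import org.neo4j.graphdb.Transaction;

public class Purger {
    private static final int BATCH_SIZE = 10;

    public static void purgeDictionary(GraphDatabaseService graphDb, int dictionary) {
        for (String label : new String[]{"station", "arrival", "departure",
                "change", "dictionary", "estimate-station"}) {
            // Pass 1: collect only the ids of matching nodes in a short transaction.
            List<Long> nodeIds = new ArrayList<>();
            try (Transaction tx = graphDb.beginTx()) {
                ResourceIterable<Node> nodes = graphDb.findNodesByLabelAndProperty(
                        DynamicLabel.label(label), "dictionary-id", dictionary);
                nodes.forEach(node -> nodeIds.add(node.getId()));
                tx.success();
            }

            // Pass 2: delete by id, committing every BATCH_SIZE nodes.
            Transaction tx = graphDb.beginTx();
            int count = 0;
            try {
                for (Long nodeId : nodeIds) {
                    Node node = graphDb.getNodeById(nodeId);
                    // A node can only be deleted once it has no relationships.
                    for (Relationship rel : node.getRelationships()) {
                        rel.delete();
                    }
                    node.delete();
                    if (++count % BATCH_SIZE == 0) {
                        // Commit this batch and start a fresh transaction.
                        tx.success();
                        tx.close();
                        tx = graphDb.beginTx();
                    }
                }
                tx.success();
            } finally {
                tx.close();
            }
        }
    }
    ...
}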
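The "collect ids first, then work through them in fixed-size batches"
idea can be looked at separately from the Neo4j API. Below is a minimal
sketch in plain Java; BatchSketch, partition and forEachBatch are
hypothetical names used only for illustration, and nothing in it touches
the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Neo4j: splits a list of ids into
// batches so each batch can be processed in its own unit of work.
final class BatchSketch {

    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Runs work once per batch and returns how many batches were processed. */
    static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> work) {
        int processed = 0;
        for (List<T> batch : partition(items, batchSize)) {
            work.accept(batch);
            processed++;
        }
        return processed;
    }
}
```

In the purge code, the callback would be the place to open a transaction,
delete the nodes of that batch together with their relationships, and
commit, so that no transaction ever spans more than one batch.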
It runs slowly, but it completes. However, I've observed a couple of
issues. First, during the deletion, queries (for example, the overall
node count, `match (n) return count(n)`) take a very long time to run,
almost a minute in my case. After the deletion, that query stays just as
slow, even after a restart. Second, the database size on disk almost
doubles during the deletion (when there's only one "dictionary" imported).
The slowness, both during the deletion and after it, goes away after
switching to 2.2.0-RC01.
However, the size of the database on disk does not get reduced.
From what I've read, it seems that the space left by deleted nodes/rels
is reused by new nodes/rels but isn't subject to any kind of shrinking
operation. The shrinking itself doesn't seem to be possible without (a)
bringing down the db and (b) using a third-party tool like store-utils.
Is all of this true? If so, I may have to adjust my strategy for reducing
the size of the db and plan for some scheduled downtime, and/or maybe
even reimport the relevant data into a fresh instance of the db.
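To make the "reused but never shrunk" behaviour concrete, here is a toy
model in plain Java. It is only an illustration of the behaviour as I
understand it from what I've read, not Neo4j's actual store code;
ToyRecordStore and all of its methods are made-up names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (an assumption for illustration, not Neo4j internals):
// a store that hands out record slots and recycles freed ones,
// but never gives allocated slots back.
final class ToyRecordStore {
    private final Deque<Long> freeIds = new ArrayDeque<>();
    // Number of slots ever allocated; stands in for the file size on disk.
    private long highId = 0;

    /** Returns a recycled slot if one is free, otherwise grows the store. */
    long allocate() {
        Long reused = freeIds.poll();
        return reused != null ? reused : highId++;
    }

    /** Marks a slot as free for reuse; the store itself does not shrink. */
    void free(long id) {
        freeIds.push(id);
    }

    long slotsOnDisk() {
        return highId;
    }

    long slotsInUse() {
        return highId - freeIds.size();
    }
}
```

In this model, deleting records lowers slotsInUse() but leaves
slotsOnDisk() unchanged, and new records fill the freed slots before the
store grows again. Getting the size back down would require writing the
live records into a fresh store.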
Regards,
Adrian
On Friday, 13 March 2015 at 07:20:57 UTC+1, Michael Hunger wrote:
>
> Do you actually run into GC issues, or is it just the Neo4j cache that's
> filled?
>
> Would it be possible to run this (with 512M or 1G heap) and create a
> heap-dump and look at the occupation?
>
> Thx Michael
>
> On 12.03.2015 at 17:41, Adrian Gruntkowski <[email protected]> wrote:
>
> Hello,
>
> Hi. I'm having problems deleting nodes and relationships from a
> server-side plugin.
>
> I have the following code that is supposed to accomplish this:
>
>
> public class Purger {
>     private static final int BATCH_SIZE = 1000;
>
>     ...
>
>     public static void purgeDictionary(GraphDatabaseService graphDb, int dictionary) {
>         Transaction tx = graphDb.beginTx();
>         int count = 0;
>         try {
>             for (String label : new String[]{"station", "arrival",
>                     "departure", "change", "dictionary", "estimate-station"}) {
>                 ResourceIterable<Node> nodes = graphDb.findNodesByLabelAndProperty(
>                         DynamicLabel.label(label), "dictionary-id", dictionary);
>
>                 for (Node node : nodes) {
>                     for (Relationship rel : node.getRelationships()) {
>                         rel.delete();
>                     }
>
>                     node.delete();
>
>                     if (++count % BATCH_SIZE == 0) {
>                         tx.success();
>                         tx.close();
>                         tx = graphDb.beginTx();
>                     }
>                 }
>
>             }
>             tx.success();
>         } finally {
>             tx.close();
>         }
>     }
>
>     ...
> }
>
>
>
>
> The problem is, the operation quickly fills up the heap, no matter if it's
> 2GB or 8GB. The graph has about 1.7M nodes. What am I doing wrong here?
>
> I'm currently running Neo4j 2.1.7 under JDK 8 on a 64-bit Debian-derived
> Linux.
>
> Regards,
> Adrian
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>