I am trying to create a new Lucene index on an existing graph. The code
works well on a small graph, but on a larger one the process slows down
to a rate at which it seems it will never finish.
The graph I am trying to index has about 812,000 nodes, 2.5M properties,
and 3.5M relationships. For the first 400,000 nodes (about half of the
total) the rate is quite high, but after that it gets much slower.
The code I am using reads the graph once to collect the ids of the nodes
and relationships, then inserts them into the new indexes using those ids.
I don't use node_auto_index because I want a fulltext index with a custom
analyzer.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.tooling.GlobalGraphOperations;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchInsertLuceneIndexes {

    private List<Long> nodes;
    private List<Long> rels;

    public BatchInsertLuceneIndexes() {
        nodes = new ArrayList<>();
        rels = new ArrayList<>();
        getNodesAndRelations();
        buildIndexes();
    }

    public static void main(String[] args) {
        new BatchInsertLuceneIndexes();
    }

    // First pass: open the store normally and collect all node and
    // relationship ids in a single read transaction.
    private void getNodesAndRelations() {
        GraphDatabaseService graph =
                new GraphDatabaseFactory().newEmbeddedDatabase("graph.db");
        try (Transaction tx = graph.beginTx()) {
            for (Node n : GlobalGraphOperations.at(graph).getAllNodes()) {
                nodes.add(n.getId());
            }
            for (Relationship r : GlobalGraphOperations.at(graph).getAllRelationships()) {
                rels.add(r.getId());
            }
            tx.success();
        }
        graph.shutdown();
    }

    // Second pass: re-open the store with the batch inserter and index
    // every node and relationship by id.
    private void buildIndexes() {
        Map<String, String> config = new HashMap<>();
        config.put("cache_type", "none");
        config.put("use_memory_mapped_buffers", "true");
        config.put("neostore.nodestore.db.mapped_memory", "4000M");
        config.put("neostore.relationshipstore.db.mapped_memory", "4000M");
        config.put("neostore.propertystore.db.mapped_memory", "1000M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "1000M");

        BatchInserter graph = BatchInserters.inserter("graph.db", config);
        LuceneBatchInserterIndexProvider indexProvider =
                new LuceneBatchInserterIndexProvider(graph);
        BatchInserterIndex nodeIndex = indexProvider.nodeIndex("node_index",
                MapUtil.stringMap(
                        "type", "fulltext",
                        "to-lower-case", "true",
                        "analyzer", "org.neo4j.contrib.fti.analyzers.English"));
        BatchInserterIndex relIndex = indexProvider.relationshipIndex("rel_index",
                MapUtil.stringMap(
                        "type", "fulltext",
                        "to-lower-case", "true",
                        "analyzer", "org.neo4j.contrib.fti.analyzers.English"));

        int counter = 0;
        for (Long n : nodes) {
            nodeIndex.add(n, graph.getNodeProperties(n));
            counter++;
            if (counter % 10000 == 0) {   // flush every 10,000 entries
                nodeIndex.flush();
                System.out.println(counter);
            }
        }

        counter = 0;
        for (Long r : rels) {
            relIndex.add(r, graph.getRelationshipProperties(r));
            counter++;
            if (counter % 10000 == 0) {
                relIndex.flush();
                System.out.println(counter);
            }
        }

        nodeIndex.flush();
        relIndex.flush();
        indexProvider.shutdown();
        graph.shutdown();
    }
}
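One thing worth noting on a 4 GB machine: the two ArrayList&lt;Long&gt; fields box roughly 4.3 million ids, which costs considerably more heap than primitive storage. A minimal growable primitive-long list (a hand-rolled sketch for illustration, not part of the Neo4j API) avoids the boxing:

```java
import java.util.Arrays;

/**
 * Growable array of primitive longs, avoiding the per-element overhead
 * of ArrayList<Long> (a boxed Long object plus a reference for each id).
 */
public class LongList {
    private long[] data = new long[1024];
    private int size = 0;

    public void add(long value) {
        if (size == data.length) {
            // double the backing array when full
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}
```

The id-collection loops would then call ids.add(n.getId()) exactly as before, and the indexing loops would iterate by position instead of with a for-each.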
I also pass the *-XX:+UseConcMarkSweepGC* flag when executing the code.
My machine has 4 GB of memory and I am using Neo4j version 2.1.6.
Are these config parameters that I have to tune, or is it something else?
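For reference, the mapped-memory values in the code above add up to roughly 10 GB, while the machine has 4 GB. A configuration scaled to fit alongside the JVM heap might look like the following (the specific values are illustrative assumptions, not tested recommendations):

```
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=1000M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
```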
Thanks in advance.
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.