I am trying to create a new Lucene index on an existing graph. The code
works well on a small graph, but on a larger one the process slows down
to a rate at which it seems it will never finish.
The graph I am trying to index has about 812,000 nodes, 2.5M properties,
and 3.5M relationships. For the first 400,000 nodes (about half of the
total) the rate is quite high, but after that it gets much slower.
The code I am using reads the graph once to collect the ids of the nodes
and relationships, then inserts them into the new indexes using those ids.
I don't use node_auto_index because I want a fulltext index with a custom
analyzer.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.tooling.GlobalGraphOperations;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchInsertLuceneIndexes {

    private List<Long> nodes;
    private List<Long> rels;

    public BatchInsertLuceneIndexes() {
        nodes = new ArrayList<>();
        rels = new ArrayList<>();
        getNodesAndRelations();
        buildIndexes();
    }

    public static void main(String[] args) {
        new BatchInsertLuceneIndexes();
    }

    // First pass: open the store normally and collect all node and
    // relationship ids in a single read transaction.
    private void getNodesAndRelations() {
        GraphDatabaseService graph =
                new GraphDatabaseFactory().newEmbeddedDatabase("graph.db");
        try (Transaction tx = graph.beginTx()) {
            for (Node n : GlobalGraphOperations.at(graph).getAllNodes()) {
                nodes.add(n.getId());
            }
            for (Relationship r : GlobalGraphOperations.at(graph).getAllRelationships()) {
                rels.add(r.getId());
            }
            tx.success();
        }
        graph.shutdown();
    }

    // Second pass: re-open the store with the batch inserter and index
    // every node and relationship by id.
    private void buildIndexes() {
        Map<String, String> config = new HashMap<>();
        config.put("cache_type", "none");
        config.put("use_memory_mapped_buffers", "true");
        config.put("neostore.nodestore.db.mapped_memory", "4000M");
        config.put("neostore.relationshipstore.db.mapped_memory", "4000M");
        config.put("neostore.propertystore.db.mapped_memory", "1000M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "1000M");

        BatchInserter graph = BatchInserters.inserter("graph.db", config);
        LuceneBatchInserterIndexProvider indexProvider =
                new LuceneBatchInserterIndexProvider(graph);
        BatchInserterIndex nodeIndex = indexProvider.nodeIndex("node_index",
                MapUtil.stringMap(
                        "type", "fulltext",
                        "to-lower-case", "true",
                        "analyzer", "org.neo4j.contrib.fti.analyzers.English"));
        BatchInserterIndex relIndex = indexProvider.relationshipIndex("rel_index",
                MapUtil.stringMap(
                        "type", "fulltext",
                        "to-lower-case", "true",
                        "analyzer", "org.neo4j.contrib.fti.analyzers.English"));

        int counter = 0;
        for (Long n : nodes) {
            nodeIndex.add(n, graph.getNodeProperties(n));
            counter++;
            if (counter % 10000 == 0) {   // flush every 10,000 entries
                nodeIndex.flush();
                System.out.println(counter);
            }
        }

        counter = 0;
        for (Long r : rels) {
            relIndex.add(r, graph.getRelationshipProperties(r));
            counter++;
            if (counter % 10000 == 0) {
                relIndex.flush();
                System.out.println(counter);
            }
        }

        nodeIndex.flush();
        relIndex.flush();
        indexProvider.shutdown();
        graph.shutdown();
    }
}
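One thing worth noting on a 4 GB machine: the two ArrayList&lt;Long&gt; fields box roughly 4.3 million ids, which costs considerably more heap than primitive storage. A minimal growable primitive-long list (a hand-rolled sketch for illustration, not part of the Neo4j API) avoids the boxing:

```java
import java.util.Arrays;

/**
 * Growable array of primitive longs, avoiding the per-element overhead
 * of ArrayList<Long> (a boxed Long object plus a reference for each id).
 */
public class LongList {
    private long[] data = new long[1024];
    private int size = 0;

    public void add(long value) {
        if (size == data.length) {
            // double the backing array when full
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}
```

The id-collection loops would then call ids.add(n.getId()) exactly as before, and the indexing loops would iterate by position instead of with a for-each.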
I also pass the *-XX:+UseConcMarkSweepGC* flag when executing the code.
My machine has 4 GB of memory and I am using Neo4j version 2.1.6.
Are these config parameters that I have to tune, or is it something else?
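For reference, the mapped-memory values in the code above add up to roughly 10 GB, while the machine has 4 GB. A configuration scaled to fit alongside the JVM heap might look like the following (the specific values are illustrative assumptions, not tested recommendations):

```
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=1000M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
```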
Thanks in advance.
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.