Hi all,

I'm writing a web crawler in Node.js and indexing the results with 
ElasticSearch. However, I've run into a problem where the code hangs at 
the indexing function.

Here's how the client is initialised:

    var es_client = new elasticsearch.Client({
        host: "localhost:9200",
        log: ['error', 'trace'],
        keepAlive: true,
        sniffOnConnectionFault: true,
        //sniffInterval: 6000,
        sniffOnStart: true,
        maxKeepAliveTime: 600000
    });

And here's the indexing API call:

    es_client.index({
      index: seedURL,
      type: 'post',
      id: generate_md5(username + "\n" + post_title + "\n" + post_content),
      body: {
        thread_md5 : thread_md5,
        thread_title : thread_title,
        thread_url : post_list_page_url,
        post_title: post_title,
        post_order : post_order,
        post_content: post_content,
        timestamp: timestamp,
        username: username
      }
    }).then(
        function (resp) {
            console.log("Elasticsearch response to indexing " + post_title + "...");
            console.log(resp);
        }, 
        function (err) {
            console.log("[ERROR] An error occurred whilst indexing: " + post_title + "...");
            console.log(err.message);
        }
    );

I tested the crawler script with the indexing call commented out, and it 
finishes the crawl with no problem. This suggests that the problem lies 
somewhere in the ElasticSearch indexing step.

I have also had a look at the ElasticSearch logs and no errors were raised.

Lastly, and this could be the best hint yet: on every trial run, the script 
hangs after *exactly* 277 documents have been successfully indexed.
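
To tell whether requests stop being sent or stop being answered at the point 
of the hang, the indexing call could be wrapped in simple counters. Below is 
a minimal sketch of that idea, not a fix. The names makeTrackedIndexer, 
trackedIndex and pending are hypothetical and not part of the elasticsearch 
client; the only assumption about the wrapped function is that it returns a 
promise, as es_client.index does in the call above.

```javascript
// Hypothetical diagnostic helper (not part of the elasticsearch client).
// Wraps any promise-returning index function and counts how many calls
// were started, how many succeeded, and how many failed.
function makeTrackedIndexer(indexFn) {
    var stats = { started: 0, succeeded: 0, failed: 0 };

    function trackedIndex(params) {
        stats.started += 1;
        return indexFn(params).then(
            function (resp) {
                stats.succeeded += 1;
                return resp;
            },
            function (err) {
                stats.failed += 1;
                throw err; // keep the rejection visible to the caller
            }
        );
    }

    // Calls that were started but have neither resolved nor rejected.
    function pending() {
        return stats.started - stats.succeeded - stats.failed;
    }

    return { index: trackedIndex, stats: stats, pending: pending };
}
```

In the crawler this would be created once, for example with 
makeTrackedIndexer(function (p) { return es_client.index(p); }), and the 
counters logged from a timer. If started keeps growing while succeeded stays 
at 277, requests are being issued but never answered; if started itself 
stops at 277, the crawler is no longer reaching the indexing call.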

Thoughts?

Cheers,

James


