Hey guys,

I am using the script below to reindex 115,000 documents (I am running the 
script locally).

<?php
// PHP ReIndexer with Bulk API
require 'vendor/autoload.php';


// We use this function to create the "scan & scroll" search requests,
// because such requests don't exist in the ES PHP API.
function curlWrapper($uri, $method, $data = '')
{
        $ch = curl_init($uri);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
        curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);


        if ($data != '')
                curl_setopt($ch, CURLOPT_POSTFIELDS, $data); 


        $response = curl_exec($ch);
        curl_close($ch);


        return $response;
}


error_reporting(E_ALL); 
ini_set( 'display_errors','1');
date_default_timezone_set("UTC");


$ELSEARCH_SERVER = "http://someserver:9200/";
$OLDINDEX = "OldIndex"; //old index
$SECONDINDEX = "NewIndex"; // new index
$TYPE = 'MyType'; // old type
$LOGPATH = '/var/log/elasticsearch/elasticsearch.log';


$clientParams = array();
$clientParams['logging']  = true;
$clientParams['logPath']  = $LOGPATH;
$clientParams['logLevel'] = Psr\Log\LogLevel::INFO;
$clientParams['hosts'] = array ($ELSEARCH_SERVER);
$dstEl = new Elasticsearch\Client($clientParams);


//start the scan request
//We want to find all documents, so we do a simple match_all
$query ='{"query" : {"match_all" : {}}}';


//The scroll=10m param says that this scroll session should be valid for 10
//minutes before expiring
//The size=100 param says that 100 results should be returned per scroll
$uri = $ELSEARCH_SERVER.$OLDINDEX."/".$TYPE.
"/_search?search_type=scan&scroll=10m&size=100";
$response = curlWrapper($uri, 'GET', $query);
$data =  json_decode($response);


//total number of documents in the index
$total = $data->hits->total;


//scroll session id, used to request the next batch of data
$scroll_id = $data->_scroll_id;


//The scan request doesn't actually return any data, just a session "scroll id"
//We now query ES and provide this id to start retrieving the data
$uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
$response = curlWrapper($uri, 'GET', $scroll_id);
$data =  json_decode($response);


// Initialize bulk insertion parameters.
$bulkInsertParams = array();
$bulkInsertParams['index'] = $SECONDINDEX;
$bulkInsertParams['type'] = $TYPE;


echo date("Y-m-d H:i:s") . ": Start ReIndexing." . PHP_EOL;


//Loop through all the data
while (count($data->hits->hits) > 0)
{
    $bulkInsertParams["body"] = null;

    // run for each match of the "scan & scroll" search
    foreach ($data->hits->hits as $item)
    {
        $bulkInsertParams["body"][] = array(
            'index' => array(
                '_id' => $item->_id
            )
        );
        $bulkInsertParams["body"][] = array(
            'doc' => $item->_source
        );
    }
    $retVal = $dstEl->bulk($bulkInsertParams);

    //Each scroll request returns another scroll_id which is used to continue
    //scrolling through the data
    $scroll_id = $data->_scroll_id;


    //retrieve the next batch of data - the new session is good for an
    //additional 10m, etc etc
    $uri = $ELSEARCH_SERVER."_search/scroll?scroll=10m";
    $response = curlWrapper($uri, 'GET', $scroll_id);
    $data =  json_decode($response);
}


echo date("Y-m-d H:i:s") . ": DONE!" . PHP_EOL;
?>

Everything seems to work fine, and when I run this query:

GET NewIndex/MyType/_search
{
  "size":0
}

I get these results (which look good):

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 115102,
      "max_score": 0,
      "hits": []
   }
}

But when I try to run a query on one of the documents' fields, I get no 
results, while the exact same query on the old index returns the 
expected results.

This is the query (if it helps):

GET NewIndex/MyType/_search
{
  "query": {
    "terms": {
      "doc_type": [
        "user_view"
      ]
    }
  }
}

The results are:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": 0,
      "hits": []
   }
}

The results for OldIndex are:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 104452,
      "max_score": 0,
      "hits": []
   }
}


I am wondering if there is something else that I should do to get the 
documents indexed in Elasticsearch?


Note: 
(*) When I try to get a specific document (by key) from NewIndex, the result 
is fine.
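
A possible cause worth checking (an assumption, not something confirmed in this thread): in a bulk request, the line that follows an `index` action is normally the document source itself, while a `doc` wrapper belongs to `update` actions. If the source is wrapped in a `doc` key, the field would probably end up as `doc.doc_type` in NewIndex, so a terms query on `doc_type` would find nothing even though the document count matches. The sketch below uses a hypothetical helper, `buildBulkBody`, that is not part of the original script, and it runs without an Elasticsearch server so the two shapes can be compared directly.

```php
<?php
// Hypothetical helper (not in the original script): builds the "body" array
// for a bulk request from the hits of one scroll page.
function buildBulkBody(array $hits)
{
    $body = array();
    foreach ($hits as $hit) {
        // Action/metadata entry: index the document under its original id.
        $body[] = array('index' => array('_id' => $hit->_id));
        // Source entry: the document fields as-is, with no 'doc' wrapper.
        $body[] = $hit->_source;
    }
    return $body;
}

// A made-up hit, shaped like one element of $data->hits->hits.
$hit = (object) array(
    '_id'     => '1',
    '_source' => (object) array('doc_type' => 'user_view'),
);

// Shape produced when the source is wrapped under 'doc'.
$wrapped = array('doc' => $hit->_source);
// Shape produced by the helper above.
$body = buildBulkBody(array($hit));

echo json_encode($wrapped) . PHP_EOL;  // {"doc":{"doc_type":"user_view"}}
echo json_encode($body[1]) . PHP_EOL;  // {"doc_type":"user_view"}
```

If this assumption holds, fetching one document from NewIndex and looking at its `_source` should show whether the fields sit at the top level or under `doc`.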

Thanks for your help,
Niv :)
