Delete duplicate items

Jingzhao Ou (Sun, 28 Sep 2014 18:08:19 -0700)

Hi, all, 

I use Elasticsearch to store JSON documents like the following:


{
  "_index" : "normalized",
  "_type" : "90A2DAFB0621",
  "_id" : "Fri Sep 12 16:59:50 UTC 2014",
  "_score" : 1.0,
  "_source" : {"id":"2014-09-12T16:59:50.000Z","r":72.16,"o":74.3,"m":78.01,"s":66.99,"c":0.03,"p":2.77,"e":7.694444444444444E-6,"ec":1.8466666666666666E-6,"mo":0,"ot":64.31,"ecop":91}
}


Later on, I changed how my program calculates "_id". As a result, the older
data sets now contain two copies of each item. I was able to find the
duplicated items using the aggregations API:


{
    "aggs": {
        "types": {
            "terms": {
                "field": "_type"
            },
            "aggs": {
                "dups": {
                    "histogram": {
                        "field": "id",
                        "interval": 1,
                        "min_doc_count": 2
                    }
                }
            }
        }
    }
}
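Once the aggregation has shown which "id" values are duplicated, the next step is deciding which copy of each to delete. The sketch below is a minimal, hypothetical helper (`find_duplicate_ids` and the `keep` predicate are illustrative names, not part of any Elasticsearch client). It assumes you have already fetched the candidate documents as search hits, shaped like the document above (as they appear in a search response's `hits.hits` array); fetching them, for example with a scan/scroll search, is not shown. Hits are grouped by `_type` and `_source.id`, mirroring the aggregation, and exactly one copy per group survives.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Hit = Dict[str, Any]


def find_duplicate_ids(
    hits: Iterable[Hit],
    keep: Callable[[Hit], bool],
) -> List[Tuple[str, str, str]]:
    """Return (_index, _type, _id) for each duplicate copy that should be deleted.

    Hits are grouped by (_type, _source["id"]). In every group with two or
    more hits, the first hit for which keep(hit) is True survives (or the
    first hit overall if none match), and all other hits are returned.
    """
    groups: Dict[Tuple[str, str], List[Hit]] = defaultdict(list)
    for hit in hits:
        groups[(hit["_type"], hit["_source"]["id"])].append(hit)

    to_delete: List[Tuple[str, str, str]] = []
    for group in groups.values():
        if len(group) < 2:
            continue  # only one copy, nothing to remove
        # Never delete every copy: fall back to the first hit as the survivor.
        survivor = next((h for h in group if keep(h)), group[0])
        for hit in group:
            if hit is not survivor:
                to_delete.append((hit["_index"], hit["_type"], hit["_id"]))
    return to_delete
```

The `keep` predicate encodes which "_id" scheme is current. For example, `keep=lambda h: h["_id"] == h["_source"]["id"]` would keep the copy whose "_id" is the ISO timestamp; that is a hypothetical assumption, since the post does not say which of the two formats is the new one.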


I can remove the old copies one by one using the delete API, but I wonder
whether there is a better solution.
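One common alternative to issuing one delete request per document is to batch the deletes through the `_bulk` API, which accepts newline-delimited JSON where each `delete` action is a single metadata line with no source line after it, and the body must end with a newline. The sketch below only builds that request body; `bulk_delete_body` is a hypothetical helper name. It includes `_type` in the action metadata, which matches the 1.x-era API used in this post; newer Elasticsearch releases dropped mapping types.

```python
import json
from typing import Iterable, Tuple


def bulk_delete_body(targets: Iterable[Tuple[str, str, str]]) -> str:
    """Build a _bulk request body containing one delete action per target.

    Each target is (_index, _type, _id). Delete actions need no source line,
    and the bulk API requires the body to end with a trailing newline.
    """
    lines = []
    for index, doc_type, doc_id in targets:
        action = {"delete": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
    # Empty input yields an empty body rather than a lone newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting body can then be POSTed to the `_bulk` endpoint; with curl, for example, `curl -XPOST 'localhost:9200/_bulk' --data-binary @deletes.json` (the host and file name are assumptions), where `--data-binary` is needed so the newlines are preserved.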


Thanks a lot for your help! 

Jingzhao



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2d84de9f-5317-45ff-b599-2cae7f505b3a%40googlegroups.com.
