Hi,

At different random times throughout the day I am going to do a "crawl" of data which I am going to feed into Elasticsearch. This bit is working just fine.
However, the index should reflect only what was found in my most recent crawl, and I currently have nothing to remove the content in the Elasticsearch index which was left over from the previous crawl but wasn't found in the new crawl.

From what I can see I have a few options:

A) Delete items based on how old they are. Won't work because index times are random.
B) Delete the entire index and feed it with fresh data. Doesn't seem very efficient and will leave me for a time with an empty or partial index.
C) Do an insert/modify query: if not found, insert; if already in the index, update the timestamp; then do a second pass to delete any items with an older timestamp.
D) Something better.

I would really appreciate any feedback on a logical and efficient way of removing old content in a situation like this.

Thank you and happy Easter.

James

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/AF36C2A5-8B38-4176-90B8-2E4210A0244F%40employ.com. For more options, visit https://groups.google.com/d/optout.
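P.S. To make option (C) concrete, here is a minimal sketch of the two-pass flow using a plain Python dict as a stand-in for the index. The names `crawl_ts`, `upsert`, and `sweep` are made up for illustration; in real Elasticsearch the upsert would be an index request with a stable document `_id`, and the sweep would be a delete-by-query with a range filter on the timestamp field.

```python
# Toy in-memory "index" standing in for Elasticsearch: doc_id -> document.
index = {}

def upsert(doc_id, doc, crawl_ts):
    """Pass 1 of option (C): insert or overwrite, stamping the crawl time.
    (In ES: index the doc with a stable _id so re-crawled items overwrite.)"""
    index[doc_id] = dict(doc, crawl_ts=crawl_ts)

def sweep(crawl_ts):
    """Pass 2 of option (C): delete everything not touched by this crawl.
    (In ES: a delete-by-query with a range filter crawl_ts < this crawl.)"""
    stale = [d for d, doc in index.items() if doc["crawl_ts"] < crawl_ts]
    for d in stale:
        del index[d]
    return len(stale)

# First crawl finds two pages.
upsert("page-a", {"title": "A"}, 1)
upsert("page-b", {"title": "B"}, 1)

# Second crawl finds page-a again (updated) and a new page-c, but not page-b.
upsert("page-a", {"title": "A v2"}, 2)
upsert("page-c", {"title": "C"}, 2)

removed = sweep(2)  # page-b is the only leftover, so it is removed
```

The nice property of this flow is that the index is never empty mid-crawl: stale documents are only removed after the new crawl has been fully indexed.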
