Re: Migration from Solr to ElasticSearch

Diego Marchi Tue, 03 Jun 2014 10:24:32 -0700

Thank you, Jörg,

I'll start with the second question: thanks! My problem was that I didn't 
know about the _shutdown endpoint, so I was simply killing the process, 
which forced the system to recover the indices.
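
For reference, a clean shutdown through the HTTP API could look like the 
sketch below. It assumes a node listening on localhost:9200 and the Nodes 
Shutdown API of the ES 1.x releases current at the time 
(POST /_cluster/nodes/_local/_shutdown). That API was removed in later 
versions, so check the documentation for the version you run. The helper 
name shutdown_request is hypothetical.

```python
import urllib.request

# Assumption: a single node reachable on the default HTTP port.
ES_URL = "http://localhost:9200"


def shutdown_request(node="_local", base_url=ES_URL):
    """Build (but do not send) a POST for the ES 1.x Nodes Shutdown API.

    `node` may be "_local" (the node receiving the request) or a node id.
    This endpoint does not exist in ES 2.0 and later.
    """
    url = f"{base_url}/_cluster/nodes/{node}/_shutdown"
    return urllib.request.Request(url, method="POST")
```

Sending it is then a matter of urllib.request.urlopen(shutdown_request()).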


As far as the migration from Solr to Elasticsearch is concerned, I 
basically want the indexed/analyzed but unstored field to be transferred 
from Solr to ES so that I can run full-text searches on it. 
So, are there tools that would let me copy the Lucene indexes over to 
Elasticsearch and keep the same functionality?

To retrieve the actual document, I'll simply take the id and fetch the 
document from storage. This is how the system was originally built and how 
I have to test it: the indexed but unstored fields are kept in Solr, which 
is queried for full-text searches, while the actual documents are kept in a 
separate filesystem. The query results are then used to retrieve the 
actual documents from this filesystem. 
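
The same search-then-fetch flow against ES could be sketched as follows. 
It assumes the analyzed field is called content, as in the Solr schema, 
and that each document lives in one file named <id>.html under a root 
directory; that file layout and all function names are hypothetical.

```python
import os


def build_query(text):
    """Full-text query on `content`; `_source: False` asks ES to return only ids and metadata."""
    return {"query": {"match": {"content": text}}, "_source": False}


def ids_from_response(response):
    """Pull document ids out of a standard Elasticsearch search response body."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def load_documents(ids, root):
    """Read each document from the external filesystem.

    Assumption: one file per document, named <id>.html under `root`.
    The real storage layout is not described in this thread.
    """
    docs = {}
    for doc_id in ids:
        path = os.path.join(root, f"{doc_id}.html")
        with open(path, encoding="utf-8") as fh:
            docs[doc_id] = fh.read()
    return docs
```

The ES side only has to return ids, so the unstored field is enough.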

If we decide to move to ES, then we could change the approach, store 
everything inside ES, and reindex our full archive.
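
Reindexing the full archive would normally go through the bulk API rather 
than one request per document. A minimal sketch of building one batch, 
assuming ES 1.x (which still uses _type) and hypothetical index/type names 
archive and page; batching and the HTTP call itself are left out.

```python
import json


def bulk_body(docs, index="archive", doc_type="page"):
    """Serialize (id, source) pairs into the newline-delimited body of POST /_bulk.

    Each document takes two lines: an action/metadata line and the source line.
    The body must end with a newline. The index and type names are hypothetical.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n" if lines else ""
```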

Thanks for the sharding advice; I realize I cannot use sharding with the 
current configuration. The current Solr system has just 1 collection 
with 1 core on 1 instance.

We are comparing the performance of ES and multicore Solr on a 
distributed system (not SolrCloud, just several instances with the load 
balanced by a custom algorithm, to have more control over where the data 
goes), and after that we'll decide which way to go.

Thanks

On Tuesday, June 3, 2014 09:55:21 UTC-7, Jörg Prante wrote:
>
> If you have indexed the data in Solr, you should consider a tool that 
> can traverse the Lucene index and reconstruct the documents. This is not a 
> straightforward process, as you already know, because analyzed fields look 
> different from the original input. The reconstruction may not recover the 
> original input, but once transformed to JSON it could be used as input for 
> Elasticsearch. It depends heavily on the Solr analyzers you used. 
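
To see why a reconstructed document differs from the original, here is a 
toy model of what such a tool does: walk the postings (term, then 
document, then positions) and put each term back at its position. The 
in-memory structure is a simplified, hypothetical stand-in, not the Lucene 
API; a real tool would read the index through Lucene itself. The example 
assumes "The Quick Foxes jumped" went through lowercasing, stopword 
removal and stemming.

```python
def reconstruct(postings, doc_id):
    """Rebuild the token sequence of one document from a toy inverted index.

    `postings` maps term -> {doc_id: [positions]}, a simplified stand-in for
    what Lucene keeps when positions are indexed. Positions with no surviving
    term (for example a removed stopword) come back as None. If several terms
    share a position (e.g. synonyms), the last one seen wins in this sketch.
    """
    slots = {}
    for term, docs in postings.items():
        for pos in docs.get(doc_id, []):
            slots[pos] = term
    if not slots:
        return []
    return [slots.get(i) for i in range(max(slots) + 1)]


# Analyzed form of "The Quick Foxes jumped": "the" removed, the rest lowercased and stemmed.
postings = {
    "quick": {7: [1]},
    "fox": {7: [2]},
    "jump": {7: [3]},
}
```

reconstruct(postings, 7) returns [None, 'quick', 'fox', 'jump']: usable 
for search, but case, the stopword and the word endings are gone.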
>
> You know that an Elasticsearch index is sharded, so you obviously have to 
> reindex the documents in order to take advantage of ES sharding.
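
The reason a reindex is unavoidable is the routing rule Elasticsearch 
documents for picking a shard: shard = hash(routing) % 
number_of_primary_shards, where the routing value defaults to the document 
id. The sketch uses zlib.crc32 only as a stand-in hash (Elasticsearch uses 
its own hash function) to show that changing the number of primary shards 
sends most documents to a different shard.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose target shard changes when the shard count changes."""
    moved = sum(
        1 for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)
```

Going from 5 to 6 primary shards relocates the large majority of 
documents, which is why, in the ES versions discussed here, the shard 
count is fixed when the index is created and a new layout means 
reindexing.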
>
> What startup time do you expect from ES? When shutting down ES, you 
> should use the _shutdown endpoint for a clean shutdown. A clean shutdown 
> writes checksums to disk for fast startup. When starting with valid 
> checksums, ES is available within a few seconds and turns to state 
> "green". Otherwise it performs index recovery. In that case the cluster 
> is up once all shards respond, usually within 30 seconds to 1 minute; the 
> exact duration depends on the shard sizes and disk I/O speed. It cannot be 
> much faster after an unclean shutdown because of the index recovery. 
> Recovery, like indexing and search, depends on the overall power of your 
> ES cluster. There are tunables to increase recovery speed at the cost of 
> search/index performance while recovery runs.
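
To know when a restarted node is ready, rather than watching documents 
appear, one option is to block on the cluster health API until the wanted 
status is reached. A minimal sketch, assuming a node on localhost:9200; 
wait_for_status and timeout are standard cluster health parameters, while 
the function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Rank of each cluster health colour, worst first.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_url(base_url="http://localhost:9200", status="green", timeout="60s"):
    """URL that makes ES hold the request until `status` is reached or `timeout` expires."""
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}/_cluster/health?{query}"


def is_ready(health, wanted="green"):
    """True if a parsed health response did not time out and is at least `wanted`."""
    if health.get("timed_out", False):
        return False
    return _RANK[health["status"]] >= _RANK[wanted]


def wait_until_ready(base_url="http://localhost:9200", status="green", timeout="60s"):
    """Block on the health API; a non-2xx answer is treated as "not ready yet"."""
    try:
        with urllib.request.urlopen(health_url(base_url, status, timeout)) as resp:
            return is_ready(json.load(resp), status)
    except urllib.error.HTTPError:
        return False
```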
>
> Jörg
>
> On 02.06.14 21:33, Diego Marchi wrote:
>  
> Hello all, 
>
> I'm testing the ES environment to see if a migration from Solr could 
> bring benefits to our system. We are considering a complete renovation of 
> our service, moving it from Java to Python and adding a lot of new 
> enhancements. 
>
> Currently we use Solr for indexing. We store webpages from customers and 
> index them using Solr. Each Solr document has about a dozen fields to keep 
> track of the data; the data itself is indexed in Solr in a *content* field 
> which is set (in the schema.xml) to be indexed="true" stored="false". In 
> fact, I can do a text search on it, but I cannot retrieve the whole field 
> (obviously...).
>
> The actual content is saved on our server, and it is a massive 22TB of 
> data. You'll understand we cannot reindex the whole thing just for testing 
> purposes. We're considering using a subset of it, but this is also time 
> consuming.
>
> I was looking for a way to transfer the indexed but unstored *content* 
> field directly from Solr to Elasticsearch.
>
> On another topic, when I shut down the ES engine and turn it on again, I 
> noticed that the documents are not all available at once; they take time 
> to load.
> Is that expected behavior, or is there a way (a configuration option...) 
> to have all the documents available right away? For instance, if I have to 
> update the engine or add some more options, or for whatever reason I need 
> to shut the engine down and turn it on again, do I need to wait for all 
> the documents to be loaded into the system?
> With Solr I see all of them available immediately after the search 
> engine has been launched...
>
>  Thank you,
> Diego
>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/8c23e11d-74fd-48c0-98b0-4d75514a6a33%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
> 

