[
https://issues.apache.org/jira/browse/SOLR-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114785#comment-14114785
]
Yonik Seeley commented on SOLR-6450:
------------------------------------
Although the idea has always sounded good, my untested guess has always been
that serialization plus deserialization will generally be more expensive than
just running analysis on the text again (the text itself can be thought of as a
serialized form of the analyzed text). It depends on the analysis being
performed, of course.
As an example, consider how much information comes out of a simple whitespace
split of lowercased text, and compare marshaling and unmarshaling all of that
with just redoing the splitting and lowercasing. Of course, we shouldn't
prejudge too much: if someone wants to try it out, we should look at the
performance numbers!
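A quick way to get a first feel for the trade-off is a micro-benchmark along the lines of the rough Python sketch below (standard library only; no Lucene or Solr involved). `analyze` stands in for a whitespace tokenizer plus lowercase filter, and the JSON wire format produced by `serialize` is an assumption chosen for simplicity, not a format Solr actually uses. The sketch checks that the round trip reproduces re-analysis exactly, then prints the two payload sizes and timings; the timings vary by machine and say little about Java/Lucene performance.

```python
import json
import re
import timeit
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    term: str
    position: int
    start_offset: int
    end_offset: int


def analyze(text: str) -> List[Token]:
    """Whitespace split + lowercase, keeping positions and character offsets."""
    return [Token(m.group().lower(), pos, m.start(), m.end())
            for pos, m in enumerate(re.finditer(r"\S+", text))]


def serialize(tokens: List[Token]) -> bytes:
    """Hypothetical wire format: a JSON array with one object per token."""
    return json.dumps([asdict(t) for t in tokens]).encode("utf-8")


def deserialize(data: bytes) -> List[Token]:
    """Rebuild the token list from the hypothetical JSON wire format."""
    return [Token(**d) for d in json.loads(data.decode("utf-8"))]


text = "The Quick brown fox jumps over the lazy dog " * 50
raw = text.encode("utf-8")
wire = serialize(analyze(text))

# The replica ends up with the same tokens either way...
assert deserialize(wire) == analyze(text)
# ...but the serialized form is much larger than the raw text it came from.
print(f"raw text: {len(raw)} bytes, serialized tokens: {len(wire)} bytes")

# Compare re-analysis against deserialization (numbers vary by machine).
t_analyze = timeit.timeit(lambda: analyze(raw.decode("utf-8")), number=200)
t_decode = timeit.timeit(lambda: deserialize(wire), number=200)
print(f"re-analyze: {t_analyze:.4f}s  deserialize: {t_decode:.4f}s")
```

A real experiment would use the actual analysis chain and whatever binary encoding the leader would send, but even this toy shows how much per-token metadata a pre-analyzed form carries compared with the original text.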
> Option to send pre-analyzed documents from leader to replica instead of
> replicas re-running analysis.
> -----------------------------------------------------------------------------------------------------
>
> Key: SOLR-6450
> URL: https://issues.apache.org/jira/browse/SOLR-6450
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Timothy Potter
>
> Given that the leader has to run the full update processor chain on each
> document (text analysis, etc.), it would be good to have it send a
> pre-analyzed document to replicas (to improve near-real-time replication),
> allowing the replicas to avoid redoing expensive work.
> Thought should be given to allowing the leader to accept pre-analyzed
> documents as well, so that document analysis could be off-loaded to external
> processes. For instance, thousands of Storm workers could do the analysis and
> then send pre-analyzed documents to Solr.
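A toy model of the proposal may help frame what the measurements need to show. The sketch below is hypothetical throughout: `Node`, `Leader`, `forward_pre_analyzed`, and the use of a plain token list as the "pre-analyzed" form are invented for illustration and are not Solr classes or settings. It only counts how many times analysis runs on Solr nodes for one update in three modes: every copy re-analyzes (current behaviour), the leader forwards its analysis, and an external worker supplies the analysis up front.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def analyze(text: str) -> List[str]:
    """Stand-in for the update chain's text analysis (whitespace split + lowercase)."""
    return [t.lower() for t in text.split()]


@dataclass
class Node:
    """Hypothetical shard copy: stores analyzed tokens and counts analysis runs."""
    index: Dict[str, List[str]] = field(default_factory=dict)
    analysis_runs: int = 0

    def apply(self, doc_id: str, text: str,
              pre_analyzed: Optional[List[str]] = None) -> List[str]:
        # Only run analysis when no pre-analyzed form was supplied.
        if pre_analyzed is None:
            self.analysis_runs += 1
            pre_analyzed = analyze(text)
        self.index[doc_id] = pre_analyzed
        return pre_analyzed


@dataclass
class Leader(Node):
    """Hypothetical leader that indexes locally, then fans out to replicas."""
    replicas: List[Node] = field(default_factory=list)
    forward_pre_analyzed: bool = False  # hypothetical switch for the proposed option

    def update(self, doc_id: str, text: str,
               pre_analyzed: Optional[List[str]] = None) -> None:
        tokens = self.apply(doc_id, text, pre_analyzed)
        for replica in self.replicas:
            # Either forward the leader's tokens or make each replica redo analysis.
            replica.apply(doc_id, text, tokens if self.forward_pre_analyzed else None)


def total_runs(leader: Leader) -> int:
    """Analysis runs across the leader and its replicas (external work not counted)."""
    return leader.analysis_runs + sum(r.analysis_runs for r in leader.replicas)


doc = "Pre-analyzed documents from Leader to Replica"

# Current behaviour: the leader and every replica analyze the same text.
classic = Leader(replicas=[Node(), Node()])
classic.update("1", doc)

# Proposed option: the leader analyzes once and forwards the tokens.
forwarding = Leader(replicas=[Node(), Node()], forward_pre_analyzed=True)
forwarding.update("1", doc)

# Off-loaded analysis: an external worker (e.g. a Storm bolt) sends tokens.
external = Leader(replicas=[Node(), Node()], forward_pre_analyzed=True)
external.update("1", doc, pre_analyzed=analyze(doc))

print(total_runs(classic), total_runs(forwarding), total_runs(external))  # 3 1 0
```

The count deliberately ignores what forwarding costs: in a real cluster the leader's tokens would have to be marshaled, shipped, and unmarshaled on each replica, which is exactly the trade-off questioned in the comment above.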
--
This message was sent by Atlassian JIRA
(v6.2#6252)