[ https://issues.apache.org/jira/browse/SOLR-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115192#comment-14115192 ]
Ramkumar Aiyengar commented on SOLR-6450:
-----------------------------------------
bq. Although the idea has always sounded good, my untested guess has always
been that serialization plus deserialization will generally be more expensive
than just running analysis on the text again (which can be thought of as a
serialized form of the analyzed text). It certainly depends on the analysis
being performed, of course.
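To make the quoted trade-off concrete, here is a toy Java sketch (Java 16+ for records) that compares the size of a raw string with a naive binary encoding of its token stream. The whitespace-and-lowercase `analyze` method is a hypothetical stand-in for a real Lucene analyzer, and the fixed-width encoding is deliberately simple. A real serializer would likely use variable-length or delta-encoded integers and shrink the gap, while real token streams can carry extra attributes (types, flags, payloads) that widen it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzedSizeSketch {

    /** One token from a hypothetical analysis chain: term text, offsets, position increment. */
    record Token(String term, int start, int end, int posInc) {}

    /** Stand-in for a real analyzer: split on whitespace and lowercase each term. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i, 1));
            }
        }
        return tokens;
    }

    /** Naive binary encoding: a token count, then each term followed by three fixed-width ints. */
    static byte[] serialize(List<Token> tokens) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(tokens.size());
            for (Token t : tokens) {
                out.writeUTF(t.term());   // 2-byte length prefix + modified UTF-8 bytes
                out.writeInt(t.start());
                out.writeInt(t.end());
                out.writeInt(t.posInc());
            }
        } catch (IOException e) {
            // ByteArrayOutputStream never actually fails, but DataOutputStream declares IOException.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        int raw = text.getBytes(StandardCharsets.UTF_8).length;
        int analyzed = serialize(analyze(text)).length;
        System.out.println("raw text bytes:         " + raw);
        System.out.println("serialized token bytes: " + analyzed);
    }
}
```

For the sample sentence this prints 43 raw bytes against 165 serialized bytes, which suggests why re-running cheap analysis can beat shipping its output; with an expensive analysis chain the balance could tip the other way.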
That overhead might in part just be due to the lack of a fast, generic binary
serialization mechanism. We do have javabin, but it is very limited: it forces
us to describe the data in the payload itself instead of using schemas and
transferring that description only when needed. A broader (and obviously more
expensive) idea might be out-of-band streaming (i.e., not through the servlet)
of binary-serialized data, say using Avro. This might open up a lot of other
possibilities for SolrCloud as a whole.
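The schema point can be illustrated with a rough Java sketch (Java 16+) contrasting a self-describing encoding, where every value carries its field name and a type tag, with a schema-based one, where the field layout is written once and each record is only its values in order (the general approach Avro takes). The `Posting` record, the type tags, and the schema string are all hypothetical; this does not reproduce javabin's or Avro's actual wire formats.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SchemaVsSelfDescribing {

    /** Hypothetical record type standing in for a document or token. */
    record Posting(String term, int start, int end) {}

    // Hypothetical type tags; these are NOT javabin's real tag values.
    static final byte TAG_STR = 1;
    static final byte TAG_INT = 2;

    /** Self-describing: every value carries its field name and a type tag, so no schema is needed. */
    static byte[] selfDescribing(List<Posting> postings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(postings.size());
            for (Posting p : postings) {
                out.writeUTF("term");  out.writeByte(TAG_STR); out.writeUTF(p.term());
                out.writeUTF("start"); out.writeByte(TAG_INT); out.writeInt(p.start());
                out.writeUTF("end");   out.writeByte(TAG_INT); out.writeInt(p.end());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Schema-based: the field layout is written once; each record is just its values in order. */
    static byte[] schemaBased(List<Posting> postings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Hypothetical schema description, sent once (a receiver could also cache it).
            out.writeUTF("term:string,start:int,end:int");
            out.writeInt(postings.size());
            for (Posting p : postings) {
                out.writeUTF(p.term());
                out.writeInt(p.start());
                out.writeInt(p.end());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            postings.add(new Posting("term" + i, i * 10, i * 10 + 5));
        }
        System.out.println("self-describing bytes: " + selfDescribing(postings).length);
        System.out.println("schema-based bytes:    " + schemaBased(postings).length);
    }
}
```

For three short postings ("quick", "brown", "fox") the self-describing form takes 110 bytes and the schema-based form 78; the gap grows with every additional record because the schema cost is paid only once.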
> Option to send pre-analyzed documents from leader to replica instead of
> replicas re-running analysis.
> -----------------------------------------------------------------------------------------------------
>
> Key: SOLR-6450
> URL: https://issues.apache.org/jira/browse/SOLR-6450
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Timothy Potter
>
> Given that the leader has to run the full update processor chain on each
> document (text analysis, etc.), it would be good to have it send a
> pre-analyzed document to replicas (to improve near-real-time replication),
> allowing the replicas to avoid redoing expensive work.
> Thought should be given to allowing the leader to accept pre-analyzed
> documents as well, so that document analysis could be offloaded to external
> processes. For instance, thousands of Storm workers could do the analysis and
> then send pre-analyzed documents to Solr.
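As a rough illustration of the off-loading idea, the Java sketch below (Java 14+ for switch expressions) shows what an external worker, for example a Storm bolt (not shown), might produce for a single field. The output is modeled on the JSON format accepted by Solr's `PreAnalyzedField` (`v` for version, `str` for the stored value, and `tokens` entries with `t`, `s`, `e`, `i` for term, start offset, end offset and position increment), but the exact keys should be checked against the Solr Reference Guide. The whitespace tokenizer is a hypothetical stand-in for a real analysis chain.

```java
import java.util.Locale;

public class PreAnalyzedJsonSketch {

    /** Minimal JSON string escaping (quotes, backslashes, control chars); enough for this sketch. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /**
     * Tokenizes on whitespace, lowercases, and emits JSON modeled on the PreAnalyzedField
     * format: "v" (version), "str" (stored value), and "tokens" with "t" (term),
     * "s"/"e" (start/end offsets) and "i" (position increment).
     */
    static String preAnalyze(String text) {
        StringBuilder json = new StringBuilder();
        json.append("{\"v\":\"1\",\"str\":").append(jsonString(text)).append(",\"tokens\":[");
        int i = 0;
        boolean first = true;
        while (i < text.length()) {
            // Skip whitespace, then consume one token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                if (!first) json.append(',');
                first = false;
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                json.append("{\"t\":").append(jsonString(term))
                    .append(",\"s\":").append(start)
                    .append(",\"e\":").append(i)
                    .append(",\"i\":1}");
            }
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(preAnalyze("Pre-analyzed Text"));
    }
}
```

Running it prints `{"v":"1","str":"Pre-analyzed Text","tokens":[{"t":"pre-analyzed","s":0,"e":12,"i":1},{"t":"text","s":13,"e":17,"i":1}]}`; whether shipping such payloads beats re-analysis on the Solr side is exactly the cost question raised in the comment.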
--
This message was sent by Atlassian JIRA
(v6.2#6252)