[ 
https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384462#comment-14384462
 ] 

Timothy Potter commented on SOLR-6816:
--------------------------------------

I think you misunderstood my point. I wasn't talking about retrying documents 
in the same UpdateRequest. If a Map/Reduce task fails, the HDFS block is 
retried entirely, meaning a Hadoop-based indexing job may re-send docs that 
have already been added, so using overwrite=false is dangerous when doing 
this type of bulk indexing. The solution proposed in SOLR-3382 would be great 
to have as well, though.
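To make the danger concrete, here's a toy sketch (Python, all names hypothetical — this is not Solr code) of what happens when a failed task re-sends its batch under each overwrite setting:

```python
def add_docs(index, docs, overwrite=True):
    """Toy model of an index receiving a batch of docs.

    With overwrite=True, a doc replaces any existing doc with the same id
    (costing an id lookup per doc); with overwrite=False, docs are appended
    blindly, so a retried batch silently creates duplicates.
    """
    for doc in docs:
        if overwrite:
            index[:] = [d for d in index if d["id"] != doc["id"]]
        index.append(doc)

batch = [{"id": "1", "f": "a"}, {"id": "2", "f": "b"}]

safe = []
add_docs(safe, batch, overwrite=True)
add_docs(safe, batch, overwrite=True)   # retried task: still 2 docs

fast = []
add_docs(fast, batch, overwrite=False)
add_docs(fast, batch, overwrite=False)  # retried task: 4 docs, duplicates
```

The retried batch leaves `safe` with 2 docs but `fast` with 4, which is exactly why overwrite=false can't be trusted for bulk jobs that retry whole blocks.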

I'm working on implementing the version bucket initialization approach that 
Yonik suggested, but I'm wondering if we can do better with the hand-off from 
leader to replica. For instance, the leader knows whether a doc existed 
(because it does a similar ID lookup), so why can't the leader simply share 
that information with the replica, allowing the replica to avoid looking up 
a doc that doesn't exist? I'll get the version bucket initialization done 
and then see if this is still a concern, but it bugs me that we're wasting 
all this CPU looking for docs the leader already knows don't exist.
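The hand-off idea could look something like the sketch below (Python; the flag name and update structure are hypothetical illustrations, not Solr's actual update protocol): the leader piggybacks its lookup result on the forwarded update, and the replica only does its own id lookup when the doc might already exist.

```python
def leader_process(index, doc):
    """Leader assigns a version; its id lookup already tells it if the doc is new."""
    is_new = doc["id"] not in index
    version = 1 + max((d["_version_"] for d in index.values()), default=0)
    doc["_version_"] = version
    index[doc["id"]] = doc
    # Piggyback the lookup result so the replica can skip its own lookup.
    return {"doc": doc, "leader_says_new": is_new}

def replica_process(index, update, lookups_done):
    """Replica skips the wasted id lookup when the leader says the doc is new."""
    doc = update["doc"]
    if not update["leader_says_new"]:
        lookups_done.append(doc["id"])      # version check needs a lookup
        existing = index.get(doc["id"])
        if existing and existing["_version_"] >= doc["_version_"]:
            return                          # stale reordered update, drop it
    index[doc["id"]] = doc

leader, replica, lookups_done = {}, {}, []
for i in range(3):
    update = leader_process(leader, {"id": str(i)})
    replica_process(replica, update, lookups_done)
# All three docs were new, so the replica performed zero index lookups.
```

The subtlety is reordering: the flag is only safe if the replica can still detect out-of-order updates, which is why the version comparison stays on the not-new path.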

> Review SolrCloud Indexing Performance.
> --------------------------------------
>
>                 Key: SOLR-6816
>                 URL: https://issues.apache.org/jira/browse/SOLR-6816
>             Project: Solr
>          Issue Type: Task
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Priority: Critical
>         Attachments: SolrBench.pdf
>
>
> We have never really focused on indexing performance, just correctness and 
> low hanging fruit. We need to vet the performance and try to address any 
> holes.
> Note: A common report is that adding any replication is very slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
