[ https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478021#comment-15478021 ]
David Johnson commented on SOLR-7393:
-------------------------------------
Does Hadoop have the native library configuration set appropriately?
> HDFS poor indexing performance
> ------------------------------
>
> Key: SOLR-7393
> URL: https://issues.apache.org/jira/browse/SOLR-7393
> Project: Solr
> Issue Type: Bug
> Components: Hadoop Integration, hdfs, SolrCloud
> Affects Versions: 4.7.2, 4.10.3
> Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
> Reporter: Hari Sekhon
> Priority: Critical
>
> When switching SolrCloud from a local dataDir to the HDFS directory factory,
> indexing performance falls through the floor.
> I've also observed very high latency on both QTime and code timer for HDFS
> writes compared to local dataDir writes (using check_solr_write.pl from
> https://github.com/harisekhon/nagios-plugins). Single test document write
> latency jumps from a few dozen milliseconds to 700-1700 milliseconds, over
> 2000 on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2 hours
> for 620M rows ended up taking a projected 20+ hours and never completing,
> usually breaking around the 16-17 hour timeframe when left overnight.
> It's worth noting that I had to disable the HDFS write cache, which was
> causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells
> me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting the HDFS replication
> factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but
> that alone doesn't account for the massive performance drop going from
> vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
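
For scale, the figures quoted in the description can be turned into rough
throughput numbers. This is a minimal sketch in Python that takes the 620M
rows, the 2-hour local run, and the projected 20-hour HDFS run at face value.
The function and variable names are illustrative only, and because the
description says "20+ hours", the HDFS rate is an upper bound and the slowdown
a lower bound.

```python
def rows_per_second(rows: int, hours: float) -> float:
    """Average indexing throughput over a run of the given length."""
    return rows / (hours * 3600)


ROWS = 620_000_000          # bulk job size from the description
LOCAL_HOURS = 2             # local dataDir run
HDFS_HOURS = 20             # projected HDFS run ("20+ hours", so a lower bound)

local_rate = rows_per_second(ROWS, LOCAL_HOURS)
hdfs_rate = rows_per_second(ROWS, HDFS_HOURS)
slowdown = local_rate / hdfs_rate

print(f"local dataDir:    {local_rate:,.0f} rows/s")
print(f"HDFS (projected): {hdfs_rate:,.0f} rows/s or less")
print(f"slowdown:         at least {slowdown:.0f}x")
```

The ratio depends only on the two durations, so it holds however the rows
were batched.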