[ https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon updated SOLR-7393:
------------------------------
    Description:
When switching SolrCloud from a local dataDir to the HDFS directory factory, indexing performance falls through the floor.
I've also observed very high latency on both QTime and the code timer for HDFS writes compared to local dataDir writes (using check_solr_write.pl from https://github.com/harisekhon/nagios-plugins). Single test document write latency jumps from a few dozen milliseconds to 700-1700 milliseconds, and over 2000 on some runs.
A previous bulk online indexing job from Hive to SolrCloud that took 2 hours for 620M rows ended up taking a projected 20+ hours and never completed, usually breaking around the 16-17 hour mark when left overnight.
It's worth noting that I had to disable the HDFS write cache, which was causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells me this doesn't make much performance difference anyway.
This is probably also related to SolrCloud not respecting the HDFS replication factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that alone doesn't account for the massive performance drop going from vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
When switching SolrCloud from a local dataDir to the HDFS directory factory, indexing performance falls through the floor.
A previous online indexing job from Hive to SolrCloud that took 2 hours for 620M rows ended up taking a projected 20+ hours and never completed, usually breaking around the 16-17 hour mark when left overnight.
It's worth noting that I had to disable the HDFS write cache, which was causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells me this doesn't make much performance difference anyway.
This is probably also related to SolrCloud not respecting the HDFS replication factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that alone doesn't account for the massive performance drop going from vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


> HDFS poor bulk indexing performance
> -----------------------------------
>
>                 Key: SOLR-7393
>                 URL: https://issues.apache.org/jira/browse/SOLR-7393
>             Project: Solr
>          Issue Type: Bug
>          Components: Hadoop Integration, hdfs, SolrCloud
>   Affects Versions: 4.7.2, 4.10.3
>         Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> When switching SolrCloud from a local dataDir to the HDFS directory factory,
> indexing performance falls through the floor.
> I've also observed very high latency on both QTime and the code timer for
> HDFS writes compared to local dataDir writes (using check_solr_write.pl from
> https://github.com/harisekhon/nagios-plugins). Single test document write
> latency jumps from a few dozen milliseconds to 700-1700 milliseconds, and
> over 2000 on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2
> hours for 620M rows ended up taking a projected 20+ hours and never
> completed, usually breaking around the 16-17 hour mark when left
> overnight.
> It's worth noting that I had to disable the HDFS write cache, which was
> causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells
> me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting the HDFS replication
> factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but
> that alone doesn't account for the massive performance drop going from
> vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
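
For scale, the job figures quoted in the description (620M rows in 2 hours on local dataDir, a projected 20+ hours on HDFS) can be turned into average throughput. This is only a rough sketch: the 20-hour value is the lower bound given in the report, the HDFS job never actually completed, and the constant and function names are illustrative rather than taken from the issue.

```python
# Figures taken from the issue description; names are illustrative only.
ROWS = 620_000_000          # rows in the Hive to SolrCloud indexing job
LOCAL_HOURS = 2             # observed duration with local dataDir
HDFS_HOURS_PROJECTED = 20   # assumption: lower bound of the "20+ hours" projection


def docs_per_second(rows: int, hours: float) -> float:
    """Average indexing throughput over the whole job."""
    return rows / (hours * 3600)


local_rate = docs_per_second(ROWS, LOCAL_HOURS)
hdfs_rate = docs_per_second(ROWS, HDFS_HOURS_PROJECTED)
slowdown = local_rate / hdfs_rate

print(f"local dataDir:    {local_rate:,.0f} docs/s")
print(f"HDFS (projected): {hdfs_rate:,.0f} docs/s")
print(f"slowdown:         at least {slowdown:.0f}x")
```

Because the projection is "20+" hours, the computed slowdown is a floor rather than a measurement. It is also well above the 2x extra write volume implied by 4 copies instead of 2.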