On Fri, May 22, 2015 at 3:33 AM, Ravikumar Govindarajan <[email protected]> wrote:
> Recently I am trying to consider deploying SSDs on search machines.
>
> Each machine runs data-nodes + shard-server and local reads of hadoop are
> leveraged….
>
> SSDs are a great fit for general lucene/solr kinds of setups. But for blur,
> I need some help…
>
> 1. Is it a good idea to consider SSDs, especially when block-cache is
> present?

Possibly. I don't have any hard numbers for this type of setup. My guess is
that SSDs are only going to help when the blocks for the shard are local and
short circuit reads are enabled.

> 2. Are there any grids running blur on SSDs, and how do they compare to
> normal HDDs?

I haven't run any at scale yet.

> 3. Can we disable block-cache on SSDs, especially when local-reads are
> enabled?

I would not recommend disabling the block cache. However, you could likely
lower the size of the cache and reduce the overall memory footprint of Blur.

> 4. Using SSDs, blur/lucene will surely be CPU bound. But I don't know what
> overheads hadoop local-reads bring to the table…

If you are using short circuit reads, I have seen the performance of local
accesses near that of native IO. However, if Blur is making remote HDFS
calls, every call is like a cache miss. One interesting thought would be to
try using the HDFS cache feature that is present in the most recent versions
of HDFS. I haven't tried it yet, but it would be interesting to try.

> Any help is much appreciated, because I cannot find any info on the web
> about this topic.
>
> --
> Ravi
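For reference, short circuit local reads are switched on in hdfs-site.xml on both the DataNodes and the clients with settings along these lines (the socket path below is only an example; use a path appropriate to your install):

```xml
<!-- hdfs-site.xml: enable short circuit local reads.
     The socket path is site-specific; it must live in a directory
     that only the hdfs user can write to. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```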
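On lowering the cache size: the block cache is allocated off-heap in fixed-size slabs, so the footprint is roughly slab count times slab size. A rough back-of-the-envelope sketch, assuming 128 MB slabs (the commonly cited default; check the slab size and property names against your Blur version):

```python
# Rough estimate of the off-heap memory pinned by Blur's block cache.
# SLAB_SIZE_MB is an assumption (commonly cited default); verify it
# against the configuration of your Blur version.

SLAB_SIZE_MB = 128  # assumed default slab size

def block_cache_footprint_mb(slab_count):
    """Total off-heap memory the block cache will pin, in MB."""
    return slab_count * SLAB_SIZE_MB

# Lowering the slab count shrinks the footprint, e.g. going from
# 32 slabs down to 8 slabs:
print(block_cache_footprint_mb(32))  # 4096 MB (4 GB)
print(block_cache_footprint_mb(8))   # 1024 MB (1 GB)
```

On SSD-backed nodes with short circuit reads, a smaller cache like this frees memory for the OS page cache and the JVM heap.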

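The HDFS cache feature mentioned above is driven by the hdfs cacheadmin tool. A minimal sketch of pinning a table's files into the DataNodes' memory (the pool name and path here are hypothetical, and the DataNodes need dfs.datanode.max.locked.memory set):

```shell
# Create a cache pool and pin a table directory into the DataNode cache.
# Requires a recent HDFS (2.3+) with centralized cache management enabled.
hdfs cacheadmin -addPool blur-pool
hdfs cacheadmin -addDirective -path /blur/tables/mytable -pool blur-pool

# Verify what is being cached.
hdfs cacheadmin -listDirectives
```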