Hello, I refactored the HDFS directory implementation out of Blur to use in my own project and was surprised by how it performed. I'm using both the HDFSDirectory class and the BlockCacheDirectoryFactoryV2 class.
On my local machine, using the cache gave a significant speedup. The index there was small enough (12 docs) that each file making up the Lucene index fit into a single block in the cache. When I ran it on a multi-node cluster on AWS, pulling back 1031 docs with the cache was not much faster than without it. According to my log statements the cache was hit every time; the difference between the cluster and my local run was that each file spanned several blocks. When setting up the cache I used the default BlurConfiguration.

Any ideas on how to speed up performance? Should I change the block size? Does Blur put some kind of wrapper around the cache?

ON A MULTI-NODE CLUSTER
Number of documents in directory [1031]

Total execution time per try:

Try    Without cache    With cache
#1     4816             2228
#2     3137             2243
#3     2921             2584
#4     2525             2509
#5     2698             2163
#6     2330             2094
#7     2464             2069
#8     2568             2105
#9     2524             2124
#10    2537             2213

ON MY LOCAL MACHINE
Number of documents in directory [12]

Total execution time per try:

Try    Without cache    With cache
#1     599              31
#2     639              32
#3     461              27
#4     544              23
#5     424              21
#6     381              26
#7     487              27
#8     368              28
#9     311              26
#10    411              27

Thanks,
Josh
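The difference between one block per file locally and several blocks per file on the cluster comes down to block arithmetic. The Java sketch below shows how a fixed-size block cache typically maps a byte position in a file to a block index and an offset inside that block, and how many blocks a file or a single read spans. It is a hypothetical illustration, not Blur's code: the class and method names are made up, and the 8192-byte block size is an assumption, not a confirmed BlurConfiguration default.

```java
// Hypothetical sketch of fixed-size block-cache arithmetic (not Blur's actual code).
// ASSUMPTION: 8192-byte blocks (2^13); the real default for
// BlockCacheDirectoryFactoryV2 may differ.
public class BlockMath {
    static final int BLOCK_SHIFT = 13;                 // assumed: 2^13 = 8192 bytes per block
    static final long BLOCK_SIZE = 1L << BLOCK_SHIFT;
    static final long BLOCK_MOD = BLOCK_SIZE - 1;      // mask for the in-block offset

    // Index of the block that holds byte `pos` of a file.
    static long blockIndex(long pos) {
        return pos >>> BLOCK_SHIFT;
    }

    // Offset of byte `pos` within its block.
    static long offsetInBlock(long pos) {
        return pos & BLOCK_MOD;
    }

    // Number of blocks needed to hold a file of `length` bytes (ceiling division).
    static long blocksForFile(long length) {
        return (length + BLOCK_SIZE - 1) >>> BLOCK_SHIFT;
    }

    // Number of distinct blocks touched by reading `len` bytes starting at `pos` (len > 0).
    static long blocksTouched(long pos, int len) {
        return blockIndex(pos + len - 1) - blockIndex(pos) + 1;
    }

    public static void main(String[] args) {
        // Hypothetical file sizes, chosen only to illustrate the arithmetic.
        System.out.println(blocksForFile(4_000));       // fits in one block
        System.out.println(blocksForFile(1_000_000));   // spans many blocks
        System.out.println(blocksTouched(8_000, 500));  // a read crossing a block boundary
        System.out.println(offsetInBlock(8_200));       // byte 8200 sits 8 bytes into block 1
    }
}
```

With a larger block size each file spans fewer blocks, so a sequential scan needs fewer cache lookups, at the cost of more memory per cached block. Whether that helps in this setup is untested here.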
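To compare the cached and uncached timings directly, the sketch below averages each set of ten runs exactly as reported in the post (the post does not state the time unit). It is descriptive arithmetic on the posted numbers only; the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: averages the execution times reported in the post.
// The unit of the times is not stated in the original post.
public class TimingSummary {
    static final long[] CLUSTER_NO_CACHE = {4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537};
    static final long[] CLUSTER_CACHE    = {2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213};
    static final long[] LOCAL_NO_CACHE   = {599, 639, 461, 544, 424, 381, 487, 368, 311, 411};
    static final long[] LOCAL_CACHE      = {31, 32, 27, 23, 21, 26, 27, 28, 26, 27};

    // Arithmetic mean of the runs; NaN for an empty array.
    static double mean(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    // How many times faster the cached mean is than the uncached mean.
    static double speedup(long[] uncached, long[] cached) {
        return mean(uncached) / mean(cached);
    }

    public static void main(String[] args) {
        System.out.printf("cluster: %.1f -> %.1f (%.2fx)%n",
                mean(CLUSTER_NO_CACHE), mean(CLUSTER_CACHE), speedup(CLUSTER_NO_CACHE, CLUSTER_CACHE));
        System.out.printf("local:   %.1f -> %.1f (%.2fx)%n",
                mean(LOCAL_NO_CACHE), mean(LOCAL_CACHE), speedup(LOCAL_NO_CACHE, LOCAL_CACHE));
    }
}
```

On these numbers the cache cuts the mean time by roughly 17x locally but only about 1.3x on the cluster. The first uncached cluster run (4816) is noticeably slower than the others, so excluding it would narrow the cluster gap further.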
