[ https://issues.apache.org/jira/browse/IGNITE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459546#comment-16459546 ]
Joel Lang commented on IGNITE-8359:
-----------------------------------
So, using the 2.5 nightly build from April 27th, I ran the code to generate the
6,000,000 entries for cache A and then the 12,000,000 entries for cache B, using
the data streamer for each. This was again in a Linux VM running on an HDD.
I started this on Friday before I left work. When I came back into the office,
I found that it still had not finished; it was about 95-97% done. The fact that
it didn't finish over such a long period of time is a bit obscene.
> Severe performance degradation with persistence and data streaming on HDD
> -------------------------------------------------------------------------
>
> Key: IGNITE-8359
> URL: https://issues.apache.org/jira/browse/IGNITE-8359
> Project: Ignite
> Issue Type: Bug
> Components: cache, persistence, sql, streaming
> Affects Versions: 2.4, 2.5
> Environment: Linux CentOS 7 VM using Ignite DirectIO plugin with HDD.
> Reporter: Joel Lang
> Priority: Major
>
> I am testing Ignite's native persistence for storing a data set long term,
> using a 2.5 nightly build. To do this, I am using Ignite's data streamers to
> stream 6,000,000 entries into cache A and 12,000,000 entries into cache B,
> simulating the upper limit of 2 years' worth of data.
> The test ran smoothly on my personal machine, which has an SSD and runs
> Windows, but ran into tremendous issues on a development test machine, which
> is a Linux VM using an HDD. I realize from the Ignite documentation that it
> specifically excludes HDDs as a basis for a persistent store, but perhaps my
> experience could yield improvements for SSD performance too.
> The root issue is that, over time, cache updates become severely bottlenecked
> by reading SQL index pages from disk in order to update the index. If I had
> to guess, this is related to BPlusTree.findInsertionPoint() having to load
> pages from disk when they have been evicted.
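The suspected effect can be illustrated with a simplified JDK-only model. It is not Ignite's BPlusTree or its page-replacement policy: the class and method names are hypothetical, only leaf pages are modeled, and it assumes (not stated in the report) that index keys arrive in an order unrelated to their sort order. Leaf pages are held in a fixed-size LRU set, and every insert whose target leaf is not resident counts as one disk read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PageCacheModel {

    /** Set of resident leaf-page ids that evicts the least recently used page when full. */
    static final class LruPages extends LinkedHashMap<Integer, Boolean> {
        private final int capacity;

        LruPages(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least-recently-used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity; // evict once the cache exceeds its page budget
        }
    }

    /**
     * Inserts keys 0..numKeys-1 into a sorted index whose leaf pages hold keysPerPage keys
     * each, and counts how often the target leaf is not in an LRU cache of cachePages pages.
     */
    static long countPageMisses(int numKeys, int keysPerPage, int cachePages, boolean shuffled) {
        List<Integer> keys = new ArrayList<>(numKeys);
        for (int k = 0; k < numKeys; k++) keys.add(k);
        if (shuffled) Collections.shuffle(keys, new Random(42)); // fixed seed for reproducibility

        LruPages cache = new LruPages(cachePages);
        long misses = 0;
        for (int key : keys) {
            int leafPage = key / keysPerPage;   // leaf that owns this key's sorted position
            if (cache.get(leafPage) == null) {  // get() also refreshes LRU order on a hit
                misses++;                       // page must be read from "disk"
                cache.put(leafPage, Boolean.TRUE);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        int numKeys = 1_000_000, keysPerPage = 100, cachePages = 1_000; // 10,000 leaf pages
        System.out.println("sequential misses: " + countPageMisses(numKeys, keysPerPage, cachePages, false));
        System.out.println("shuffled misses:   " + countPageMisses(numKeys, keysPerPage, cachePages, true));
    }
}
```

With the parameters in `main`, sequential inserts load each of the 10,000 leaves exactly once, while shuffled inserts miss on roughly nine out of ten inserts, because only about a tenth of the leaves fit in the cache at any time.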
> I used a 2.5 nightly build because 2.3 and 2.4 have the same issue, and there
> the whole process was further bottlenecked by a lock being held by Ignite
> while it read the page from disk in PageMemoryImpl.acquirePage(). 2.5 fixed this.
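Why holding a lock across a disk read hurts can be shown with a small hypothetical model. This is not Ignite's code: a single ReentrantLock stands in for whatever lock PageMemoryImpl.acquirePage() held in 2.3/2.4, and Thread.sleep stands in for a slow HDD read. The model reports the peak number of page reads in flight at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockDuringReadModel {

    /** Simulated disk read: sleeps and records how many reads overlap. */
    static void simulatedDiskRead(AtomicInteger inFlight, AtomicInteger maxInFlight)
            throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(100); // stand-in for an HDD seek and page read
        } finally {
            inFlight.decrementAndGet();
        }
    }

    /** Runs `threads` concurrent page loads and returns the peak number of overlapping reads. */
    static int peakConcurrentReads(int threads, boolean holdLockDuringRead) {
        ReentrantLock lock = new ReentrantLock();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                    if (holdLockDuringRead) {
                        lock.lock();
                        try {
                            simulatedDiskRead(inFlight, maxInFlight); // I/O serialized behind the lock
                        } finally {
                            lock.unlock();
                        }
                    } else {
                        simulatedDiskRead(inFlight, maxInFlight); // reads may overlap
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        start.countDown();
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("peak reads, lock held during I/O: " + peakConcurrentReads(4, true));
        System.out.println("peak reads, no lock during I/O:   " + peakConcurrentReads(4, false));
    }
}
```

With the lock held, the four 100 ms reads run strictly one after another (about 400 ms in total); without it they overlap, which is the kind of serialization the report says 2.5 removed.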
> The performance issue was much more severe in the previously mentioned cache
> B, which contains user comments on entries in cache A. The key for each
> comment entry is a Java class containing the creation timestamp and the
> string key of the owning entry in cache A. The owning entry's key is indexed
> so that comments can be queried by their owner. In this test case, there
> were two comments in cache B for every entry in cache A.
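The key layout and the query-by-owner pattern described above can be sketched with the JDK alone. This is a hypothetical model with invented names: a HashMap stands in for the cache and a TreeMap ordered by (ownerKey, createdAt) stands in for the SQL index tree. The report does not say how the index was declared; in Ignite, one way is the `@QuerySqlField(index = true)` annotation on the indexed field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

public class CommentIndexModel {

    /** Composite comment key: creation timestamp plus the owning cache-A entry's key. */
    static final class CommentKey {
        final long createdAt;
        final String ownerKey;

        CommentKey(long createdAt, String ownerKey) {
            this.createdAt = createdAt;
            this.ownerKey = ownerKey;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CommentKey)) return false;
            CommentKey k = (CommentKey) o;
            return createdAt == k.createdAt && ownerKey.equals(k.ownerKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(createdAt, ownerKey);
        }
    }

    // Primary storage: comment key -> comment text.
    private final Map<CommentKey, String> comments = new HashMap<>();

    // Secondary index sorted by (ownerKey, createdAt). Every put also inserts here, at the
    // sorted position of its owner key; that is the step that may need index pages from disk.
    private final NavigableMap<CommentKey, String> byOwner = new TreeMap<>(
            Comparator.comparing((CommentKey k) -> k.ownerKey).thenComparingLong(k -> k.createdAt));

    void put(CommentKey key, String text) {
        comments.put(key, text);
        byOwner.put(key, text);
    }

    /** Primary-key lookup, using CommentKey.equals/hashCode. */
    String get(CommentKey key) {
        return comments.get(key);
    }

    /** Range scan over the index: all comments for one owner, oldest first. */
    List<String> commentsFor(String ownerKey) {
        return new ArrayList<>(byOwner.subMap(
                new CommentKey(Long.MIN_VALUE, ownerKey), true,
                new CommentKey(Long.MAX_VALUE, ownerKey), true).values());
    }

    public static void main(String[] args) {
        CommentIndexModel cache = new CommentIndexModel();
        cache.put(new CommentKey(1_000L, "entry-1"), "first comment on entry-1");
        cache.put(new CommentKey(2_000L, "entry-2"), "only comment on entry-2");
        cache.put(new CommentKey(3_000L, "entry-1"), "second comment on entry-1");
        System.out.println(cache.commentsFor("entry-1"));
    }
}
```

Each streamed comment therefore costs one primary write plus one sorted-index insertion; `main` prints the two `entry-1` comments in timestamp order.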
> I found that even 25% of the way through streaming data into cache B, it
> would take anywhere from 15 to 35 seconds to insert a batch of 2000 comments.
> This slowed streaming to a crawl and meant that streaming would need to
> continue overnight to have any hope of finishing.
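As a rough check of the overnight estimate, assuming (not measured) that batches of 2,000 comments kept taking 15 to 35 seconds for the remaining 75% of cache B; since batches were getting slower over time, this may understate the real figure:

```latex
\begin{aligned}
N_{\text{remaining}} &= (1 - 0.25) \times 12{,}000{,}000 = 9{,}000{,}000 \text{ comments} = 4{,}500 \text{ batches of } 2{,}000 \\
T_{\min} &= 4{,}500 \times 15\ \text{s} = 67{,}500\ \text{s} = 18.75\ \text{h} \\
T_{\max} &= 4{,}500 \times 35\ \text{s} = 157{,}500\ \text{s} = 43.75\ \text{h}
\end{aligned}
```

That corresponds to a throughput of roughly 57 to 133 comments per second.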
> This also raises concerns about data rebalancing, which would incur the same
> performance penalty and likewise take at least a day to rebalance both
> caches.
> I am worried about depending on such a large number of disk reads to update
> the index, even though this is considerably faster with an SSD than without
> one. I have also not been able to test whether SSD performance will differ
> when running in a VM, which is another worry.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)