We're working on ACCUMULO-3549, and a pretty conservative fix will be committed Monday.

On Sat, Jan 31, 2015 at 12:48 PM, Josh Elser <[email protected]> wrote:

> That's a good point, Ed, and there hasn't been any other discussion (on
> the mailing lists) so you did the right thing bringing this up here.
>
> There is no user administration or monitoring support that would allow
> user intervention (aside from restarting a tserver which is a no-go). If
> we're going to include it, as it appears we are, we need to both make sure
> that the cache is bounded in size and we have as many people as possible
> look at it (since it's such a late addition to the release -- it's common
> for us to only notice subtleties weeks to months after a change is made
> during normal development cycles).
>
> Ed Coleman wrote:
>
>> Eric commented on the vote for RC3:
>>
>> - - - -
>> It would be nice to have
>> ACCUMULO-3547<https://issues.apache.org/jira/browse/ACCUMULO-3549>
>> in 1.6.2.
>>
>> We are running at scale with it at the moment, and it has made a huge
>> improvement. I hate to hold up 1.6.2, though. If it doesn't make it,
>> please update the ticket to point to 1.6.3.
>> - - - -
>>
>> I generally agree with this and it seems that ACCUMULO-3547 will make it
>> into 1.6.2 - which I think is the preferable option. My concerns deal with
>> not having ACCUMULO-3549 included in 1.6.2 too.
>>
>> In ACCUMULO-3549 Keith made the assumption that end rows are 10 bytes -
>> I'm not sure this is a good assumption. If end rows are larger than 10
>> bytes, then how much more memory will be required over time? How much
>> faster will it grow?
>>
>> Without ACCUMULO-3549, what are my options for monitoring / correcting
>> the situation if the cache grows too large? Will tablet server performance
>> slowly degrade over time because the cache keeps growing? What will users
>> need to do to monitor and then correct this? Will we be in a situation
>> where tservers will start to run out of memory, we will increase the memory
>> allocation if we can, and just kick the can down the road a little further
>> and performance will just keep degrading?
>>
>> Is there a way to trigger the cache to clear short of restarting a
>> tserver? While not optimal, having a utility / script that slowly walks
>> across the tservers and clears the cache so that each tserver cache is
>> cleared every 12, 24, 48,... hours may be a bridge until ACCUMULO-3549 is
>> resolved. If this is the case, it would seem that having the fix in 1.6.3
>> would also be a priority.
>>
>> Maybe this has been discussed and resolved, but I want to bring this up
>> to ensure that the ramifications have been considered and that there is a
>> viable mitigation strategy that is communicated to the users. Sorry for the
>> doom, end-of-the-world tone; I was just trying to emphasize the worst-case
>> scenarios that I could envision. I think ACCUMULO-3547 is an important
>> (even necessary) improvement and I'm not suggesting that it be removed - I
>> just want to make sure that I understand the other side effects and know
>> our options.
>>
>> Ed Coleman
