[GitHub] [pinot] mcvsubbu commented on issue #7929: Performance problem in segment build

GitBox Sun, 19 Dec 2021 10:08:36 -0800


mcvsubbu commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997435881



   > > If you want to allocate memory in the realtime context, please use the 
class RealtimeIndexOffheapMemoryManager . It will make sure to allocate memory 
as per the configuration set by the admin -- memory mapped, or direct, etc. It 
is also accounted for, and so can be used for provisioning purpose.
   > 
   > Before my change `ByteBuffer.allocateDirect` was used, but I'll make a 
note of this.
   
   I meant if you wanted to change anything in the realtime mem allocation 
path. Code has evolved, and more and more special casing has been added since 
the time the memory manager was introduced. Not all changes have followed the 
approach. Perhaps because it did not matter either way. As we moved forward to 
make changes, we should keep that in mind. Now that you have done so, I am 
fine, thanks.
    
   > 
   > > Yes, we should not modify general logic for outlier use cases. In this 
case, we had a production issue, and had to revert the deployment, and spend 
multiple days trying to reproduce the problem, narrowing down the commit and 
then identifying a problem within that commit.
   > 
   > I apologise for the inconvenience caused by the change, but it's anything 
but an outlier use case. New user experience has to be considered - if a new 
user stores text (web scraping, XML, JSON, typically no control over max 
length) in a raw index, they will find it requires a lot of memory. How do we 
prevent that new user from having a bad experience at the same time as avoiding 
finding out about inadvertent regressions in your production environment? Just 
not making changes would be a good way to avoid regressions in your production 
environment, but it leaves new users whose workloads are slightly different to 
yours out in the cold. To avoid a repeat incident, firstly, we need better 
performance tests so that unconsidered codepaths can't regress without anyone 
knowing. Secondly, I repeat the suggestion to decouple fixed width and variable 
width storage, because the implementation has a downside either way so long as 
it's shared.
   
   Perhaps we differ a bit here. A new user of today is just starting off.  We 
must definitely be cognizant of revenue-impacting production use cases already 
running on the system. Today's new user will be tomorrow's production user, and 
will not be happy if a new release breaks something in production :-) So, we 
should be careful if we consider new use cases to be the most common ones.
   
   That being said, there should be a balance. I am definitely not suggesting 
not making any changes, and I believe you know that :-).  And some of these 
(especially performance bugs) are hard to catch before we go live. We do have 
performance testing as well in our pipeline, but this was not caught. Exactly 
one out of 150+ use cases happened to break :-) , so I would go as far to claim 
that it is impossible to guarantee that no regressions happen. We all commit 
code and do the best we can to be correct, and learn along the way as well.
   
   If you can please work with @sajjad-moradi  and arrive at a common solution 
that will be appreciated, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] mcvsubbu commented on issue #7929: Performance problem in segment build

Reply via email to