imay commented on issue #2016: [Proposal] Limit the memory usage of Compaction
URL: 
https://github.com/apache/incubator-doris/issues/2016#issuecomment-544145764
 
 
   > > Why not estimate the average row length by computing the ratio of written bytes to the number of rows when generating a rowset? This value could then be recorded in the rowset meta.
   > 
   > Do you mean `memtable size / num of rows`?
   
   No. Currently, when writing a rowset, RowsetWriter estimates how many bytes 
have been written to decide whether it should generate a new segment. We can 
use this value divided by the row count as the average row size.
   
   
   > I am afraid it will increase the memory consumption of the load process, 
and the concurrency of loading is much higher than that of compaction. If we 
do this during loading, we need to rethink how to limit the memory used by 
loading.
   
   What I want to do has no effect on the current load process. It would be 
done before we add the rowset to StorageEngine: if we find there are too many 
rowsets, we can try to merge some of them into a bigger rowset. Actually, we 
could do this for all load operations, because it would improve our read 
performance.
   
   Even though we could do this, I think it is not very urgent. In your 
example, when a rowset has 1000 segments, the rowset size will be 100 GB 
(1000 * 100 MB), which will lead to other serious problems such as tablet 
balancing and read performance. So for this proposal, I think you can leave 
this case aside.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
