imay commented on issue #2016: [Proposal] Limit the memory usage of Compaction
URL: https://github.com/apache/incubator-doris/issues/2016#issuecomment-544145764

> > Why not estimate the average row length by computing the ratio of written bytes to the number of rows when generating a rowset? This value can be recorded in the rowset meta.
> >
> > Do you mean `memtable size / num of rows`?

No. When writing a rowset, RowsetWriter already estimates how many bytes have been written in order to decide whether to generate a new segment. We can divide this value by the row count to get the average row size.

> I am afraid it will increase the memory consumption of the load process. And the concurrency of loading is much higher than that of compaction. If we do this during loading, we need to rethink how to limit the memory used by loading.

What I want to do has no effect on the current load process. It will be done before we add the rowset to the StorageEngine. If we find there are too many rowsets, we can try to merge some of them into a bigger rowset. Actually, we could do this for all load operations, because it would improve our read performance. But even though we can do this, I don't think it is very urgent.

As for your example: when a rowset has 1000 segments, the rowset size will be 100GB (1000 * 100MB), which will lead to other serious problems, such as tablet balance and read performance. So for this proposal, I think you can leave that case aside.
