[jira] [Commented] (PHOENIX-6061) optimize the estimated mutation size

Kadir Ozdemir (Jira) Mon, 13 Jun 2022 13:26:05 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553782#comment-17553782
 ]


Kadir Ozdemir commented on PHOENIX-6061:
----------------------------------------

[~yanxinyi], I understand that this is a computation overhead issue, but I did 
not understand why it is expensive in terms of space. I suggest a simple 
solution here: define an estimatedSize field, compute it within the 
constructors, and return it when getEstimatedSize() is called. Alternatively, 
we can do this computation lazily: initialize the estimatedSize field to zero, 
check whether it is zero within getEstimatedSize(), and compute and store it 
only if it is.
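
A minimal sketch of the lazy variant follows. CachedSizeName, ASSUMED_OVERHEAD, 
computeCount, and the size formula are hypothetical and chosen only for 
illustration; this is not the actual PNameImpl code or its real estimate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a name object; NOT the actual Phoenix PNameImpl.
final class CachedSizeName {
    // Assumed fixed per-object overhead in bytes, for illustration only.
    private static final int ASSUMED_OVERHEAD = 16;

    private final String name;
    private final byte[] bytes;

    // 0 means "not computed yet". A real estimate is always > 0 here
    // because it includes the fixed overhead, so 0 is a safe sentinel.
    private int estimatedSize;

    // Counts how often the estimate is actually computed (demo only,
    // not thread-safe).
    int computeCount;

    CachedSizeName(String name) {
        this.name = name;
        this.bytes = name.getBytes(StandardCharsets.UTF_8);
    }

    // Illustrative formula: overhead + 2 bytes per char + encoded bytes.
    private int computeEstimatedSize() {
        computeCount++;
        return ASSUMED_OVERHEAD + 2 * name.length() + bytes.length;
    }

    int getEstimatedSize() {
        int size = estimatedSize;
        if (size == 0) {
            // First call: compute once and cache the result.
            size = computeEstimatedSize();
            estimatedSize = size;
        }
        return size;
    }
}
```

Because the computation is idempotent and the cached field is an int, a race 
between two threads would at worst compute the same value twice. The eager 
variant would instead assign a final estimatedSize field in the constructor.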

> optimize the estimated mutation size 
> -------------------------------------
>
>                 Key: PHOENIX-6061
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6061
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Xinyi Yan
>            Priority: Major
>
> The current max mutation size is estimated from the JVM-level column size 
> plus the column family size. See 
> [https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/PNameImpl.java#L48]
> This is very expensive in terms of space. Most use cases upsert into the 
> same table/columns and store the same column/column family. Think about the 
> case where we upsert 100 rows into the Dummy table (10 columns, COL1, 
> COL2, ... COL10). Phoenix calculates the estimate for the 10 columns and the 
> column family for each row, so this metadata info has to be computed 100 
> times. We probably can do something smarter there.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
