[
https://issues.apache.org/jira/browse/PHOENIX-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553782#comment-17553782
]
Kadir Ozdemir commented on PHOENIX-6061:
----------------------------------------
[~yanxinyi], I understand this is a computation overhead issue, but I did not
understand why it is expensive in terms of space. I suggest a simple solution
here: we define an estimatedSize field, compute it within the constructors,
and return it when getEstimatedSize() is called. Alternatively, we can do this
computation lazily: initialize the estimatedSize field to zero, check whether
it is zero within getEstimatedSize(), and compute and store it if it is.
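
A minimal sketch of the lazy variant, assuming a simplified name class. The
class CachedSizeName, the constant ASSUMED_OVERHEAD_BYTES, the computeCount
counter, and the size formula are all hypothetical and are not Phoenix's
actual PNameImpl code; only the estimatedSize field and getEstimatedSize()
come from the suggestion above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical name holder used for illustration; not Phoenix's PNameImpl. */
final class CachedSizeName {
    // Placeholder per-object overhead; an assumption, not a measured value.
    private static final int ASSUMED_OVERHEAD_BYTES = 64;

    private final String name;
    private final byte[] nameBytes;

    // 0 means "not computed yet". This sentinel is safe here because the
    // estimate always includes the positive overhead constant.
    private int estimatedSize;

    // Counts how often the estimate is really computed (demonstration only).
    int computeCount;

    CachedSizeName(String name) {
        this.name = name;
        this.nameBytes = name.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the cached estimate, computing it on the first call only. */
    int getEstimatedSize() {
        int size = estimatedSize;   // read the field once
        if (size == 0) {
            size = computeEstimatedSize();
            estimatedSize = size;   // cache for later calls
        }
        return size;
    }

    // Assumed formula: overhead + 2 bytes per char + length of the byte form.
    private int computeEstimatedSize() {
        computeCount++;
        return ASSUMED_OVERHEAD_BYTES + name.length() * 2 + nameBytes.length;
    }
}
```

The eager variant would move the computeEstimatedSize() call into the
constructor and make the field final. In the lazy variant, two threads may
both compute the value, but since the result is the same each time, the cached
value itself stays correct.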
> optimize the estimated mutation size
> -------------------------------------
>
> Key: PHOENIX-6061
> URL: https://issues.apache.org/jira/browse/PHOENIX-6061
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Xinyi Yan
> Priority: Major
>
> The current max mutation size is estimated from the JVM-level column size
> plus the column family size. See
> [https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/PNameImpl.java#L48]
> This is very expensive in terms of space. Most use cases upsert into the
> same table/columns and store the same column/column family. Think about the
> case where we upsert 100 rows into the Dummy table (10 columns, COL1,
> COL2, ... COL10). Phoenix calculates the estimate for the 10 columns and the
> column family for each row, so this metadata info is counted 100 times. We
> probably can do something smarter there.