[
https://issues.apache.org/jira/browse/IMPALA-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554614#comment-16554614
]
Todd Lipcon commented on IMPALA-7234:
-------------------------------------
Should this function actually take into account the total byte sizes or counts
of ranges or files? When I looked at this code recently, I couldn't quite make
sense of the logic. For example, if we have 10 text partitions, each
containing one file, and one Parquet partition containing 100 files, wouldn't
it make more sense to estimate scan range memory usage based on Parquet
rather than text?
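To make the question concrete, here is a minimal sketch of picking a dominant format under different weightings. It is not Impala's code. PartitionSummary, WeightedFormat and dominantFormat are hypothetical names, and the sketch assumes each partition can report its format, file count and total bytes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Hypothetical per-partition summary; Impala's real partition classes differ.
final class PartitionSummary {
  final String format;
  final long numFiles;
  final long totalBytes;

  PartitionSummary(String format, long numFiles, long totalBytes) {
    this.format = format;
    this.numFiles = numFiles;
    this.totalBytes = totalBytes;
  }
}

final class WeightedFormat {
  private WeightedFormat() {}

  /**
   * Returns the format with the largest summed weight (weights assumed
   * non-negative), or null for an empty list. Ties go to the format name
   * that sorts first, so the result does not depend on input order.
   */
  static String dominantFormat(List<PartitionSummary> parts,
                               ToLongFunction<PartitionSummary> weight) {
    // TreeMap iterates in key order, which makes the tie-break deterministic.
    Map<String, Long> totals = new TreeMap<>();
    for (PartitionSummary p : parts) {
      totals.merge(p.format, weight.applyAsLong(p), Long::sum);
    }
    String best = null;
    long bestWeight = -1;
    for (Map.Entry<String, Long> e : totals.entrySet()) {
      if (e.getValue() > bestWeight) {
        best = e.getKey();
        bestWeight = e.getValue();
      }
    }
    return best;
  }
}
```

With the example above, weighting by `p -> 1L` (one vote per partition) picks text, because there are 10 text partitions and 1 Parquet partition. Weighting by `p -> p.numFiles` picks Parquet, because there are 100 Parquet files and 10 text files. Weighting by `p -> p.totalBytes` gives the byte-size variant.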
> Non-deterministic majority format for a table with equal partition instances
> -----------------------------------------------------------------------------
>
> Key: IMPALA-7234
> URL: https://issues.apache.org/jira/browse/IMPALA-7234
> Project: IMPALA
> Issue Type: Bug
> Reporter: Pooja Nilangekar
> Assignee: Pooja Nilangekar
> Priority: Major
>
> The getMajorityFormat method of FeCatalogUtils currently returns
> non-deterministic results when its argument is a list of partitions in which
> no single format has the most instances (i.e., there is a tie). The result
> then depends on the order in which the partitions are added to the HashMap.
> We need deterministic results that also take into account the memory
> requirements of the different partition formats. Ideally, in case of a tie,
> this function should return the format with the higher memory requirement.
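Here is a minimal sketch of a deterministic majority count with the tie-break described above. It is not the actual FeCatalogUtils code. The Format enum and its memRank values are hypothetical placeholders for relative scan memory needs, not Impala's real estimates. Determinism comes from using an EnumMap, which iterates in declaration order, instead of a HashMap.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a file-format enum; memRank values are
// illustrative placeholders, not Impala's real memory estimates.
enum Format {
  TEXT(1), SEQUENCE_FILE(2), AVRO(2), RC_FILE(3), PARQUET(4);

  final int memRank;

  Format(int memRank) { this.memRank = memRank; }
}

final class MajorityFormat {
  private MajorityFormat() {}

  /**
   * Returns the most common format. Ties are broken by the higher memRank,
   * then by enum declaration order, so the result never depends on input
   * order or on HashMap iteration order.
   */
  static Format getMajorityFormat(List<Format> partitionFormats) {
    if (partitionFormats.isEmpty()) {
      throw new IllegalArgumentException("no partitions");
    }
    // EnumMap iterates in declaration order, unlike HashMap.
    Map<Format, Integer> counts = new EnumMap<>(Format.class);
    for (Format f : partitionFormats) {
      counts.merge(f, 1, Integer::sum);
    }
    Format best = null;
    int bestCount = 0;
    for (Map.Entry<Format, Integer> e : counts.entrySet()) {
      Format f = e.getKey();
      int c = e.getValue();
      // On the first entry c > 0 == bestCount, so best is never null
      // when the tie-break comparison runs.
      if (c > bestCount || (c == bestCount && f.memRank > best.memRank)) {
        best = f;
        bestCount = c;
      }
    }
    return best;
  }
}
```

For example, [TEXT, PARQUET] and [PARQUET, TEXT] both yield PARQUET. Formats with equal counts and equal memRank, such as SEQUENCE_FILE and AVRO, resolve to whichever is declared first.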
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)