[ https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045891#comment-17045891 ]
Jesus Camacho Rodriguez commented on HIVE-22893:
------------------------------------------------
Left a couple of additional minor comments on the last PR.
Other than that, LGTM.
+1 (pending tests)
> Enhance data size estimation for fields computed by UDFs
> --------------------------------------------------------
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch,
> HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch,
> HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch,
> HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch,
> HIVE-22893.12.patch, HIVE-22893.13.patch
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Right now, if we have column statistics on a column, we use them to estimate
> properties of that column. However, if a UDF is executed on a column, the
> resulting column is treated as unknown and default estimates are assumed.
> An improvement would be to give wide estimates for frequently used UDFs.
> For example, consider {{substr(c,1,1)}}: no matter what the input is, the
> output is a string of length at most 1.
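
The substr example in the description can be sketched as a small bound computation. This is only an illustration of the idea, not the code from the attached patches: the `ColStats` class and the `estimate_substr` function are hypothetical names, and the statistics are reduced to an average and a maximum string length.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColStats:
    """Hypothetical, simplified statistics for a string column."""
    avg_len: float  # average string length
    max_len: int    # maximum string length


def estimate_substr(stats: Optional[ColStats], length: int) -> ColStats:
    """Wide (upper-bound) estimate for the output of substr(c, start, length).

    The output can never be longer than `length`, so a bound is available
    even when nothing is known about the input column.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if stats is None:
        # No input statistics: fall back to the bound implied by the UDF alone.
        return ColStats(avg_len=float(length), max_len=length)
    # With input statistics, the output is bounded by both the input and `length`.
    return ColStats(
        avg_len=min(stats.avg_len, float(length)),
        max_len=min(stats.max_len, length),
    )
```

For substr(c,1,1), `estimate_substr(None, 1)` gives a maximum length of 1 regardless of the input, which is tighter than a generic default for an unknown string column.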