[
https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058698#comment-17058698
]
Zoltan Haindrich commented on HIVE-22893:
-----------------------------------------
This patch created a file without the ASF header;
I've pushed an addendum to fix it.
> Enhance data size estimation for fields computed by UDFs
> --------------------------------------------------------
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch,
> HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch,
> HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch,
> HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch,
> HIVE-22893.12.patch, HIVE-22893.13.patch, HIVE-22893.14.patch
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Right now, if we have column statistics for a column, we use them to estimate
> properties of that column; however, if a UDF is applied to the column, the
> resulting column is treated as unknown and defaults are assumed.
> An improvement could be to provide (possibly wide) estimates for frequently
> used UDFs.
> For example, consider {{substr(c,1,1)}}: no matter what the input is, the
> output is a string of at most 1 character.
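The {{substr}} reasoning above can be sketched as a small length bound. This is a minimal illustration under stated assumptions, not Hive's actual estimator API: the class {{SubstrSizeEstimate}} and method {{maxOutputLength}} are hypothetical, and the optional maximum input length stands in for a max-length column statistic when one is available. The bound relies only on the fact that the result is a substring of {{c}}, so it is no longer than {{c}} and no longer than the length argument; the start position does not affect it.

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration; not Hive's real StatEstimator API.
public final class SubstrSizeEstimate {

    private SubstrSizeEstimate() {}

    /**
     * Upper bound on the length of substr(c, start, len).
     *
     * @param maxInputLen maximum length of c if known from column stats, else empty
     * @param len         the constant length argument passed to substr
     */
    public static long maxOutputLength(OptionalLong maxInputLen, long len) {
        if (len <= 0) {
            return 0; // assumed: a non-positive length yields an empty string
        }
        if (maxInputLen.isPresent()) {
            // The output is a substring of c, so it cannot exceed c's max length.
            return Math.min(len, Math.max(0, maxInputLen.getAsLong()));
        }
        return len; // no column stats: the length argument alone bounds the output
    }

    public static void main(String[] args) {
        // substr(c,1,1) with no stats on c: at most 1 character.
        System.out.println(maxOutputLength(OptionalLong.empty(), 1));  // 1
        // substr(c,1,100) where stats say c is at most 12 characters long.
        System.out.println(maxOutputLength(OptionalLong.of(12), 100)); // 12
    }
}
```

Even without any statistics on {{c}}, a constant length argument gives a usable bound instead of the default size assumption.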
--
This message was sent by Atlassian Jira
(v8.3.4#803005)