[
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892090#comment-15892090
]
ASF GitHub Bot commented on DRILL-4963:
---------------------------------------
Github user arina-ielchiieva commented on the issue:
https://github.com/apache/drill/pull/701
@jinfengni
1. It depends on how often UDFs are added, and we don't expect that to happen
often. But you are correct about the overhead for queries that do not use
dynamic UDFs.
2. You are right, the function registry can be checked several times, which
can slow down the entire query. It's hard to say by how much, as it may depend
on many factors (number of parallel queries, number of functions in the query
without an exact match, ZK response time and so on).
3. A refresh function for the function registry is being considered, but as
part of MVCC. It could help with the current approach, but it still could not
guarantee that, once the refresh command is issued, all drillbits sync their
local function registries with the remote one, unless the refresh function
waited for every drillbit to confirm that its sync was done. But what if one
of the drillbits fails to sync: should the refresh function have a retry
mechanism or fail immediately? How long would the user have to wait for the
refresh command to finish? And so on. With MVCC, the refresh command would
only need to guarantee that the current drillbit is in sync, and all of the
above questions go away (more in the MVCC doc).
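
To make the open questions in point 3 concrete, here is a minimal sketch of a
refresh that waits for every drillbit to confirm its sync. It is not Drill
code: `RefreshSketch`, `refreshAll`, `Outcome` and the per-drillbit `Supplier`
are hypothetical names, and the policy shown (a single timeout, fail without
retry) is just one possible answer to those questions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch, not part of Drill. */
final class RefreshSketch {

  enum Outcome { ALL_SYNCED, TIMED_OUT, FAILED }

  /**
   * Each supplier stands in for "ask one drillbit to sync its local registry"
   * and returns true when that drillbit confirms the sync.
   */
  static Outcome refreshAll(List<Supplier<Boolean>> drillbitSyncs, long timeoutMillis) {
    List<CompletableFuture<Boolean>> acks = new ArrayList<>();
    for (Supplier<Boolean> sync : drillbitSyncs) {
      acks.add(CompletableFuture.supplyAsync(sync)); // run the syncs in parallel
    }
    CompletableFuture<Void> all =
        CompletableFuture.allOf(acks.toArray(new CompletableFuture[0]));
    try {
      // The user waits here: at most timeoutMillis for all confirmations.
      all.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      return Outcome.TIMED_OUT;   // some drillbit did not answer in time
    } catch (ExecutionException e) {
      return Outcome.FAILED;      // some drillbit threw while syncing; no retry
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.FAILED;
    }
    for (CompletableFuture<Boolean> ack : acks) {
      if (!ack.join()) {
        return Outcome.FAILED;    // a drillbit answered but reported no sync
      }
    }
    return Outcome.ALL_SYNCED;
  }
}
```

Even in this small form, one slow or failing drillbit decides the outcome for
the whole command, which is the cost that a per-drillbit guarantee would avoid.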
Anyway, you are totally right that the current approach only covers the gap
with function overloading, is not optimal, and may slow down queries. Having a
refresh command might partially solve the problem as well, but it has open
issues of its own. So it's better to dive into MVCC for the most optimal
implementation.
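
For readers following the performance concern, the sketch below shows one
reading of "check the registry when there is no exact match": on a local miss,
compare versions with the remote registry and sync only if they differ. All
class and method names are hypothetical, the remote registry is an in-memory
stand-in for the ZK-backed one, and the code is single-threaded on purpose. It
is not the code from the pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the remote (ZK-backed) registry. */
final class RemoteRegistrySketch {
  private final Map<String, String> functions = new HashMap<>();
  private long version = 0;
  int versionReads = 0; // counts simulated round trips to the remote registry

  void register(String signature, String implementation) {
    functions.put(signature, implementation);
    version++; // every registration bumps the remote version
  }

  long version() {
    versionReads++;
    return version;
  }

  Map<String, String> snapshot() {
    return new HashMap<>(functions);
  }
}

/** Hypothetical local registry of one drillbit. */
final class LocalRegistrySketch {
  private final RemoteRegistrySketch remote;
  private Map<String, String> functions = new HashMap<>();
  private long syncedVersion = 0;

  LocalRegistrySketch(RemoteRegistrySketch remote) {
    this.remote = remote;
  }

  Optional<String> resolve(String signature) {
    String implementation = functions.get(signature);
    if (implementation != null) {
      return Optional.of(implementation); // exact match: no remote call
    }
    // No exact match: this version check is the extra cost, paid on every
    // miss, including misses in queries that use no dynamic UDFs.
    long remoteVersion = remote.version();
    if (remoteVersion != syncedVersion) {
      functions = remote.snapshot(); // a real system must read these atomically
      syncedVersion = remoteVersion;
      implementation = functions.get(signature);
    }
    return Optional.ofNullable(implementation);
  }
}
```

In this sketch, a query with several functions that have no exact local match
pays one version read per such function, so the cost grows with the number of
misses and with the remote response time.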
Regarding this pull request, I don't have strong feelings about whether it
should be merged. Yes, it would solve the problem with function overloading,
but it may impact performance, and it's hard to say by how much since many
factors are involved.
> Issues when overloading Drill native functions with dynamic UDFs
> ----------------------------------------------------------------
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.9.0
> Reporter: Roman
> Assignee: Arina Ielchiieva
> Labels: ready-to-commit
> Fix For: Future
>
> Attachments: subquery_udf-1.0.jar, subquery_udf-1.0-sources.jar,
> test_overloading-1.0.jar, test_overloading-1.0-sources.jar
>
>
> I created a jar file which overloads 3 Drill native functions
> (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and
> ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF.
> If I try to use my functions, I get errors:
> {code:xml}
> SELECT CURRENT_DATE('test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
> SQL Query null
> {code:xml}
> SELECT ABS('test','test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
> SQL Query null
> {code:xml}
> SELECT LOG('test') FROM (VALUES(1));
> {code}
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing
> expression in constant expression evaluator LOG('test'). Errors:
> Error in expression at index -1. Error: Missing function implementation:
> castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION.
> But if I rerun all these queries after the "DrillRuntimeException", they run
> correctly. It seems that Drill had not updated the function signatures before
> that error. Also, if I add the jar as a regular UDF (copy the jar to
> /drill_home/jars/3rdparty and restart the drillbits), all queries run
> correctly without errors.