[
https://issues.apache.org/jira/browse/FLINK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Husky Zeng updated FLINK-19335:
-------------------------------
Description:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Is-there-a-way-to-avoid-submit-hive-udf-s-resources-when-we-submit-a-job-td38204.html
As the mail thread describes, uploading the UDF's resource files on every submission is a serious problem in my
production environment: it blocks our automated job submission and leads
to duplicated data between the Hive metastore and the Flink client (maintaining multiple
copies of the same data in two systems easily causes them to get out of sync). I plan to
develop a feature that automatically fetches the UDF's resource files from HDFS when
running a job that uses a Hive UDF. Do you think this feature would benefit
the community? Any suggestions?
This is my plan:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/module/hive/HiveModule.java#L80
We already have those UDF resources' paths in FunctionInfo, and can pass the
paths along with the job, when it
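
Below is a minimal sketch of the idea, assuming access to the Hive metastore through its standard IMetaStoreClient API. The HiveUdfResourceResolver class name and the surrounding wiring are hypothetical placeholders, not existing Flink code; it only illustrates how a UDF's resource URIs (typically HDFS paths of the UDF jars) could be read from the metastore instead of being uploaded from the client on every submission.

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Function;
import org.apache.hadoop.hive.metastore.api.ResourceUri;

// Hypothetical sketch: only IMetaStoreClient, Function and ResourceUri are
// real Hive metastore APIs; this class itself is a placeholder.
public class HiveUdfResourceResolver {

    private final IMetaStoreClient metastoreClient;

    public HiveUdfResourceResolver(IMetaStoreClient metastoreClient) {
        this.metastoreClient = metastoreClient;
    }

    /**
     * Looks up the resource URIs (usually HDFS paths of UDF jars) that the
     * Hive metastore has registered for a function, so the client does not
     * have to re-upload them on every job submission.
     */
    public List<String> resolveUdfResources(String dbName, String functionName) throws Exception {
        Function function = metastoreClient.getFunction(dbName, functionName);
        List<String> uris = new ArrayList<>();
        if (function.getResourceUris() != null) {
            for (ResourceUri resourceUri : function.getResourceUris()) {
                // e.g. hdfs:///udfs/my-udf.jar
                uris.add(resourceUri.getUri());
            }
        }
        return uris;
    }
}
{code}

The resolved URIs could then be attached to the job, for example by adding them to the pipeline.jars configuration or shipping them through YARN's distributed cache in per-job mode, so the client no longer has to maintain its own copy of the UDF jars.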
> Automatically get udf's resource files from hdfs when running a job that
> uses hive-udf
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-19335
> URL: https://issues.apache.org/jira/browse/FLINK-19335
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Hive
> Environment: yarn ,per-job mode
> Reporter: Husky Zeng
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)