[ 
https://issues.apache.org/jira/browse/FLINK-36594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slankka updated FLINK-36594:
----------------------------
    Description: 
I recently ran into this while using HiveCatalog together with Hudi's sync to the Hive Metastore (HMS).

Once a HiveCatalog has been created, Hudi can no longer pick up the hive-site.xml provided on the classpath.

HiveCatalog loads hive-site.xml itself and does not rely on the static 
HiveConf.hiveSiteLocation variable, but code that runs afterwards still assumes 
that HiveConf searches for hive-site.xml on the classpath.

In other words, HiveCatalog turns that lookup off, and from then on no HiveConf 
instance loads the hive-site.xml that the user put on the classpath or that YARN 
provides.

The file is loaded again only if you call addResource explicitly, set the 
location back, or arrange for Hive to find it in the user's uber jar, which 
takes extra effort.

 

Example
{code:java}
// First, HiveConf's static initializer searches for hive-site.xml, and it does so only once.

static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}{code}
 
{code:java}
String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive natively creates its own HiveConf:

HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class); 

// or directly

HiveConf hiveConf = new HiveConf(); {code}
This hiveConf *DOES NOT* load hive-site.xml from the classpath, which causes 
configuration loading to fail.

The reason is that HiveCatalog sets *HiveConf.hiveSiteLocation* to null, a 
change introduced by https://issues.apache.org/jira/browse/FLINK-22092
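
To make the mechanism concrete, here is a minimal, self-contained model of it. It does not use Hive or Flink at all: ModelConf, ModelCatalog, and the /opt/hive-conf path are hypothetical stand-ins, and openWithRestore is only one possible shape of "setting it back", not the actual patch.
{code:java}
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Simplified stand-in for HiveConf: one static location shared by every instance. */
class ModelConf {
    // Hypothetical fixed value; the real HiveConf resolves this from the classpath once.
    private static URL siteLocation = initialLookup();

    private final boolean siteLoaded;

    ModelConf() {
        // Every new instance consults the shared static field at construction time.
        this.siteLoaded = siteLocation != null;
    }

    static URL getSiteLocation() { return siteLocation; }

    static void setSiteLocation(URL location) { siteLocation = location; }

    boolean isSiteLoaded() { return siteLoaded; }

    private static URL initialLookup() {
        try {
            return URI.create("file:///opt/hive-conf/hive-site.xml").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Simplified stand-in for a catalog that builds its own configuration. */
class ModelCatalog {
    /** Models the reported behaviour: clears the shared location and leaves it cleared. */
    static void openWithoutRestore() {
        ModelConf.setSiteLocation(null);
    }

    /** Models one possible fix: clear it only while building the catalog's own conf. */
    static void openWithRestore() {
        URL saved = ModelConf.getSiteLocation();
        try {
            ModelConf.setSiteLocation(null);
            new ModelConf(); // the catalog's own conf, built without the shared lookup
        } finally {
            ModelConf.setSiteLocation(saved); // later instances see the original value
        }
    }
}

class HiveSiteLocationModelDemo {
    public static void main(String[] args) {
        ModelCatalog.openWithRestore();
        System.out.println(new ModelConf().isSiteLoaded()); // true
        ModelCatalog.openWithoutRestore();
        System.out.println(new ModelConf().isSiteLoaded()); // false
    }
}
{code}
Because the field is static, the change is visible to every framework in the same JVM. Even with a restore, another thread could still observe null while the catalog is being opened.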

 

Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
    super(props, hadoopConf);
    HiveConf hiveConf = new HiveConf();
    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
    hiveConf.addResource(hadoopConf);
    setHadoopConf(hiveConf);
    validateParameters();
} {code}
 

A temporary workaround is to search the classpath again and set the location back :)
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
 
HiveConf hiveConf = new HiveConf();{code}
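
Note that getResource returns null when the file is not visible to the given class loader, in which case the call above would set the location to null again. The lookup step can be tried on its own with the standard library only. In this sketch, HiveSiteLocator and its methods are hypothetical helper names, and the HiveConf call appears only in a comment because Hive is assumed not to be on the classpath here.
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HiveSiteLocator {
    // Same file name the issue refers to.
    static final String HIVE_SITE_FILE = "hive-site.xml";

    /** Returns the URL of hive-site.xml visible to the class loader, or null if there is none. */
    static URL find(ClassLoader classLoader) {
        ClassLoader cl = classLoader != null
                ? classLoader
                : Thread.currentThread().getContextClassLoader();
        return cl != null
                ? cl.getResource(HIVE_SITE_FILE)
                : ClassLoader.getSystemResource(HIVE_SITE_FILE);
    }

    /** Creates a temporary conf directory, puts it on a class loader, and looks the file up. */
    static boolean demo() {
        try {
            Path confDir = Files.createTempDirectory("hive-conf");
            Path site = confDir.resolve(HIVE_SITE_FILE);
            Files.write(site, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader cl =
                    new URLClassLoader(new URL[] {confDir.toUri().toURL()}, null)) {
                URL found = find(cl);
                // With Hive available, the workaround would continue with:
                // if (found != null) { HiveConf.setHiveSiteLocation(found); }
                return found != null && found.toString().endsWith(HIVE_SITE_FILE);
            } finally {
                Files.deleteIfExists(site);
                Files.deleteIfExists(confDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
{code}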
 

 


> HiveCatalog should set HiveConf.hiveSiteLocation back
> -----------------------------------------------------
>
>                 Key: FLINK-36594
>                 URL: https://issues.apache.org/jira/browse/FLINK-36594
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.20.1
>            Reporter: slankka
>            Priority: Minor
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
