[ https://issues.apache.org/jira/browse/FLINK-36594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slankka updated FLINK-36594:
----------------------------
    Description: 
I am currently using HiveCatalog together with Hudi's sync to HMS.

HiveCatalog can cause subsequent Hive configuration lookups to fail. In my 
case, Hudi cannot pick up the hive-site.xml provided on the classpath. 

In other words, HiveCatalog turns the lookup off by setting 
*HiveConf.hiveSiteLocation* to null. After that, no HiveConf instance loads the 
hive-site.xml that the user placed on the classpath (for example, one provided 
by YARN). 

HiveCatalog can load hive-site.xml by itself without this variable; however, 
code that runs afterwards still assumes that HiveConf searches the classpath 
for hive-site.xml. 
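To make the failure mode concrete, here is a minimal, stdlib-only model of the mechanism. *SiteLocationModel* is a hypothetical stand-in, not the real HiveConf, and a map stands in for the classpath so the sketch runs without Hive. The location is resolved once when the class initializes, every new "conf" copies from that cached location, and nulling it (as HiveCatalog does with *HiveConf.hiveSiteLocation*) silently leaves every later instance empty, because nothing searches again.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of HiveConf's static hive-site.xml cache; not the real Hive class.
final class SiteLocationModel {
    // Stands in for the classpath: resource name -> key/value contents.
    private static final Map<String, Map<String, String>> CLASSPATH = new HashMap<>();

    static {
        Map<String, String> site = new HashMap<>();
        site.put("hive.metastore.uris", "thrift://metastore-host:9083"); // illustrative value
        CLASSPATH.put("hive-site.xml", site);
    }

    // Resolved exactly once, during class initialization (mirrors HiveConf's static block).
    private static String siteLocation =
            CLASSPATH.containsKey("hive-site.xml") ? "hive-site.xml" : null;

    private SiteLocationModel() {}

    static String getSiteLocation() {
        return siteLocation;
    }

    // Mirrors a static setter such as HiveConf.setHiveSiteLocation: affects every later instance.
    static void setSiteLocation(String location) {
        siteLocation = location;
    }

    // Models "new HiveConf()": copies whatever the cached location points to, if anything.
    static Map<String, String> newConf() {
        Map<String, String> conf = new HashMap<>();
        if (siteLocation != null) {
            conf.putAll(CLASSPATH.get(siteLocation));
        }
        return conf;
    }
}
{code}

Setting the location back to "hive-site.xml" restores the original behavior for instances created afterwards; instances created while it was null stay empty.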

Related change:  https://issues.apache.org/jira/browse/FLINK-22092

The only workarounds are to call addResource explicitly, to set the location 
back, or to make Hive find the file in the user's uber jar, which takes extra 
effort.
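The "set it back" option can be wrapped in a small helper that snapshots a static setting, runs an action, and restores the snapshot in a *finally* block, so the restore also happens if the action throws. *StaticSettingGuard.restoring* is a hypothetical name; the helper only uses *java.util.function*, so it compiles without Hive on the classpath.

{code:java}
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: run an action that may clobber a static setting, then put the old value back.
final class StaticSettingGuard {
    private StaticSettingGuard() {}

    /** Runs action and returns its result; the value read by getter beforehand is always written back. */
    static <T, R> R restoring(Supplier<T> getter, Consumer<T> setter, Supplier<R> action) {
        T saved = getter.get();
        try {
            return action.get();
        } finally {
            setter.accept(saved);
        }
    }
}
{code}

With Hive and Flink on the classpath, the intended call would presumably be *StaticSettingGuard.restoring(HiveConf::getHiveSiteLocation, HiveConf::setHiveSiteLocation, () -> new HiveCatalog(name, defaultDatabase, hiveConfDir))*, assuming the Hive version in use exposes *HiveConf.getHiveSiteLocation()* and that the location is only cleared while the catalog is being constructed. Because the setting is process-wide, other threads can still observe the null while the action runs.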

 

In addition, both classes below use a private *findConfigFile* method to search 
the classpath for *hiveSiteURL*:
 * org.apache.hadoop.hive.conf.HiveConf
 * org.apache.hadoop.hive.metastore.conf.MetastoreConf

 
Conclusion:
 # HiveConf calls findConfigFile and caches hiveSiteLocation only once, during 
class initialization.
 # MetastoreConf searches for hiveSiteLocation on the classpath and in some HOME 
or CONF_PATH locations.
 # Both HiveConf and MetastoreConf only recognize hive-site.xml at the top level 
of the classpath; e.g. "lib/hive-site.xml" is not found.
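Point 3 can be checked with plain JDK classes, assuming the classpath part of *findConfigFile* ultimately goes through *ClassLoader.getResource* with the bare name "hive-site.xml" (the excerpts here do not show its body). The sketch builds a throwaway classpath root that contains only lib/hive-site.xml; *ClasspathLookupDemo* is a hypothetical name, and the temporary files are left behind.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo: a bare-name resource lookup only sees the top level of a classpath root.
final class ClasspathLookupDemo {
    private ClasspathLookupDemo() {}

    /** Returns {found by bare name, found by "lib/" path} for a root holding only lib/hive-site.xml. */
    static boolean[] lookup() {
        try {
            Path root = Files.createTempDirectory("cp-root");
            Path lib = Files.createDirectories(root.resolve("lib"));
            Files.writeString(lib.resolve("hive-site.xml"), "<configuration/>");
            // Parent is null, so only the bootstrap loader and this root are searched.
            try (URLClassLoader cl = new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
                URL byBareName = cl.getResource("hive-site.xml"); // what a classpath search asks for
                URL byNestedPath = cl.getResource("lib/hive-site.xml");
                return new boolean[] {byBareName != null, byNestedPath != null};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
{code}

Putting the directory that contains hive-site.xml on the classpath (rather than one of its parents) is what makes the bare-name lookup succeed.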

 
{code:java}
// org.apache.hadoop.hive.metastore.conf.MetastoreConf (excerpt)

private MetastoreConf() {
  throw new RuntimeException("You should never be creating one of these!");
}

public static Configuration newMetastoreConf() {
...
  if (hiveSiteURL == null) {
    hiveSiteURL = findConfigFile(classLoader, "hive-site.xml");
  }
...
}{code}
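The practical difference from HiveConf is when the lookup runs. The hypothetical, stdlib-only model below (not Hive code) contrasts an eager cache, resolved once up front, with a lazy one that re-searches whenever the cached value is null, as the *newMetastoreConf* excerpt does. Instances are used instead of static fields so both variants can be exercised side by side, and a plain *Supplier* stands in for *findConfigFile*.

{code:java}
import java.util.function.Supplier;

// Hypothetical model contrasting eager (HiveConf-style) and lazy (MetastoreConf-style) lookup.
final class LookupStrategies {
    private LookupStrategies() {}

    // Eager: resolved once at construction; clearing it is permanent.
    static final class Eager {
        private String location;

        Eager(Supplier<String> finder) {
            location = finder.get();
        }

        void clear() {
            location = null;
        }

        String location() {
            return location;
        }
    }

    // Lazy: re-runs the finder whenever the cached value is null.
    static final class Lazy {
        private final Supplier<String> finder;
        private String location;

        Lazy(Supplier<String> finder) {
            this.finder = finder;
        }

        void clear() {
            location = null;
        }

        String location() {
            if (location == null) {
                location = finder.get();
            }
            return location;
        }
    }
}
{code}

After both are cleared, only the lazy variant recovers the location on its next use, which is why code relying on MetastoreConf is less exposed to a nulled cache than code relying on HiveConf.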
 

Example
{code:java}
// At first, HiveConf's static initializer searches for hive-site.xml, exactly once:

static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}{code}
 
{code:java}
String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:

HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class); 

// or directly

HiveConf hiveConf = new HiveConf(); {code}
The resulting hiveConf *DOES NOT* load hive-site.xml from the classpath, so 
configuration loading fails.

 

Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
    super(props, hadoopConf);
    HiveConf hiveConf = new HiveConf();
    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
    hiveConf.addResource(hadoopConf);
    setHadoopConf(hiveConf);
    validateParameters();
} {code}
 

A temporary workaround is to search for the file again :)
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
 
HiveConf hiveConf = new HiveConf();{code}
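A slightly more defensive variant of this workaround only re-resolves the location when it is currently null, so an explicitly configured location is not overwritten. This is a sketch: *SiteLocationRepair.reResolveIfMissing* is a hypothetical helper, and the getter and setter are passed in so it compiles without Hive; the intended arguments would presumably be *HiveConf::getHiveSiteLocation* and *HiveConf::setHiveSiteLocation*, assuming the Hive version in use exposes the getter.

{code:java}
import java.net.URL;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: re-run the classpath search only if the cached location was cleared.
final class SiteLocationRepair {
    private SiteLocationRepair() {}

    /** Returns the location in effect afterwards (still null if the file is not on the classpath). */
    static URL reResolveIfMissing(Supplier<URL> current, Consumer<URL> set,
                                  ClassLoader classLoader, String fileName) {
        URL location = current.get();
        if (location == null) {
            // Same bare-name search as the workaround above.
            location = classLoader.getResource(fileName);
            if (location != null) {
                set.accept(location);
            }
        }
        return location;
    }
}
{code}

Like the original workaround, this mutates process-wide static state, so it has to run before the other framework creates its HiveConf.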
 

 

  was:
I am currently using HiveCatalog together with Hudi's sync to HMS.

HiveCatalog can prevent Hudi from getting the hive-site conf provided on the classpath. 

HiveCatalog can load hive-site.xml by itself without this variable, but the 
code that runs afterwards still assumes that HiveConf searches the classpath 
for hive-site.xml.

In other words, HiveCatalog turns the lookup off; after that, no HiveConf 
instance loads the hive-site.xml that the user (or YARN) placed on the classpath.

 

The only workarounds are to call addResource explicitly, to set the location 
back, or to make Hive find the file in the user's uber jar, which takes extra 
effort.

 

Example
{code:java}
// At first, HiveConf's static initializer searches for hive-site.xml, exactly once:

static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}{code}
 
{code:java}
String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:

HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class); 

// or directly

HiveConf hiveConf = new HiveConf(); {code}
The resulting hiveConf *DOES NOT* load hive-site.xml from the classpath, so 
configuration loading fails.

This is because HiveCatalog sets *HiveConf.hiveSiteLocation* to null, as a 
result of https://issues.apache.org/jira/browse/FLINK-22092

 

Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
    super(props, hadoopConf);
    HiveConf hiveConf = new HiveConf();
    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
    hiveConf.addResource(hadoopConf);
    setHadoopConf(hiveConf);
    validateParameters();
} {code}
 

A temporary workaround is to search for the file again :)
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
 
HiveConf hiveConf = new HiveConf();{code}
 

 


> HiveCatalog should set HiveConf.hiveSiteLocation back
> -----------------------------------------------------
>
>                 Key: FLINK-36594
>                 URL: https://issues.apache.org/jira/browse/FLINK-36594
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.20.1
>            Reporter: slankka
>            Priority: Minor
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
