[
https://issues.apache.org/jira/browse/FLINK-36594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
slankka updated FLINK-36594:
----------------------------
Description:
Recently I have been using HiveCatalog together with Hudi's sync to HMS.
HiveCatalog can cause subsequent Hive configuration lookups to fail. In
my case, Hudi cannot read the hive-site.xml provided on the classpath.
Specifically, HiveCatalog disables the lookup by setting
*HiveConf.hiveSiteLocation* to null, so every HiveConf instance created
afterwards never loads the hive-site.xml that the user put on the classpath,
for example one provided by YARN.
HiveCatalog can load hive-site.xml itself without this variable; however,
code that runs afterwards still assumes that HiveConf 'searches' for
hive-site.xml on the classpath.
Related change: https://issues.apache.org/jira/browse/FLINK-22092
After that, hive-site.xml is loaded only if you call addResource explicitly,
set the location back, or have Hive find it in the user's uber jar, all of
which take extra effort.
My point is, {+}big data developers will be confused about where to provide
core-site.xml, hive-site.xml, hbase-site.xml and so on{+}. On the other side,
developers of big data frameworks search for these files here and there, and
cannot be sure they find the right one.
As a consequence, users put their xxx-site.xml everywhere:
# /etc/hive/conf, /etc/hadoop/conf
# FLINK_HOME/lib, SPARK_HOME/conf
# yarn.provided.lib.dir ( resource prefix ./lib, ./plugin/ )
# packed in their uber jar
# --files of Apache Spark, --yarnship hive-site.xml (works)
Because deployment modes differ (yarn-per-job versus yarn-application), the
main() of an application can run in different places.
The simplest way to provide xxx-site.xml is to put it on both the client-side
classpath and the container classpath (root path). By the way, if I were a
cloud infrastructure provider, I would put it in locations 1, 2, and 3; as a
Flink user, I would not rely on that, so I would pack it into my jar and ask
the cloud provider to give me the xxx-site.xml.
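To see which of these locations actually ends up on the classpath of a given
process, one can list every copy visible to a class loader. This is a minimal
sketch using only the JDK; the class name SiteFileLocator is made up for
illustration, and it assumes the thread context class loader is the one the
framework consults:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class SiteFileLocator {

    /** Returns every copy of the named resource that the given class loader can see. */
    static List<URL> findAll(ClassLoader loader, String name) {
        try {
            return Collections.list(loader.getResources(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : new String[] {"core-site.xml", "hive-site.xml", "hbase-site.xml"}) {
            // An empty list means the file is not on this process's classpath at all.
            System.out.println(name + " -> " + findAll(loader, name));
        }
    }
}
```

Running this on the client and inside the container shows whether both sides
see the same files, and whether more than one copy is present.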
In addition, the two classes below both use a private method
*findConfigFile* to search the classpath for *hiveSiteLocation*:
* org.apache.hadoop.hive.conf.HiveConf
* org.apache.hadoop.hive.metastore.conf.MetastoreConf
{*}Conclusion{*}:
# HiveConf calls findConfigFile and caches hiveSiteLocation only once, during
class initialization.
# MetastoreConf searches for hiveSiteLocation again even if somebody has set
it to null. (This is better.)
# Both HiveConf and MetastoreConf recognize hive-site.xml only at the top
level of the classpath; e.g. "lib/hive-site.xml" is not found.
{code:java}
// org.apache.hadoop.hive.metastore.conf.MetastoreConf
private MetastoreConf() {
  throw new RuntimeException("You should never be creating one of these!");
}

public static Configuration newMetastoreConf() {
  ...
  if (hiveSiteURL == null) {
    hiveSiteURL = findConfigFile(classLoader, "hive-site.xml");
  }
  ...
}{code}
{code:java}
// org.apache.hadoop.hive.conf.HiveConf
// The static initializer searches for hive-site.xml, and does so only once.
static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}
...
private void initialize(Class<?> cls) {
  ...
  if (hiveSiteURL != null) {
    addResource(hiveSiteURL);
  }
  ...
}{code}
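The third conclusion can be checked without Hive. The sketch below assumes
that findConfigFile relies on a plain ClassLoader.getResource lookup by file
name (an assumption about Hive internals, not confirmed here). The class
ClasspathRootLookup and the temporary directory layout are made up for
illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClasspathRootLookup {

    /** Creates a temporary classpath root whose only file is lib/hive-site.xml. */
    static Path createRootWithNestedConfig() {
        try {
            Path root = Files.createTempDirectory("cp-root");
            Path lib = Files.createDirectories(root.resolve("lib"));
            Files.write(lib.resolve("hive-site.xml"),
                    "<configuration/>".getBytes(StandardCharsets.UTF_8));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a resource by name, with the given directory as the only classpath entry. */
    static URL lookup(Path classpathRoot, String resourceName) {
        try {
            URL[] roots = {classpathRoot.toUri().toURL()};
            // Parent is null, so the application classpath cannot interfere with the result.
            try (URLClassLoader loader = new URLClassLoader(roots, null)) {
                return loader.getResource(resourceName);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createRootWithNestedConfig();
        System.out.println("hive-site.xml     -> " + lookup(root, "hive-site.xml"));
        System.out.println("lib/hive-site.xml -> " + lookup(root, "lib/hive-site.xml"));
    }
}
```

The first lookup returns null and the second returns a file: URL, because a
resource name is resolved relative to each classpath root and subdirectories
are not searched.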
{code:java}
String name = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir = "/opt/hive-conf";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);
// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:
HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class);
// or directly
HiveConf hiveConf = new HiveConf(); {code}
This hiveConf *DOES NOT* load hive-site.xml from the classpath, so
configuration loading fails.
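The mechanism can be reproduced with a small model that does not depend on
Hive. CachingConfModel and RetryingConfModel are hypothetical stand-ins for
HiveConf and MetastoreConf; they only mimic the null checks quoted above and
are not the real classes. SEARCH is a made-up classpath search that always
succeeds:

```java
import java.util.function.Supplier;

/** Stand-in for HiveConf: the location is looked up once and cached statically. */
final class CachingConfModel {
    // Hypothetical classpath search that always finds the file.
    static final Supplier<String> SEARCH = () -> "classpath:/hive-site.xml";

    // Evaluated once, at class initialization.
    private static String siteLocation = SEARCH.get();

    static void setSiteLocation(String location) {
        siteLocation = location;
    }

    final boolean siteLoaded;

    CachingConfModel() {
        // Mirrors "if (hiveSiteURL != null) addResource(hiveSiteURL)": no retry when null.
        siteLoaded = siteLocation != null;
    }
}

/** Stand-in for MetastoreConf: a null location triggers a new search. */
final class RetryingConfModel {
    private static String siteLocation;

    static void setSiteLocation(String location) {
        siteLocation = location;
    }

    static boolean newConfLoadsSite() {
        if (siteLocation == null) {
            siteLocation = CachingConfModel.SEARCH.get();
        }
        return siteLocation != null;
    }
}
```

In this model, once setSiteLocation(null) has been called, every later
CachingConfModel instance has siteLoaded == false, while
RetryingConfModel.newConfLoadsSite() still returns true.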
Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
  super(props, hadoopConf);
  HiveConf hiveConf = new HiveConf();
  // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
  hiveConf.addResource(hadoopConf);
  setHadoopConf(hiveConf);
  validateParameters();
} {code}
A temporary workaround for this issue is to search for the file again :)
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
HiveConf hiveConf = new HiveConf();{code}
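A more general form of "set it back" is to remember the previous value,
override it for a limited scope, and restore it in a finally block. The
helper below is only a sketch of that pattern; ScopedOverride and
withTemporaryValue are hypothetical names, and this is not necessarily what
the linked pull request does. With Hive on the classpath, the getter and
setter would presumably be HiveConf::getHiveSiteLocation and
HiveConf::setHiveSiteLocation; here an AtomicReference stands in for the
static field so the example runs on the JDK alone:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ScopedOverride {

    /** Sets a temporary value, runs the body, then restores the previous value. */
    static <T, R> R withTemporaryValue(
            Supplier<T> getter, Consumer<T> setter, T temporary, Supplier<R> body) {
        T previous = getter.get();
        setter.accept(temporary);
        try {
            return body.get();
        } finally {
            // Restored even if the body throws.
            setter.accept(previous);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the static hive-site location.
        AtomicReference<String> location = new AtomicReference<>("hive-site.xml");
        String seenInside = withTemporaryValue(
                location::get, location::set, null, () -> String.valueOf(location.get()));
        System.out.println("inside: " + seenInside + ", after: " + location.get());
    }
}
```

Because the real field is static and therefore JVM-wide, other threads that
create a HiveConf while the override is active would still see the temporary
value.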
> HiveCatalog should set HiveConf.hiveSiteLocation back
> -----------------------------------------------------
>
> Key: FLINK-36594
> URL: https://issues.apache.org/jira/browse/FLINK-36594
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.20.1
> Reporter: slankka
> Priority: Minor
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)