zabetak commented on code in PR #6197:
URL: https://github.com/apache/hive/pull/6197#discussion_r2665303565
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java:
##########
@@ -137,7 +138,7 @@ public void setJobProperties(Map<String, String>
jobProperties) {
this.jobProperties = jobProperties;
}
- @Explain(displayName = "jobProperties", explainLevels = { Level.EXTENDED })
+ @Explain(displayName = ExplainTask.JOB_PROPERTIES, explainLevels = {
Level.EXTENDED })
Review Comment:
Using a variable creates dependencies (import) to classes that were not
present before. Another point that is a bit worrisome is that we rely on the
displayName (that should be relevant only for an end-user) to perform some
actions in the code. I don't know if this pattern is already present in the
code but overall seems shaky and we should better avoid it if possible.
Anyways, as mentioned elsewhere this change should land separately so we can
continue the discussion on the respective PR.
##########
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcStorageHandler.java:
##########
@@ -101,12 +103,12 @@ public void configureInputJobProperties(TableDesc
tableDesc, Map<String, String>
@Override
public URI getURIForAuth(Table table) throws URISyntaxException {
Map<String, String> tableProperties =
HiveCustomStorageHandlerUtils.getTableProperties(table);
- DatabaseType dbType = DatabaseType.valueOf(
+ DatabaseType dbType = DatabaseType.from(
tableProperties.get(JdbcStorageConfig.DATABASE_TYPE.getPropertyName()));
- String host_url = DatabaseType.METASTORE == dbType ?
+ String hostUrl = DatabaseType.METASTORE == dbType ?
"jdbc:metastore://" : tableProperties.get(Constants.JDBC_URL);
- String table_name = tableProperties.get(Constants.JDBC_TABLE);
- return new URI(host_url+"/"+table_name);
+ String tableName =
unescapeHiveJdbcIdentifier(tableProperties.get(Constants.JDBC_TABLE));
Review Comment:
Using lower case may not always be an option since as far as I recall there
are DBMS that store unquoted identifers using different conventions (e.g.,
upper case).
##########
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/GenericJdbcDatabaseAccessor.java:
##########
@@ -537,12 +539,27 @@ public boolean needColumnQuote() {
return true;
}
+ /**
+ * Quotes an identifier based on the database type.
+ *
+ * @param identifier The identifier to quote
+ * @param dbType The DatabaseType enum
+ * @return A database-specific quoted identifier (e.g., "\"Country\"" or
"`Country`")
+ */
+ protected static String quoteIdentifier(String identifier, DatabaseType
dbType) {
+ if (identifier == null || dbType == null || dbType.getStartQuote() ==
null) {
+ return identifier;
+ }
+ return dbType.getStartQuote() + identifier + dbType.getEndQuote();
+ }
+
private static String getQualifiedTableName(Configuration conf) {
- String tableName = conf.get(Constants.JDBC_TABLE);
+ DatabaseType dbType =
DatabaseType.from(conf.get(JdbcStorageConfig.DATABASE_TYPE.getPropertyName()));
+ String tableName =
quoteIdentifier(unescapeHiveJdbcIdentifier(conf.get(Constants.JDBC_TABLE)),
dbType);
if (tableName == null) {
return null;
}
- String schemaName = conf.get(Constants.JDBC_SCHEMA);
+ String schemaName =
quoteIdentifier(unescapeHiveJdbcIdentifier(conf.get(Constants.JDBC_SCHEMA)),
dbType);
Review Comment:
Conditional escaping is a good idea. I have the impression though that we
may also need to make quoting conditional. For the conditional we have various
options:
* Based on the content/value of the `hive.sql.table` property as proposed
above
* Based on a new global config property (e.g.,
`hive.jdbc.identifiers.casesensitive`)
* Based on table level property (e.g., `hive.sql.identifiers.casesensitive`)
Not sure whats the best option to move forward. Maybe we can take some
inspiration by checking what other engines (Presto/Trino/Spark/etc) are doing.
##########
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/DBRecordWritable.java:
##########
@@ -64,8 +64,13 @@ public void write(PreparedStatement statement) throws
SQLException {
ParameterMetaData parameterMetaData = statement.getParameterMetaData();
for (int i = 0; i < columnValues.length; i++) {
Object value = columnValues[i];
- if ((parameterMetaData.getParameterType(i + 1) == Types.CHAR) && value
!= null && value instanceof Boolean) {
- value = ((Boolean) value).booleanValue() ? "1" : "0";
+ try {
+ if (value instanceof Boolean b &&
(parameterMetaData.getParameterType(i + 1) == Types.CHAR)) {
+ value = b ? "1" : "0";
+ }
+ } catch (SQLException e) {
Review Comment:
It may be that the exception that you encountered was not fatal but if we
catch every kind of exception and suppress it we cannot really know what is
fatal or not.
##########
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcStorageHandler.java:
##########
@@ -101,12 +103,12 @@ public void configureInputJobProperties(TableDesc
tableDesc, Map<String, String>
@Override
public URI getURIForAuth(Table table) throws URISyntaxException {
Map<String, String> tableProperties =
HiveCustomStorageHandlerUtils.getTableProperties(table);
- DatabaseType dbType = DatabaseType.valueOf(
+ DatabaseType dbType = DatabaseType.from(
tableProperties.get(JdbcStorageConfig.DATABASE_TYPE.getPropertyName()));
- String host_url = DatabaseType.METASTORE == dbType ?
+ String hostUrl = DatabaseType.METASTORE == dbType ?
"jdbc:metastore://" : tableProperties.get(Constants.JDBC_URL);
- String table_name = tableProperties.get(Constants.JDBC_TABLE);
- return new URI(host_url+"/"+table_name);
+ String tableName =
unescapeHiveJdbcIdentifier(tableProperties.get(Constants.JDBC_TABLE));
Review Comment:
Apart from ambiguity quoted identifiers may contain arbitrary characters
inside the quotes that are not URI friendly (e.g., ?%+-/&$). We may have to
come up with a more robust normalization strategy but this depends on how
`getURIForAuth` is used and if we care about supporting such use-cases.
##########
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:
##########
@@ -5094,4 +5094,32 @@ public static String getTableOrMVSuffix(Context context,
boolean createTableOrMV
}
return suffix;
}
+
+ /**
+ * Unescape a Hive JDBC identifier, removing surrounding double quotes if
present.
+ * @param identifier the identifier to unescape
+ * @return the unescaped identifier
+ */
+ public static String unescapeHiveJdbcIdentifier(String identifier) {
+ return unescapeIdentifier(identifier, '"');
Review Comment:
At this stage, I am trying to understand the impact on existing use-cases.
Does `"hive.sql.table"="[WorldData]"` work without the changes in this PR? Was
there any kind of quoting that was working even without the changes in this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]