Copilot commented on code in PR #59060:
URL: https://github.com/apache/doris/pull/59060#discussion_r2623620941
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/source/HiveScanNode.java:
##########
@@ -439,6 +442,16 @@ protected void setScanParams(TFileRangeDesc rangeDesc,
Split split) {
tableFormatFileDesc.setTableLevelRowCount(-1);
rangeDesc.setTableFormatParams(tableFormatFileDesc);
}
+ try {
+ TFileFormatType formatType =
HMSExternalTable.getTFileFormatType(
+ hiveSplit.getInputFormat(), hiveSplit.getSerde(),
sessionVariable,
+ hmsTable.firstColumnIsString(), hmsTable.getName());
+ if (formatType != null) {
+ rangeDesc.setFormatType(formatType);
+ }
+ } catch (UserException e) {
+ LOG.warn("Failed to get file format type for split: {}",
split, e);
Review Comment:
The error message should include more context about the implications of this
failure. Consider adding information about whether the query will fall back to
table-level format or fail, e.g., 'Failed to get file format type for split:
{}, falling back to table-level format' or including the inputFormat/serde that
caused the failure.
```suggestion
LOG.warn("Failed to get file format type for split: {}
(inputFormat: {}, serde: {}), falling back to table-level format", split,
hiveSplit.getInputFormat(), hiveSplit.getSerde(), e);
```
##########
regression-test/data/external_table_p0/hive/test_mixed_format_partitions.out:
##########
@@ -0,0 +1,8 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !mixed_format
+1 orc_row 2023-01-01
+2 parquet_row 2023-01-02
+
+-- !mixed_format
Review Comment:
The query tag name 'mixed_format' should be 'mixed_format_partitions' to
match the test name and be more descriptive, or the test should use the tag
'mixed_format' consistently.
```suggestion
-- !mixed_format_partitions
1 orc_row 2023-01-01
2 parquet_row 2023-01-02
-- !mixed_format_partitions
```
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSExternalTable.java:
##########
@@ -1206,27 +1206,32 @@ public List<SysTable> getSupportedSysTables() {
}
public TFileFormatType getFileFormatType(SessionVariable sessionVariable)
throws UserException {
- TFileFormatType type = null;
Table table = getRemoteTable();
String inputFormatName = table.getSd().getInputFormat();
+ String serDeLib = table.getSd().getSerdeInfo().getSerializationLib();
+ return getTFileFormatType(inputFormatName, serDeLib, sessionVariable,
firstColumnIsString(), getName());
+ }
+
Review Comment:
This new public static method should have a Javadoc comment explaining its
purpose, parameters, return value, and when it throws UserException. This is
especially important since it's extracted for reuse and represents a key piece
of format resolution logic.
```suggestion
/**
* Resolves the file format type for a Hive table based on its input
format, SerDe library, session variables,
* and whether the first column is a string.
*
* @param inputFormatName the name of the input format of the Hive table
* @param serDeLib the SerDe library used by the Hive table
* @param sessionVariable the session variables that may affect format
resolution
* @param firstColumnIsString true if the first column of the table is
of string type, false otherwise
* @param tableName the name of the Hive table (used for error messages)
* @return the resolved {@link TFileFormatType} for the table
* @throws UserException if the SerDe is unsupported or if session
variables and table schema are incompatible
*/
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]