Copilot commented on code in PR #59060:
URL: https://github.com/apache/doris/pull/59060#discussion_r2623620941


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/source/HiveScanNode.java:
##########
@@ -439,6 +442,16 @@ protected void setScanParams(TFileRangeDesc rangeDesc, 
Split split) {
                 tableFormatFileDesc.setTableLevelRowCount(-1);
                 rangeDesc.setTableFormatParams(tableFormatFileDesc);
             }
+            try {
+                TFileFormatType formatType = 
HMSExternalTable.getTFileFormatType(
+                        hiveSplit.getInputFormat(), hiveSplit.getSerde(), 
sessionVariable,
+                        hmsTable.firstColumnIsString(), hmsTable.getName());
+                if (formatType != null) {
+                    rangeDesc.setFormatType(formatType);
+                }
+            } catch (UserException e) {
+                LOG.warn("Failed to get file format type for split: {}", 
split, e);

Review Comment:
   The error message should include more context about the implications of this 
failure. Consider adding information about whether the query will fall back to 
table-level format or fail, e.g., 'Failed to get file format type for split: 
{}, falling back to table-level format' or including the inputFormat/serde that 
caused the failure.
   ```suggestion
                   LOG.warn("Failed to get file format type for split: {} 
(inputFormat: {}, serde: {}), falling back to table-level format", split, 
hiveSplit.getInputFormat(), hiveSplit.getSerde(), e);
   ```



##########
regression-test/data/external_table_p0/hive/test_mixed_format_partitions.out:
##########
@@ -0,0 +1,8 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !mixed_format
+1      orc_row 2023-01-01
+2      parquet_row     2023-01-02
+
+-- !mixed_format

Review Comment:
   The query tag name 'mixed_format' should be 'mixed_format_partitions' to 
match the test name and be more descriptive, or the test should use the tag 
'mixed_format' consistently.
   ```suggestion
   -- !mixed_format_partitions
   1    orc_row 2023-01-01
   2    parquet_row     2023-01-02
   
   -- !mixed_format_partitions
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSExternalTable.java:
##########
@@ -1206,27 +1206,32 @@ public List<SysTable> getSupportedSysTables() {
     }
 
     public TFileFormatType getFileFormatType(SessionVariable sessionVariable) 
throws UserException {
-        TFileFormatType type = null;
         Table table = getRemoteTable();
         String inputFormatName = table.getSd().getInputFormat();
+        String serDeLib = table.getSd().getSerdeInfo().getSerializationLib();
+        return getTFileFormatType(inputFormatName, serDeLib, sessionVariable, 
firstColumnIsString(), getName());
+    }
+

Review Comment:
   This new public static method should have a Javadoc comment explaining its 
purpose, parameters, return value, and when it throws UserException. This is 
especially important since it's extracted for reuse and represents a key piece 
of format resolution logic.
   ```suggestion
   
       /**
        * Resolves the file format type for a Hive table based on its input 
format, SerDe library, session variables,
        * and whether the first column is a string.
        *
        * @param inputFormatName the name of the input format of the Hive table
        * @param serDeLib the SerDe library used by the Hive table
        * @param sessionVariable the session variables that may affect format 
resolution
        * @param firstColumnIsString true if the first column of the table is 
of string type, false otherwise
        * @param tableName the name of the Hive table (used for error messages)
        * @return the resolved {@link TFileFormatType} for the table
        * @throws UserException if the SerDe is unsupported or if session 
variables and table schema are incompatible
        */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to