github-actions[bot] commented on code in PR #64071:
URL: https://github.com/apache/doris/pull/64071#discussion_r3374434556
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -195,9 +196,16 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
// If user does not specify the file field names, generate it by using base schema of table.
// So that the following process can be unified
boolean specifyFileFieldNames = copiedColumnExprs.stream().anyMatch(p -> p.isColumn());
- if (!specifyFileFieldNames) {
+ if (!specifyFileFieldNames || isFillMissingColumns(fileGroup)) {
Review Comment:
For routine load, this condition is still false even when the job
property is set: `KafkaRoutineLoadJob.toNereidsRoutineLoadTaskInfo()` copies
`jobProperties` into `NereidsRoutineLoadTaskInfo`, but
`NereidsDataDescription(NereidsLoadTaskInfo)` only copies JSON props like
`strip_outer_array`, `jsonpaths`, `json_root`, `fuzzy_parse`,
`read_json_by_line`, and `num_as_string` into `analysisMap`. It never copies
`fill_missing_columns`, so `analyzeFileFormatProperties()` builds a
`JsonFileFormatProperties` with the default `false`, and this new branch is not
taken for the routine-load execution path. Please add the property to the
Nereids task-info/data-description propagation path and cover it with a test
that reaches `NereidsLoadScanProvider`.
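The gap can be illustrated with a minimal, self-contained sketch. It assumes the JSON options travel as a plain `Map<String, String>` from the job properties into the analysis map. The class `JsonPropPropagationSketch`, the `copyJsonProps` helper and the `COPIED_JSON_KEYS` list are hypothetical and do not reflect the actual `NereidsDataDescription` code. The point is only that any key missing from the copy list silently falls back to its default downstream, so adding `fill_missing_columns` to that list is what lets the flag survive.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JsonPropPropagationSketch {
    // Hypothetical whitelist of JSON-related keys copied from the job
    // properties into the analysis map. The first six are the ones the
    // review says are copied today; "fill_missing_columns" is the missing one.
    static final List<String> COPIED_JSON_KEYS = List.of(
            "strip_outer_array",
            "jsonpaths",
            "json_root",
            "fuzzy_parse",
            "read_json_by_line",
            "num_as_string",
            "fill_missing_columns");

    // Copies only whitelisted keys. Any other key is dropped, so the
    // downstream format properties would see their default value.
    static Map<String, String> copyJsonProps(Map<String, String> jobProperties) {
        Map<String, String> analysisMap = new HashMap<>();
        for (String key : COPIED_JSON_KEYS) {
            String value = jobProperties.get(key);
            if (value != null) {
                analysisMap.put(key, value);
            }
        }
        return analysisMap;
    }

    // Models a boolean format property that defaults to false when absent.
    static boolean isFillMissingColumns(Map<String, String> analysisMap) {
        return Boolean.parseBoolean(analysisMap.getOrDefault("fill_missing_columns", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> jobProperties = new HashMap<>();
        jobProperties.put("strip_outer_array", "true");
        jobProperties.put("fill_missing_columns", "true");
        Map<String, String> analysisMap = copyJsonProps(jobProperties);
        System.out.println(isFillMissingColumns(analysisMap));
    }
}
```

If `"fill_missing_columns"` is removed from `COPIED_JSON_KEYS`, the same input prints `false`. That is the routine-load behavior described above. A test that drives the real propagation path end to end would catch this regression, and a unit test of the map alone would not.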
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -420,15 +428,37 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
}
/**
- * if not set sequence column and column size is null or only have deleted sign ,return true
+ * Returns true when the sequence column should be auto-added, i.e.,
+ * if not set sequence column and column size is null or only have deleted sign,
+ * or fill_missing_columns is enabled, meaning schema will be auto-filled.
*/
- private boolean shouldAddSequenceColumn(List<NereidsImportColumnDesc> columnDescList) {
+ private boolean shouldAddSequenceColumn(List<NereidsImportColumnDesc> columnDescList,
+ NereidsBrokerFileGroup fileGroup) {
+ if (isFillMissingColumns(fileGroup)) {
+ return true;
+ }
if (columnDescList.isEmpty()) {
return true;
}
return columnDescList.size() == 1 &&
columnDescList.get(0).getColumnName().equalsIgnoreCase(Column.DELETE_SIGN);
}
+ /**
+ * Returns true if the file format is JSON and fill_missing_columns is enabled. Only meaningful for JSON.
+ */
+ private boolean isFillMissingColumns(NereidsBrokerFileGroup fileGroup) {
+ return fileGroup.getFileFormatProperties() instanceof JsonFileFormatProperties
+ && ((JsonFileFormatProperties) fileGroup.getFileFormatProperties()).isFillMissingColumns();
+ }
+
+ /**
+ * Returns true if the file format is JSON and fill_missing_columns is enabled. Only meaningful for JSON.
+ */
+ private boolean isFillMissingColumns(NereidsBrokerFileGroup fileGroup) {
Review Comment:
This duplicates the `isFillMissingColumns(NereidsBrokerFileGroup)` method
declared just above at lines 449-452, so `NereidsLoadScanProvider` will not
compile (`method isFillMissingColumns(...) is already defined`). Please remove
one of the duplicate declarations before this can pass FE compilation.
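Below is a minimal sketch of the deduplicated shape. It keeps a single `isFillMissingColumns` declaration and reuses it from `shouldAddSequenceColumn`. `FileFormatProperties`, `JsonFileFormatProperties`, `CsvFileFormatProperties`, `FileGroup` and the `DELETE_SIGN` value are simplified stand-ins assumed for illustration, not the real Doris classes. Column descriptors are reduced to plain column names. The helper uses a Java 16+ pattern-matching `instanceof` in place of the explicit cast in the diff. The decision logic matches the diff.

```java
import java.util.List;

public class FillMissingColumnsSketch {
    // Simplified stand-ins for the Doris types (assumptions for illustration).
    interface FileFormatProperties {}

    static final class JsonFileFormatProperties implements FileFormatProperties {
        private final boolean fillMissingColumns;

        JsonFileFormatProperties(boolean fillMissingColumns) {
            this.fillMissingColumns = fillMissingColumns;
        }

        boolean isFillMissingColumns() {
            return fillMissingColumns;
        }
    }

    static final class CsvFileFormatProperties implements FileFormatProperties {}

    record FileGroup(FileFormatProperties fileFormatProperties) {}

    // Assumed value of the delete-sign column name.
    static final String DELETE_SIGN = "__DORIS_DELETE_SIGN__";

    // The single helper: true only for JSON with fill_missing_columns enabled.
    static boolean isFillMissingColumns(FileGroup fileGroup) {
        return fileGroup.fileFormatProperties() instanceof JsonFileFormatProperties json
                && json.isFillMissingColumns();
    }

    // Same decision order as the diff: fill_missing_columns first, then
    // "no columns listed" or "only the delete sign listed".
    static boolean shouldAddSequenceColumn(List<String> columnNames, FileGroup fileGroup) {
        if (isFillMissingColumns(fileGroup)) {
            return true;
        }
        if (columnNames.isEmpty()) {
            return true;
        }
        return columnNames.size() == 1 && columnNames.get(0).equalsIgnoreCase(DELETE_SIGN);
    }

    public static void main(String[] args) {
        FileGroup json = new FileGroup(new JsonFileFormatProperties(true));
        FileGroup csv = new FileGroup(new CsvFileFormatProperties());
        System.out.println(shouldAddSequenceColumn(List.of("k1", "v1"), json)); // true
        System.out.println(shouldAddSequenceColumn(List.of("k1", "v1"), csv));  // false
    }
}
```

With a single declaration, the first hunk's condition and `shouldAddSequenceColumn` both call the same method. The class then compiles, and the JSON-only check lives in one place.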
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]