Zouxxyy commented on code in PR #8955:
URL: https://github.com/apache/hudi/pull/8955#discussion_r1227968015
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieColumnProjectionUtils.java:
##########
@@ -112,22 +112,21 @@ public static List<Pair<String,String>> getIOColumnNameAndTypes(Configuration co
/**
* If schema contains timestamp columns, this method is used for compatibility when there is no timestamp fields.
*
- * <p>We expect 3 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
+ * <p>We expect 2 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
*
* <ol>
* <li>Read columns contain timestamp type;</li>
* <li>Empty original columns;</li>
- * <li>Empty read columns but existing original columns contain timestamp type.</li>
* </ol>
*/
public static boolean supportTimestamp(Configuration conf) {
List<String> readCols = Arrays.asList(getReadColumnNames(conf));
if (readCols.isEmpty()) {
- return getIOColumnTypes(conf).contains("timestamp");
+ return false;
}
List<String> names = getIOColumns(conf);
List<String> types = getIOColumnTypes(conf);
return types.isEmpty() || IntStream.range(0, names.size()).filter(i -> readCols.contains(names.get(i)))
Review Comment:
@xicm Can you explain when `types.isEmpty()` will be true?
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieColumnProjectionUtils.java:
##########
@@ -112,22 +112,21 @@ public static List<Pair<String,String>> getIOColumnNameAndTypes(Configuration co
/**
* If schema contains timestamp columns, this method is used for compatibility when there is no timestamp fields.
*
- * <p>We expect 3 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
+ * <p>We expect 2 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
*
* <ol>
* <li>Read columns contain timestamp type;</li>
* <li>Empty original columns;</li>
- * <li>Empty read columns but existing original columns contain timestamp type.</li>
* </ol>
*/
public static boolean supportTimestamp(Configuration conf) {
List<String> readCols = Arrays.asList(getReadColumnNames(conf));
if (readCols.isEmpty()) {
- return getIOColumnTypes(conf).contains("timestamp");
+ return false;
}
List<String> names = getIOColumns(conf);
List<String> types = getIOColumnTypes(conf);
return types.isEmpty() || IntStream.range(0, names.size()).filter(i -> readCols.contains(names.get(i)))
- .anyMatch(i -> types.get(i).equals("timestamp"));
+ .anyMatch(i -> types.get(i).contains("timestamp"));
Review Comment:
Here I use a simple check (`contains("timestamp")`) so that timestamps nested inside complex types are detected as well. For fields such as:
```text
ts1 array<timestamp>,
ts2 map<string, timestamp>,
ts3 struct<province:timestamp, city:string>
```
the type strings are:
```text
array<timestamp>
map<string,timestamp>
struct<province:timestamp,city:string>
```
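To make the difference between the old `equals("timestamp")` check and the new `contains("timestamp")` check concrete, here is a small standalone Java sketch (Java 9+ for `List.of`) that runs both checks over the type strings listed in this comment. The class and method names are hypothetical and not part of Hudi.

```java
import java.util.List;

// Hypothetical demo class; not part of Hudi.
public class TimestampTypeMatchDemo {

  // Old check from the diff: only a top-level "timestamp" type matches.
  public static boolean matchesByEquals(String hiveType) {
    return hiveType.equals("timestamp");
  }

  // New check from the diff: any type string mentioning "timestamp" matches,
  // including timestamps nested in array, map and struct types.
  public static boolean matchesByContains(String hiveType) {
    return hiveType.contains("timestamp");
  }

  public static void main(String[] args) {
    List<String> types = List.of(
        "timestamp",
        "array<timestamp>",
        "map<string,timestamp>",
        "struct<province:timestamp,city:string>",
        // Hypothetical struct whose field NAME contains "timestamp" but has no timestamp type.
        "struct<timestamp_id:string>");
    for (String type : types) {
      System.out.printf("%-40s equals=%-5b contains=%b%n",
          type, matchesByEquals(type), matchesByContains(type));
    }
  }
}
```

Only the plain `timestamp` type passes the `equals` check, while every entry passes the `contains` check. The last entry illustrates the trade-off of substring matching: because Hive struct type strings embed field names, a field whose name contains "timestamp" is also matched even though no timestamp type is present.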
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieColumnProjectionUtils.java:
##########
@@ -112,22 +112,21 @@ public static List<Pair<String,String>> getIOColumnNameAndTypes(Configuration co
/**
* If schema contains timestamp columns, this method is used for compatibility when there is no timestamp fields.
*
- * <p>We expect 3 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
+ * <p>We expect 2 cases to use parquet-avro reader {@link org.apache.hudi.hadoop.avro.HoodieAvroParquetReader} to read timestamp column:
*
* <ol>
* <li>Read columns contain timestamp type;</li>
* <li>Empty original columns;</li>
- * <li>Empty read columns but existing original columns contain timestamp type.</li>
* </ol>
*/
public static boolean supportTimestamp(Configuration conf) {
List<String> readCols = Arrays.asList(getReadColumnNames(conf));
if (readCols.isEmpty()) {
- return getIOColumnTypes(conf).contains("timestamp");
Review Comment:
@xicm I think it should return false directly here. What do you think?
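For reference, a minimal sketch of how `supportTimestamp` would behave with this proposal (return `false` for empty read columns) combined with the `contains("timestamp")` check from the diff. It assumes the read column names, column names and column types are passed in as plain lists rather than read from the Hadoop `Configuration`; the class name and signature are hypothetical.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in for HoodieColumnProjectionUtils.supportTimestamp(conf):
// the three lists replace the values the real method reads from the Configuration.
public final class SupportTimestampSketch {

  public static boolean supportTimestamp(List<String> readCols, List<String> names, List<String> types) {
    if (readCols.isEmpty()) {
      // Proposed in this review: no projected columns, so no timestamp column is read.
      return false;
    }
    // No column types available (appears to be the "Empty original columns" case),
    // or at least one projected column has a possibly nested timestamp type.
    return types.isEmpty()
        || IntStream.range(0, names.size())
            .filter(i -> readCols.contains(names.get(i)))
            .anyMatch(i -> types.get(i).contains("timestamp"));
  }

  public static void main(String[] args) {
    List<String> names = List.of("id", "ts1", "name");
    List<String> types = List.of("int", "array<timestamp>", "string");
    System.out.println(supportTimestamp(List.of(), names, types));             // false
    System.out.println(supportTimestamp(List.of("id", "name"), names, types)); // false
    System.out.println(supportTimestamp(List.of("ts1"), names, types));        // true
    System.out.println(supportTimestamp(List.of("id"), names, List.of()));     // true
  }
}
```

With this change, an empty projection never routes the read through the parquet-avro reader, regardless of whether the full schema contains timestamp columns.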