yihua commented on a change in pull request #3893:
URL: https://github.com/apache/hudi/pull/3893#discussion_r795967769
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
##########
@@ -218,10 +216,10 @@ public boolean isImplicitWithStorage() {
/**
* Tag the <rowKey, filename> back to the original HoodieRecord List.
*/
- protected HoodieData<HoodieRecord<T>> tagLocationBacktoRecords(
+ protected <R> HoodieData<HoodieRecord<R>> tagLocationBacktoRecords(
HoodiePairData<HoodieKey, HoodieRecordLocation> keyFilenamePair,
- HoodieData<HoodieRecord<T>> records) {
- HoodiePairData<HoodieKey, HoodieRecord<T>> keyRecordPairs =
+ HoodieData<HoodieRecord<R>> records) {
+ HoodiePairData<HoodieKey, HoodieRecord<?>> keyRecordPairs =
Review comment:
should this be `HoodiePairData<HoodieKey, HoodieRecord<R>>`?
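To illustrate the point, here is a minimal standalone sketch using hypothetical stand-in classes (`Key`, `Record`, `GenericsSketch`; not the real Hudi types): keeping the concrete type parameter `R` in the pair lets the method return `HoodieRecord<R>` without an unchecked cast, which a `HoodieRecord<?>` wildcard would force.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-ins for HoodieKey / HoodieRecord (not the real Hudi classes),
// to illustrate why the pair should keep the concrete type parameter R.
class Key {
    final String id;
    Key(String id) { this.id = id; }
}

class Record<R> {
    final Key key;
    final R payload;
    Record(Key key, R payload) { this.key = key; this.payload = payload; }
}

public class GenericsSketch {
    // Declaring the value type as Record<R> (rather than the wildcard Record<?>)
    // lets the method hand back the same R it received, with no unchecked cast.
    static <R> Record<R> tagBack(Map.Entry<Key, Record<R>> pair) {
        return pair.getValue();
    }

    public static void main(String[] args) {
        Record<String> r = new Record<>(new Key("k1"), "payload");
        Record<String> tagged = tagBack(new SimpleEntry<>(new Key("k1"), r));
        System.out.println(tagged.payload); // prints "payload"
    }
}
```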
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
##########
@@ -85,7 +86,7 @@ public void write(GenericRecord oldRecord) {
}
// This is a new insert
- HoodieRecord<T> hoodieRecord = new HoodieRecord<>(keyToNewRecords.get(keyToPreWrite));
+ HoodieRecord<T> hoodieRecord = new HoodieAvroRecord<>(keyToNewRecords.get(keyToPreWrite));
Review comment:
Similar here for using `.newInstance()`?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -390,7 +391,7 @@ public boolean canWrite(HoodieRecord record) {
@Override
public void write(HoodieRecord record, Option<IndexedRecord> insertValue) {
- Option<Map<String, String>> recordMetadata = record.getData().getMetadata();
+ Option<Map<String, String>> recordMetadata = ((HoodieAvroRecord) record).getData().getMetadata();
Review comment:
should be `((HoodieRecordPayload) record.getData()).getMetadata();`?
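A minimal standalone sketch of the distinction, with hypothetical stand-ins (`Payload`, `SimplePayload`, `Rec`, `CastSketch`; not the real Hudi classes): casting the data to the payload interface keeps the call site working for any record subclass, while casting the record itself ties it to one concrete class.

```java
// Hypothetical stand-ins (not the real Hudi classes) contrasting the two casts:
// casting the payload to its interface keeps the call site engine-agnostic,
// while casting the record pins it to one concrete record class.
interface Payload {
    String getMetadata();
}

class SimplePayload implements Payload {
    public String getMetadata() { return "meta"; }
}

class Rec {
    private final Object data;
    Rec(Object data) { this.data = data; }
    Object getData() { return data; }
}

public class CastSketch {
    static String readMetadata(Rec record) {
        // Cast the data, not the record: ((Payload) record.getData()).getMetadata()
        return ((Payload) record.getData()).getMetadata();
    }

    public static void main(String[] args) {
        System.out.println(readMetadata(new Rec(new SimplePayload()))); // prints "meta"
    }
}
```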
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -324,7 +325,7 @@ public void write(GenericRecord oldRecord) {
if (keyToNewRecords.containsKey(key)) {
// If we have duplicate records that we are updating, then the hoodie record will be deflated after
// writing the first record. So make a copy of the record to be merged
- HoodieRecord<T> hoodieRecord = new HoodieRecord<>(keyToNewRecords.get(key));
+ HoodieRecord<T> hoodieRecord = new HoodieAvroRecord<T>(keyToNewRecords.get(key));
Review comment:
Can this be replaced by `keyToNewRecords.get(key).newInstance()`?
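A minimal standalone sketch of the `newInstance()` pattern being suggested, using hypothetical stand-in classes (`BaseRecord`, `AvroLikeRecord`, `NewInstanceSketch`; not the real Hudi types): the abstract record exposes a copy factory, so merge handles never have to name an engine-specific subclass directly.

```java
// Hypothetical sketch of the newInstance() copy-factory pattern: each subclass
// knows how to copy itself, so callers stay independent of the concrete class.
abstract class BaseRecord<T> {
    final T data;
    BaseRecord(T data) { this.data = data; }
    abstract BaseRecord<T> newInstance(); // subclass returns a copy of itself
}

class AvroLikeRecord<T> extends BaseRecord<T> {
    AvroLikeRecord(T data) { super(data); }
    @Override
    AvroLikeRecord<T> newInstance() { return new AvroLikeRecord<>(data); }
}

public class NewInstanceSketch {
    // A merge handle can copy a record without knowing its concrete class:
    static <T> BaseRecord<T> copyForMerge(BaseRecord<T> record) {
        return record.newInstance();
    }

    public static void main(String[] args) {
        BaseRecord<String> original = new AvroLikeRecord<>("value");
        BaseRecord<String> copy = copyForMerge(original);
        System.out.println(copy != original && copy.data.equals("value")); // prints "true"
    }
}
```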
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/TestHoodieIndexConfigs.java
##########
@@ -132,47 +115,6 @@ public void testCreateIndexWithException() {
assertTrue(thrown2.getMessage().contains("Unable to instantiate class"));
}
- public static class DummyHoodieIndex<T extends HoodieRecordPayload<T>> extends SparkHoodieIndex<T> {
Review comment:
Maybe rewrite this for new APIs?
##########
File path:
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/FlinkHoodieIndex.java
##########
@@ -48,21 +47,22 @@ protected FlinkHoodieIndex(HoodieWriteConfig config) {
@PublicAPIMethod(maturity = ApiMaturityLevel.DEPRECATED)
public abstract List<WriteStatus> updateLocation(List<WriteStatus> writeStatuses,
HoodieEngineContext context,
- HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> hoodieTable) throws HoodieIndexException;
+ HoodieTable hoodieTable) throws HoodieIndexException;
Review comment:
Similar for engine-specific HoodieIndex classes to remove deprecated API methods altogether (in a separate PR).
##########
File path:
hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/testutils/HoodieFlinkWriteableTestTable.java
##########
@@ -130,7 +131,7 @@ public HoodieFlinkWriteableTestTable withInserts(String partition, String fileId
header.put(HeaderMetadataType.SCHEMA, schema.toString());
logWriter.appendBlock(new HoodieAvroDataBlock(groupedRecords.stream().map(r -> {
try {
- GenericRecord val = (GenericRecord) r.getData().getInsertValue(schema).get();
+ GenericRecord val = (GenericRecord) ((HoodieAvroRecord) r).getData().getInsertValue(schema).get();
Review comment:
Use `HoodieRecordPayload` here?
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+public class HoodieAvroRecord<T extends HoodieRecordPayload> extends HoodieRecord<T> {
Review comment:
As discussed, this is more of an intermediate solution for row writer, before RFC-46 revamps it completely.
##########
File path:
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java
##########
@@ -175,7 +177,7 @@ public HoodieWriteableTestTable withInserts(String partition, String fileId, Lis
header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, schema.toString());
logWriter.appendBlock(new HoodieAvroDataBlock(groupedRecords.stream().map(r -> {
try {
- GenericRecord val = (GenericRecord) r.getData().getInsertValue(schema).get();
+ GenericRecord val = (GenericRecord) ((HoodieAvroRecord) r).getData().getInsertValue(schema).get();
Review comment:
Similar here.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
##########
@@ -124,29 +123,29 @@ public HoodieGlobalSimpleIndex(HoodieWriteConfig config, Option<BaseKeyGenerator
return incomingRecords.leftOuterJoin(existingRecordByRecordKey).values()
.flatMap(entry -> {
- HoodieRecord<T> inputRecord = entry.getLeft();
+ HoodieRecord<R> inputRecord = entry.getLeft();
Option<Pair<String, HoodieRecordLocation>> partitionPathLocationPair = Option.ofNullable(entry.getRight().orElse(null));
- List<HoodieRecord<T>> taggedRecords;
+ List<HoodieRecord<R>> taggedRecords;
if (partitionPathLocationPair.isPresent()) {
String partitionPath = partitionPathLocationPair.get().getKey();
HoodieRecordLocation location = partitionPathLocationPair.get().getRight();
if (config.getGlobalSimpleIndexUpdatePartitionPath() && !(inputRecord.getPartitionPath().equals(partitionPath))) {
// Create an empty record to delete the record in the old partition
- HoodieRecord<T> deleteRecord = new HoodieRecord(new HoodieKey(inputRecord.getRecordKey(), partitionPath), new EmptyHoodieRecordPayload());
+ HoodieRecord<R> deleteRecord = new HoodieAvroRecord(new HoodieKey(inputRecord.getRecordKey(), partitionPath), new EmptyHoodieRecordPayload());
Review comment:
Similar here for revisiting later on.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapRecordConsumer.java
##########
@@ -39,7 +40,8 @@ public BootstrapRecordConsumer(HoodieBootstrapHandle bootstrapHandle) {
@Override
protected void consumeOneRecord(HoodieRecord record) {
try {
- bootstrapHandle.write(record, record.getData().getInsertValue(bootstrapHandle.getWriterSchemaWithMetaFields()));
+ bootstrapHandle.write(record, ((HoodieAvroRecord) record).getData()
Review comment:
use `((HoodieRecordPayload) record.getData())` as well?
##########
File path:
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java
##########
@@ -141,7 +143,7 @@ public HoodieWriteableTestTable withInserts(String partition, String fileId, Lis
config, schema, contextSupplier)) {
int seqId = 1;
for (HoodieRecord record : records) {
- GenericRecord avroRecord = (GenericRecord) record.getData().getInsertValue(schema).get();
+ GenericRecord avroRecord = (GenericRecord) ((HoodieAvroRecord) record).getData().getInsertValue(schema).get();
Review comment:
use `((HoodieRecordPayload) record.getData())`?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
##########
@@ -109,29 +110,29 @@ public HoodieGlobalBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelp
// Here as the records might have more data than rowKeys (some rowKeys' fileId is null), so we do left outer join.
return incomingRowKeyRecordPairs.leftOuterJoin(existingRecordKeyToRecordLocationHoodieKeyMap).values().flatMap(record -> {
- final HoodieRecord<T> hoodieRecord = record.getLeft();
+ final HoodieRecord<R> hoodieRecord = record.getLeft();
final Option<Pair<HoodieRecordLocation, HoodieKey>> recordLocationHoodieKeyPair = record.getRight();
if (recordLocationHoodieKeyPair.isPresent()) {
// Record key matched to file
if (config.getBloomIndexUpdatePartitionPath()
&& !recordLocationHoodieKeyPair.get().getRight().getPartitionPath().equals(hoodieRecord.getPartitionPath())) {
// Create an empty record to delete the record in the old partition
- HoodieRecord<T> deleteRecord = new HoodieRecord(recordLocationHoodieKeyPair.get().getRight(),
+ HoodieRecord<R> deleteRecord = new HoodieAvroRecord(recordLocationHoodieKeyPair.get().getRight(),
Review comment:
This needs to be revisited to make it Avro agnostic later on.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java
##########
@@ -60,7 +57,7 @@ protected HoodieIndex(HoodieWriteConfig config) {
@Deprecated
@PublicAPIMethod(maturity = ApiMaturityLevel.DEPRECATED)
public I tagLocation(I records, HoodieEngineContext context,
- HoodieTable<T, I, K, O> hoodieTable) throws HoodieIndexException {
+ HoodieTable hoodieTable) throws HoodieIndexException {
Review comment:
@xushiyan @alexeykudinkin Since this is anyway backward incompatible, should we just remove the deprecated public API methods and get rid of `I` and `O` as well? The reason to keep these methods and the generics was to accommodate users extending these APIs. If we want to change the generics, I'd prefer that all such generics changes to public APIs go into the 0.11.0 release in one shot.
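As a rough illustration of the direction, here is a minimal standalone sketch with hypothetical stand-ins (`HData` for `HoodieData`, a simplified `Index` interface; none of these are the real Hudi types): a single engine-agnostic `tagLocation` signature over the abstract data type, which would make the per-engine `I`/`K`/`O` type parameters unnecessary.

```java
import java.util.List;

// Hypothetical stand-in for HoodieData (not the real Hudi type).
class HData<T> {
    final List<T> items;
    HData(List<T> items) { this.items = items; }
}

// Simplified, hypothetical index interface: one signature for every engine,
// parameterized only by the record payload type R.
interface Index {
    <R> HData<R> tagLocation(HData<R> records);
}

public class IndexSketch {
    public static void main(String[] args) {
        // A trivial index that tags nothing, just to show the shape compiles
        // identically for any engine's record type R:
        Index identity = new Index() {
            public <R> HData<R> tagLocation(HData<R> records) { return records; }
        };
        HData<String> in = new HData<>(List.of("a", "b"));
        System.out.println(identity.tagLocation(in).items.size()); // prints "2"
    }
}
```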
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]