alexeykudinkin commented on a change in pull request #4067:
URL: https://github.com/apache/hudi/pull/4067#discussion_r770878288
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
##########
@@ -98,8 +98,14 @@ public HoodieCreateHandle(HoodieWriteConfig config, String instantTime, HoodieTa
new Path(config.getBasePath()),
FSUtils.getPartitionPath(config.getBasePath(), partitionPath));
partitionMetadata.trySave(getPartitionId());
createMarkerFile(partitionPath,
FSUtils.makeDataFileName(this.instantTime, this.writeToken, this.fileId,
hoodieTable.getBaseFileExtension()));
+
+ Option<String> recordKeyField = Option.ofNullable(hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp());
+ Option<Schema.Field> recordKeySchemaFieldID = Option.empty();
Review comment:
We can do `opt.map(keyField -> ...)` instead of the conditional.
P.S. On a nit side, I'd suggest generally avoiding overwriting vars as much
as possible and treating them as vals -- immutability is a very powerful thing
in allowing you to see clearly what the flow is and (more importantly) what it's
NOT, making the control-flow much easier to understand. I see it quite often in
the Hudi codebase that vars are overwritten when they essentially didn't need to
be, and that makes it harder to evaluate all possible permutations of the
control-flow.
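To illustrate the `opt.map(...)` suggestion, here is a minimal, hypothetical sketch using `java.util.Optional` as a stand-in for Hudi's `Option` (the `lookupFieldId` helper and its placeholder logic are invented for illustration, not part of the PR):

```java
import java.util.Optional;

public class OptionMapSketch {
  // Hypothetical stand-in for the schema-field lookup done in HoodieCreateHandle:
  // returns a "field position" for the key field (placeholder logic only).
  static Integer lookupFieldId(String keyField) {
    return keyField.length();
  }

  public static void main(String[] args) {
    // Instead of declaring a mutable var initialized to empty and overwriting
    // it inside a conditional...
    Optional<String> recordKeyField = Optional.ofNullable("_row_key");

    // ...derive the dependent value in a single immutable step with map():
    final Optional<Integer> recordKeyFieldId =
        recordKeyField.map(OptionMapSketch::lookupFieldId);

    System.out.println(recordKeyFieldId.orElse(-1));
  }
}
```

The `final` binding makes it explicit that `recordKeyFieldId` is assigned exactly once, which is the immutability point of the comment above.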
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -1050,6 +1051,13 @@ public boolean populateMetaFields() {
HoodieTableConfig.POPULATE_META_FIELDS.defaultValue()));
}
+ public boolean shouldMetadataExcludeKeyFromPayload() {
+ if (getBasePath() == null || !HoodieTableMetadata.isMetadataTable(getBasePath())) {
Review comment:
Should we assert instead that this is only invoked on the Metadata table?
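A hedged sketch of the fail-fast alternative being suggested (all names mirror the PR discussion, but `isMetadataTable` is simplified to a path-suffix check purely for illustration; it is not the real `HoodieTableMetadata` implementation):

```java
public class MetadataGuardSketch {
  // Illustrative stand-in for the metadata table's folder suffix.
  static final String METADATA_TABLE_SUFFIX = "/.hoodie/metadata";

  // Simplified stand-in for HoodieTableMetadata.isMetadataTable(basePath).
  static boolean isMetadataTable(String basePath) {
    return basePath != null && basePath.endsWith(METADATA_TABLE_SUFFIX);
  }

  static boolean shouldMetadataExcludeKeyFromPayload(String basePath) {
    // Reviewer's suggestion: calling this on a non-metadata table is a
    // programming error, so assert instead of silently returning false.
    if (!isMetadataTable(basePath)) {
      throw new IllegalStateException(
          "shouldMetadataExcludeKeyFromPayload must only be invoked on the metadata table, got: " + basePath);
    }
    return true; // the real method would consult the write config here
  }
}
```

Throwing makes the invalid call site visible immediately, instead of letting a silent `false` mask a table-type mix-up downstream.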
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -182,9 +185,14 @@ private void init(String fileId, String partitionPath, HoodieBaseFile baseFileTo
// Create Marker file
createMarkerFile(partitionPath, newFileName);
+ Option<Schema.Field> recordKeySchemaFieldID = Option.empty();
Review comment:
Same comment as above
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileKeyExcludedReader.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/**
+ * HFile reader for the Metadata table log files. Metadata table records in the
+ * HFile data blocks have the redundant key field trimmed from the record
+ * payload. So, when the log reader is reading records, materialization of such
+ * trimmed records must be done before handing the records to the callers. This
+ * class takes care of Metadata table record materialization, if needed.
+ *
+ * @param <R> Metadata table record type.
+ */
+public class HoodieHFileKeyExcludedReader<R extends IndexedRecord> extends HoodieHFileReader<R> {
Review comment:
Why not make this a feature of `HoodieHFileReader`/`Writer` itself?
Just pass a flag indicating whether we would like to keep keys w/in the record
payload or strip them away.
I think we should keep the whole flow of stripping/materializing the keys
from/into records under one roof: most of the changes are happening w/in
`HoodieHFileReader`/`Writer` to handle stripping/materializing anyway, so
keeping all this logic consolidated in one place seems like a reasonable
approach to me.
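A minimal sketch of the consolidation being suggested: the base reader takes a flag and re-materializes the trimmed key itself, instead of a separate subclass. The record type, method name, and `KEY_FIELD` constant below are simplified stand-ins, not the actual `HoodieHFileReader` API:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyMaterializingReaderSketch {
  // Stand-in for the metadata record's key field name.
  static final String KEY_FIELD = "key";

  private final boolean materializeKeyIntoPayload;

  // Single reader class; the flag replaces the subclass.
  public KeyMaterializingReaderSketch(boolean materializeKeyIntoPayload) {
    this.materializeKeyIntoPayload = materializeKeyIntoPayload;
  }

  // Records are plain maps here; the HFile key is carried alongside the payload.
  public Map<String, String> read(String hfileKey, Map<String, String> payload) {
    if (materializeKeyIntoPayload && !payload.containsKey(KEY_FIELD)) {
      Map<String, String> full = new HashMap<>(payload);
      full.put(KEY_FIELD, hfileKey); // restore the key field trimmed at write time
      return full;
    }
    return payload;
  }
}
```

With the mirror-image flag on the writer side (strip the key before writing), both halves of the trim/materialize flow would live in the reader/writer pair rather than being split across subclasses.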
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlockFactory.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table.log.block;
+
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.metadata.HoodieMetadataPayload;
+
+import java.util.List;
+import java.util.Map;
+
+public class HoodieLogBlockFactory {
+
+ /**
+ * Util method to get a data block for the requested type.
+ *
+ * @param logDataBlockFormat - Data block type
+ * @param recordList - List of records that goes in the data block
+ * @param header - data block header
+ * @param tableConfig - Table config
+ * @param excludeKeyFromRecord - Exclude key from record
+ * @param populateMetaFields - Whether to populate meta fields in the record
+ * @return Data block of the requested type.
+ */
+ public static HoodieLogBlock getBlock(HoodieLogBlock.HoodieLogBlockType logDataBlockFormat,
+ List<IndexedRecord> recordList,
+ Map<HoodieLogBlock.HeaderMetadataType, String> header,
+ HoodieTableConfig tableConfig,
+ boolean populateMetaFields, boolean excludeKeyFromRecord) {
+ String keyField;
+ if (populateMetaFields) {
+ keyField = (excludeKeyFromRecord
+ ? HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY : HoodieRecord.RECORD_KEY_METADATA_FIELD);
Review comment:
The current conditional is extremely confusing: why are we falling back to
`HoodieRecord.RECORD_KEY_METADATA_FIELD` if `excludeKeyFromRecord` is false? My
current understanding is that we're trying to differentiate the Metadata vs
Data table handling here, and the fact that we're mixing these two flows in a
single method makes it really confusing (which also ties back to my previous
comment on the `shouldExclude` method, that we should assert instead of
returning false).
Let's either make the conditional more pronounced in differentiating the
Metadata table from the Data table or, otherwise, make the differences
"invisible" (i.e. just pass different key fields based on the table type).
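A sketch of the "pass different key fields based on the table type" alternative. The `TableType` enum and the two key-field constants are illustrative stand-ins (the real values live in `HoodieRecord` and `HoodieMetadataPayload`); the point is that the caller resolves the key field once, so `getBlock` never needs the confusing double-flag conditional:

```java
public class KeyFieldResolverSketch {
  // Stand-ins for HoodieRecord.RECORD_KEY_METADATA_FIELD and the metadata
  // payload's schema key field; actual values come from the Hudi classes.
  static final String RECORD_KEY_METADATA_FIELD = "_hoodie_record_key";
  static final String METADATA_SCHEMA_KEY_FIELD = "key";

  enum TableType { DATA, METADATA }

  // Resolved once at the call site; getBlock would just receive the result.
  static String resolveKeyField(TableType type) {
    // The metadata table trims the key from the payload, so its blocks are
    // keyed by the schema's own key field; the data table keeps the meta column.
    return type == TableType.METADATA ? METADATA_SCHEMA_KEY_FIELD : RECORD_KEY_METADATA_FIELD;
  }
}
```

With this shape, `getBlock(...)` takes a single `keyField` parameter and stays oblivious to which table it is serving, which is the "invisible differences" option from the comment above.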
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]