nsivabalan commented on code in PR #10330:
URL: https://github.com/apache/hudi/pull/10330#discussion_r1465901347
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -29,6 +29,13 @@
groupName = ConfigGroups.Names.READER,
description = "Configurations that control file group reading.")
public class HoodieReaderConfig extends HoodieConfig {
+ public static final ConfigProperty<Boolean> USE_BUILT_IN_HFILE_READER =
ConfigProperty
+ .key("hoodie.hfile.use.built.in.reader")
Review Comment:
Is this meant to be deprecated in the near future and not really expected to
be used by end users? If so, should we consider prefixing the key with "_"?
##########
hudi-common/src/main/java/org/apache/hudi/common/bloom/HoodieDynamicBoundedBloomFilter.java:
##########
@@ -64,14 +66,17 @@ public class HoodieDynamicBoundedBloomFilter implements
BloomFilter {
public HoodieDynamicBoundedBloomFilter(String serString) {
// ignoring the type code for now, since we have just one version
byte[] bytes = Base64CodecUtil.decode(serString);
- DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));
- try {
- internalDynamicBloomFilter = new InternalDynamicBloomFilter();
- internalDynamicBloomFilter.readFields(dis);
- dis.close();
- } catch (IOException e) {
- throw new HoodieIndexException("Could not deserialize BloomFilter
instance", e);
- }
+ extractAndSetInternalBloomFilter(new DataInputStream(new
ByteArrayInputStream(bytes)));
Review Comment:
Is it possible to use a try-with-resources design here?
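The shape the reviewer is asking for could look like the sketch below. This is a hypothetical stand-in, not the actual Hudi class: `deserialize` approximates the deserialization path in `HoodieDynamicBoundedBloomFilter`, with a byte copy standing in for `internalDynamicBloomFilter.readFields(dis)`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BloomFilterDeserDemo {
    // Try-with-resources closes the stream automatically, even when reading
    // throws, so the explicit dis.close() call can be dropped.
    static String deserialize(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buf = new byte[bytes.length];
            dis.readFully(buf); // stand-in for internalDynamicBloomFilter.readFields(dis)
            return new String(buf, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException("Could not deserialize BloomFilter instance", e);
        }
    }
}
```

Besides being shorter, this closes the stream on the exception path too, which the original code misses when `readFields` throws before `dis.close()` is reached.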
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java:
##########
@@ -211,9 +233,10 @@ protected <T> ClosableIterator<HoodieRecord<T>>
lookupRecords(List<String> sorte
blockContentLoc.getContentPositionInLogFile(),
blockContentLoc.getBlockSize());
- try (final HoodieAvroHFileReader reader =
- new HoodieAvroHFileReader(inlineConf, inlinePath, new
CacheConfig(inlineConf), inlinePath.getFileSystem(inlineConf),
- Option.of(getSchemaFromHeader()))) {
+ try (final BaseHoodieAvroHFileReader reader = (BaseHoodieAvroHFileReader)
Review Comment:
Can we try to think of better naming for HoodieAvroFileReaderBase and
BaseHoodieAvroHFileReader? Do you think we can rename BaseHoodieAvroHFileReader
to HoodieAvroHFileReaderImplBase or HoodieAvroHFileReaderBaseImpl?
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java:
##########
@@ -728,42 +464,100 @@ public IndexedRecord next() {
@Override
public void close() {
try {
- scanner.close();
reader.close();
} catch (IOException e) {
- throw new HoodieIOException("Error closing the hfile reader and
scanner", e);
+ throw new HoodieIOException("Error closing the HFile reader and
scanner", e);
}
}
- }
- static class SeekableByteArrayInputStream extends
ByteBufferBackedInputStream implements Seekable, PositionedReadable {
- public SeekableByteArrayInputStream(byte[] buf) {
- super(buf);
- }
+ private static Iterator<IndexedRecord>
getRecordByKeyPrefixIteratorInternal(HFileReader reader,
Review Comment:
I see a lot of code duplication across HoodieAvroHBaseHFileReader and
HoodieAvroHFileReader, e.g. RecordByKeyPrefixIterator and RecordByKeyIterator.
Can we try to fix that and re-use the code?
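One way to de-duplicate near-identical inner iterator classes is to hoist the iteration skeleton into a shared static helper parameterized by a decode function, so each reader only supplies how a raw entry becomes a record. The sketch below uses hypothetical names and plain strings in place of HFile cells; it only illustrates the pattern, not the Hudi API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class SharedIteratorDemo {
    // Shared iteration skeleton: both readers could call this with their own
    // decode function instead of each defining a near-identical inner class.
    static <R> Iterator<R> recordsByKeys(List<String> sortedKeys, Function<String, R> decode) {
        Iterator<String> keys = sortedKeys.iterator();
        return new Iterator<R>() {
            @Override public boolean hasNext() { return keys.hasNext(); }
            @Override public R next() { return decode.apply(keys.next()); }
        };
    }
}
```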
##########
hudi-common/src/main/java/org/apache/hudi/common/bloom/SimpleBloomFilter.java:
##########
@@ -138,4 +144,12 @@ public BloomFilterTypeCode getBloomFilterTypeCode() {
return BloomFilterTypeCode.SIMPLE;
}
+ private void extractAndSetInternalBloomFilter(DataInputStream dis) {
+ try {
+ this.filter.readFields(dis);
+ dis.close();
Review Comment:
same comment as above
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -182,7 +183,7 @@ public static List<Pair<String, Long>>
filterKeysFromFile(Path filePath, List<St
checkArgument(FSUtils.isBaseFile(filePath));
List<Pair<String, Long>> foundRecordKeys = new ArrayList<>();
try (HoodieFileReader fileReader =
HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
- .getFileReader(configuration, filePath)) {
+ .getFileReader(new HoodieConfig(), configuration, filePath)) {
Review Comment:
If it is always an empty one, should we declare a singleton instance and
re-use it wherever required? You could name it DEFAULT_HUDI_CONFIG_FOR_READER.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java:
##########
@@ -83,19 +89,29 @@ public HoodieHFileDataBlock(FSDataInputStream inputStream,
Map<HeaderMetadataType, String> header,
Map<HeaderMetadataType, String> footer,
boolean enablePointLookups,
- Path pathForReader) {
- super(content, inputStream, readBlockLazily,
Option.of(logBlockContentLocation), readerSchema, header, footer,
HoodieAvroHFileReader.KEY_FIELD_NAME, enablePointLookups);
+ Path pathForReader,
+ boolean useBuiltInHFileReader) {
+ super(content, inputStream, readBlockLazily,
Option.of(logBlockContentLocation), readerSchema,
+ header, footer, BaseHoodieAvroHFileReader.KEY_FIELD_NAME,
enablePointLookups);
this.compressionAlgorithm = Option.empty();
this.pathForReader = pathForReader;
+ this.hFileReaderConfig = new HoodieConfig();
+ this.hFileReaderConfig.setValue(
Review Comment:
Should we create a private method for the common instantiation code and
re-use it across all constructors?
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHBaseHFileReader.java:
##########
@@ -0,0 +1,669 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.BloomFilterFactory;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieAvroIndexedRecord;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordLocation;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.VisibleForTesting;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.common.util.collection.CloseableMappingIterator;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.util.Lazy;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hadoop.hbase.io.hfile.HFile;
+import org.apache.hadoop.hbase.io.hfile.HFileInfo;
+import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.nio.ByteBuff;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes;
+import static org.apache.hudi.common.util.TypeUtils.unsafeCast;
+
+/**
+ * NOTE: PLEASE READ DOCS & COMMENTS CAREFULLY BEFORE MAKING CHANGES
+ * <p>
+ * {@link HoodieFileReader} implementation allowing to read from {@link HFile}.
+ */
+public class HoodieAvroHBaseHFileReader extends BaseHoodieAvroHFileReader {
Review Comment:
I assume there are no changes to this class, just code being moved around.
Can you confirm that?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java:
##########
@@ -342,9 +343,10 @@ private MessageType readSchemaFromHFileBaseFile(Path
hFilePath) throws IOExcepti
LOG.info("Reading schema from " + hFilePath);
FileSystem fs = metaClient.getRawFs();
- CacheConfig cacheConfig = new CacheConfig(fs.getConf());
- try (HoodieAvroHFileReader hFileReader = new
HoodieAvroHFileReader(fs.getConf(), hFilePath, cacheConfig)) {
- return convertAvroSchemaToParquet(hFileReader.getSchema());
+ try (HoodieFileReader fileReader =
+
HoodieFileReaderFactory.getReaderFactory(HoodieRecord.HoodieRecordType.AVRO)
+ .getFileReader(new HoodieConfig(), fs.getConf(), hFilePath)) {
Review Comment:
Let's use a singleton instance for HoodieConfig here as well.