yihua commented on code in PR #12866:
URL: https://github.com/apache/hudi/pull/12866#discussion_r2105287755
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java:
##########
@@ -209,26 +198,26 @@ public ByteArrayOutputStream serializeRecordsToLogBlock(HoodieStorage storage,
sortedRecordsMap.put(recordKey, recordBytes);
}
- HFile.Writer writer = HFile.getWriterFactory(conf, cacheConfig)
- .withOutputStream(ostream).withFileContext(context).create();
-
- // Write the records
- sortedRecordsMap.forEach((recordKey, recordBytes) -> {
- try {
- KeyValue kv = new KeyValue(recordKey.getBytes(), null, null, recordBytes);
- writer.append(kv);
- } catch (IOException e) {
- throw new HoodieIOException("IOException serializing records", e);
- }
- });
-
- writer.appendFileInfo(
- getUTF8Bytes(HoodieAvroHFileReaderImplBase.SCHEMA_KEY), getUTF8Bytes(readerSchema.toString()));
+ HFileContext context = HFileContext.builder()
+ .blockSize(DEFAULT_BLOCK_SIZE_FOR_LOG_FILE)
+ .compressionCodec(compressionCodec)
+ .build();
+ try(HFileWriter writer = new HFileWriterImpl(context, ostream)) {
+ sortedRecordsMap.forEach((recordKey,recordBytes)->
+ {
+ try {
+ writer.append(recordKey, recordBytes);
+ } catch (IOException e) {
+ throw new HoodieIOException("IOException serializing records", e);
+ }
+ });
+ writer.appendFileInfo(
+ HoodieAvroHFileReaderImplBase.SCHEMA_KEY,
+ getUTF8Bytes(readerSchema.toString()));
Review Comment:
nit: fix indentation
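For instance, the lambda block could be reformatted as (same logic, standard lambda style):
```java
sortedRecordsMap.forEach((recordKey, recordBytes) -> {
  try {
    writer.append(recordKey, recordBytes);
  } catch (IOException e) {
    throw new HoodieIOException("IOException serializing records", e);
  }
});
```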
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java:
##########
@@ -209,26 +198,26 @@ public ByteArrayOutputStream serializeRecordsToLogBlock(HoodieStorage storage,
sortedRecordsMap.put(recordKey, recordBytes);
}
- HFile.Writer writer = HFile.getWriterFactory(conf, cacheConfig)
- .withOutputStream(ostream).withFileContext(context).create();
-
- // Write the records
- sortedRecordsMap.forEach((recordKey, recordBytes) -> {
- try {
- KeyValue kv = new KeyValue(recordKey.getBytes(), null, null, recordBytes);
- writer.append(kv);
- } catch (IOException e) {
- throw new HoodieIOException("IOException serializing records", e);
- }
- });
-
- writer.appendFileInfo(
- getUTF8Bytes(HoodieAvroHFileReaderImplBase.SCHEMA_KEY), getUTF8Bytes(readerSchema.toString()));
+ HFileContext context = HFileContext.builder()
+ .blockSize(DEFAULT_BLOCK_SIZE_FOR_LOG_FILE)
+ .compressionCodec(compressionCodec)
+ .build();
+ try(HFileWriter writer = new HFileWriterImpl(context, ostream)) {
Review Comment:
```suggestion
try (HFileWriter writer = new HFileWriterImpl(context, ostream)) {
```
##########
hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/hfile/HFileBootstrapIndexWriter.java:
##########
@@ -114,9 +108,7 @@ private void writeNextPartition(String partitionPath, String bootstrapPartitionP
m.getBootstrapFileStatus())).collect(Collectors.toMap(Pair::getKey, Pair::getValue)));
Option<byte[]> bytes = TimelineMetadataUtils.serializeAvroMetadata(bootstrapPartitionMetadata, HoodieBootstrapPartitionMetadata.class);
if (bytes.isPresent()) {
- indexByPartitionWriter
- .append(new KeyValue(getUTF8Bytes(getPartitionKey(partitionPath)), new byte[0], new byte[0],
- HConstants.LATEST_TIMESTAMP, KeyValue.Type.Put, bytes.get()));
+ indexByPartitionWriter.append(getPartitionKey(partitionPath), bytes.get());
Review Comment:
As long as the native HFile reader and writer work together seamlessly
without any correctness issues, we can land this PR. However, we need to
address HBase compatibility as a blocker for the Hudi 1.1 release, so that
the HBase-based HFile reader in Hudi 0.x can read HFiles written by Hudi 1.1
(for backwards compatibility).
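To make that blocker testable, a round-trip check along these lines could gate the release. This is a sketch only: the `org.apache.hudi.io.hfile` package path and builder defaults are assumptions based on this diff, and the HBase calls target the 2.4.x API rather than the exact 0.x reader code path.
```java
// Round-trip sketch: write with the Hudi-native writer from this diff, then
// read back with HBase's HFile reader (what the 0.x reader path wraps).
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
// Assumed package for the native writer classes added in this PR.
import org.apache.hudi.io.hfile.HFileContext;
import org.apache.hudi.io.hfile.HFileWriter;
import org.apache.hudi.io.hfile.HFileWriterImpl;

public class NativeWriterHBaseReaderCompatCheck {
  public static void main(String[] args) throws Exception {
    // 1. Write sorted key/value pairs with the native writer, mirroring
    //    serializeRecordsToLogBlock in this diff (default context for brevity).
    TreeMap<String, byte[]> records = new TreeMap<>();
    records.put("key1", "v1".getBytes(StandardCharsets.UTF_8));
    records.put("key2", "v2".getBytes(StandardCharsets.UTF_8));
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    HFileContext context = HFileContext.builder().build();
    try (HFileWriter writer = new HFileWriterImpl(context, ostream)) {
      for (Map.Entry<String, byte[]> e : records.entrySet()) {
        writer.append(e.getKey(), e.getValue());
      }
    }

    // 2. Persist the bytes and scan them back through the HBase HFile reader;
    //    every key written above should come back in order.
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/native-written.hfile");
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write(ostream.toByteArray());
    }
    try (HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf), true, conf)) {
      HFileScanner scanner = reader.getScanner(false, false); // HBase 2.4.x signature
      for (boolean more = scanner.seekTo(); more; more = scanner.next()) {
        String key = new String(CellUtil.cloneRow(scanner.getCell()), StandardCharsets.UTF_8);
        byte[] value = CellUtil.cloneValue(scanner.getCell());
        System.out.println(key + " -> " + value.length + " bytes");
      }
    }
  }
}
```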
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java:
##########
@@ -75,12 +72,12 @@ public class HFileUtils extends FileFormatUtils {
* @param paramsMap parameter map containing the compression codec config.
* @return the {@link Compression.Algorithm} Enum.
*/
- public static Compression.Algorithm getHFileCompressionAlgorithm(Map<String, String> paramsMap) {
- String algoName = paramsMap.get(HFILE_COMPRESSION_ALGORITHM_NAME.key());
- if (StringUtils.isNullOrEmpty(algoName)) {
- return Compression.Algorithm.GZ;
+ public static CompressionCodec getHFileCompressionAlgorithm(Map<String, String> paramsMap) {
Review Comment:
Actually, we can remove the `org.apache.hadoop.hbase.io.compress.Compression`
import by changing the docs of this method, and move the class to the
`hudi-common` module in this PR?
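Something like this sketch, for instance, with the javadoc pointing at Hudi's own enum. `CompressionCodec.GZIP` and `findCodecByName` are assumptions about the shape of the hudi-io `CompressionCodec` enum; the PR's actual lookup may differ.
```java
import java.util.Map;

import org.apache.hudi.common.util.StringUtils;
// Hudi-native codec enum; no HBase import needed.
import org.apache.hudi.io.compress.CompressionCodec;

import static org.apache.hudi.common.config.HoodieStorageConfig.HFILE_COMPRESSION_ALGORITHM_NAME;

/**
 * @param paramsMap parameter map containing the compression codec config.
 * @return the {@link CompressionCodec} enum.
 */
public static CompressionCodec getHFileCompressionAlgorithm(Map<String, String> paramsMap) {
  String algoName = paramsMap.get(HFILE_COMPRESSION_ALGORITHM_NAME.key());
  if (StringUtils.isNullOrEmpty(algoName)) {
    // GZIP mirrors the previous Compression.Algorithm.GZ default.
    return CompressionCodec.GZIP;
  }
  // Assumed lookup helper on the enum.
  return CompressionCodec.findCodecByName(algoName);
}
```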
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]