DavidZ1 commented on issue #8071:
URL: https://github.com/apache/hudi/issues/8071#issuecomment-1453167727

We switched the compression codec to LZ4, but found that our platform does not support it (the native LZ4 library is not available). The exception is as follows:
```
java.io.IOException: java.io.IOException: Exception happened when bulk insert.
        at org.apache.hudi.sink.bulk.BulkInsertWriterHelper.write(BulkInsertWriterHelper.java:117)
        at org.apache.hudi.sink.append.AppendWriteFunction.processElement(AppendWriteFunction.java:86)
        at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
        at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
        at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
        at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
        at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Exception happened when bulk insert.
        at org.apache.hudi.sink.bulk.BulkInsertWriterHelper.write(BulkInsertWriterHelper.java:115)
        ... 15 more
Caused by: java.lang.RuntimeException: native lz4 library not available
        at org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:146)
        at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:208)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:191)
        at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:296)
        at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:228)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.<init>(HoodieRowDataParquetWriter.java:45)
        at org.apache.hudi.io.storage.row.HoodieRowDataFileWriterFactory.newParquetInternalRowFileWriter(HoodieRowDataFileWriterFactory.java:79)
        at org.apache.hudi.io.storage.row.HoodieRowDataFileWriterFactory.getRowDataFileWriter(HoodieRowDataFileWriterFactory.java:55)
        at org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle.createNewFileWriter(HoodieRowDataCreateHandle.java:211)
        at org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle.<init>(HoodieRowDataCreateHandle.java:103)
        at org.apache.hudi.sink.bulk.BulkInsertWriterHelper.getRowCreateHandle(BulkInsertWriterHelper.java:134)
        at org.apache.hudi.sink.bulk.BulkInsertWriterHelper.write(BulkInsertWriterHelper.java:110)
        ... 15 more
```
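The workaround we applied by hand (LZ4 unavailable, so fall back to Snappy) can be sketched as picking the first codec from a preference list that the platform supports and putting it into the writer options. This is a minimal sketch under stated assumptions: `CodecFallback`, `chooseCodec`, `writerOptions` and the availability predicate are hypothetical, and the predicate is a stub (a real check would have to probe the environment, e.g. whether the native libraries are loaded). The key `hoodie.parquet.compression.codec` is, as far as we know, the Hudi storage option for the Parquet codec, but verify it against your Hudi version.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

public class CodecFallback {
    // Assumed Hudi storage option for the Parquet compression codec; verify for your version.
    static final String CODEC_OPTION_KEY = "hoodie.parquet.compression.codec";

    // Hypothetical helper: first codec, in preference order, that the platform reports as usable.
    static Optional<String> chooseCodec(List<String> preferred, Predicate<String> isAvailable) {
        for (String codec : preferred) {
            if (isAvailable.test(codec)) {
                return Optional.of(codec);
            }
        }
        return Optional.empty();
    }

    // Hypothetical helper: writer options with the chosen codec; fails fast if none is usable.
    static Map<String, String> writerOptions(List<String> preferred, Predicate<String> isAvailable) {
        String codec = chooseCodec(preferred, isAvailable)
                .orElseThrow(() -> new IllegalStateException("no usable compression codec in " + preferred));
        return Map.of(CODEC_OPTION_KEY, codec);
    }

    public static void main(String[] args) {
        // Stubbed availability: pretend the native LZ4 library is missing, as in the trace above.
        Predicate<String> available = codec -> !codec.equals("lz4");
        Map<String, String> opts = writerOptions(List.of("lz4", "snappy", "gzip"), available);
        System.out.println(opts); // {hoodie.parquet.compression.codec=snappy}
    }
}
```

Failing fast when nothing in the list is usable would surface the problem at job setup instead of on the first bulk-insert write, provided the availability check actually probes the runtime.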
   
   
We then switched to Snappy, and write performance did improve somewhat. However, because we use Tencent Cloud COS for storage, copy-on-write (COW) writes ran into COS frequency control (rate limiting) on list requests, so overall performance could not be improved much. The exception is as follows:
   
```
org.apache.hudi.exception.HoodieException: Timeout(601000ms) while waiting for instant initialize
        at org.apache.hudi.sink.utils.TimeWait.waitFor(TimeWait.java:57)
        at org.apache.hudi.sink.common.AbstractStreamWriteFunction.instantToWrite(AbstractStreamWriteFunction.java:276)
        at org.apache.hudi.sink.append.AppendWriteFunction.initWriterHelper(AppendWriteFunction.java:110)
        at org.apache.hudi.sink.append.AppendWriteFunction.processElement(AppendWriteFunction.java:84)
        at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
        at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
        at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
        at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
        at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:188)
        at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
        at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:36)
        at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:27)
        at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:128)
        at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:305)
        at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:69)
        at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
        at java.lang.Thread.run(Thread.java:750)
```
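The timeout above is raised from `TimeWait.waitFor` while the append write function waits for an instant to be initialized. A simplified sketch of such a bounded polling wait is shown below, assuming a fixed sleep interval and a caller-supplied readiness check. `InstantWait` and its parameters are hypothetical and this is not Hudi's actual `TimeWait` code; it only illustrates how a readiness check that keeps failing (in our case, presumably because of slow or throttled COS requests) ends in a message of the same shape.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class InstantWait {
    /**
     * Polls {@code ready} every {@code intervalMs} milliseconds until it returns true.
     * Returns the accumulated sleep time, or throws once that time reaches {@code timeoutMs}.
     * Hypothetical simplified analogue of a bounded wait; not Hudi's TimeWait.
     */
    static long waitFor(String action, BooleanSupplier ready, long timeoutMs, long intervalMs)
            throws TimeoutException, InterruptedException {
        long waitedMs = 0;
        while (!ready.getAsBoolean()) {
            if (waitedMs >= timeoutMs) {
                throw new TimeoutException("Timeout(" + waitedMs + "ms) while waiting for " + action);
            }
            Thread.sleep(intervalMs);
            waitedMs += intervalMs;
        }
        return waitedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // The readiness check never succeeds, standing in for an instant that is never initialized.
            waitFor("instant initialize", () -> false, 30, 10);
        } catch (TimeoutException e) {
            System.out.println(e.getMessage()); // Timeout(30ms) while waiting for instant initialize
        }
    }
}
```

Counting only the sleep intervals keeps the sketch deterministic; a real implementation might measure wall-clock time instead.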

