MrAladdin commented on issue #11178:
URL: https://github.com/apache/hudi/issues/11178#issuecomment-2111453760
@xushiyan Could you help answer the question from my reply above? Thank
you.
2. I have a question about the record index file group count. When writing
data with Spark Structured Streaming, the number of HFiles under
.hoodie/metadata/record_index is twice the value set by
.option("hoodie.metadata.record.index.min.filegroup.count", "720"). But when
writing batches with an offline Spark DataFrame, each commit generates a
corresponding number of HFiles, so the number of HFiles under record_index
grows excessively large. What causes this difference? How can the number of
HFiles under .hoodie/metadata/record_index be better controlled, and what is
a reasonable size for each HFile? Also, which specific parameters are
involved?
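For context, here is a minimal sketch of the streaming write path I am describing. Table name, paths, and key/precombine columns are placeholders, and the max-filegroup options are my best understanding of the related knobs in HoodieMetadataConfig, not a confirmed answer:

```scala
// Sketch: Structured Streaming write into Hudi with the record index enabled.
// All names, paths, and values below are illustrative placeholders.
df.writeStream
  .format("hudi")
  .option("hoodie.table.name", "my_table")                       // placeholder
  .option("hoodie.datasource.write.recordkey.field", "uuid")     // placeholder
  .option("hoodie.datasource.write.precombine.field", "ts")      // placeholder
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.metadata.record.index.enable", "true")
  // The lower bound on record_index file groups that my question refers to:
  .option("hoodie.metadata.record.index.min.filegroup.count", "720")
  // Related settings that I believe also bound file group count and size:
  .option("hoodie.metadata.record.index.max.filegroup.count", "10000")
  .option("hoodie.metadata.record.index.max.filegroup.size",
          String.valueOf(1024L * 1024 * 1024))                   // 1 GB, illustrative
  .option("checkpointLocation", "/path/to/checkpoints")          // placeholder
  .start("/path/to/hudi/table")                                  // placeholder
```

The offline batch path uses the same options via `df.write.format("hudi")...save(...)` instead of `writeStream`.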
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]