kywe665 commented on a change in pull request #4076:
URL: https://github.com/apache/hudi/pull/4076#discussion_r755695067
##########
File path: website/docs/file_sizing.md
##########
@@ -0,0 +1,53 @@
+---
+title: "Auto File Size Management"
+toc: true
+---
+
+This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to
+avoid creating small files in the first place and always write properly sized files.
+There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
+
+## Auto-Size During ingestion
+
+You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
+it ensures that read queries are always efficient as soon as a write is committed. If you don't
+manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
+
+(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
Review comment:
`write_operations` is a net-new doc that does not exist in v0.9.0, so the build
will break if I point to `/docs/write_operations`. Since there are so many
net-new docs, I suggest we leave these links as `/docs/next/` for now. After we
cut the 0.10.0 release this week, can we sweep through and remove the `/next/`?
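
For the later sweep, a rough sketch of the link rewrite might look like the following. It assumes the affected links all use the markdown form `](/docs/next/...)`; the helper name `strip_next_prefix` is hypothetical and not part of the Hudi repo or its website tooling.

```python
import re

# Matches the start of a markdown link target under /docs/next/ (assumed link form).
NEXT_LINK = re.compile(r"\]\(/docs/next/")


def strip_next_prefix(markdown: str) -> str:
    """Rewrite ](/docs/next/...) link targets to ](/docs/...), leaving other text alone."""
    return NEXT_LINK.sub("](/docs/", markdown)


if __name__ == "__main__":
    # Example line taken from the diff under review.
    line = "(Note: [bulk_insert](/docs/next/write_operations) write operation)"
    print(strip_next_prefix(line))
```

Anchoring the pattern on `](` keeps the rewrite limited to link targets, so a plain-text mention of `/docs/next/` elsewhere in a page would be left untouched.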