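Hi,

Right now, the archive folder is just that: an archive. It's there so that you can use the CLI to trace audit history if needed. Hudi operations all use the active timeline only (i.e. the files you see directly under .hoodie), and an archival process keeps that timeline bounded by moving older commit files into the archive folder. So the number of files Hudi has to look at stays roughly constant however long the table lives, and we can keep scaling with data growth.

Hope that makes sense.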
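To make the "bounded active timeline" point concrete, here is a toy model in Python. It is only a sketch of the idea, not Hudi's actual code: the function name, the deque/list representation and the integer commit ids are all made up for illustration. The 20/30 thresholds are meant to loosely mirror what I believe are the defaults for hoodie.keep.min.commits and hoodie.keep.max.commits, so please check the configuration docs for your version.

```python
from collections import deque


def archive_if_needed(active, archived, min_commits=20, max_commits=30):
    """Toy archival step (hypothetical helper, not a Hudi API).

    Once the active timeline grows past max_commits, move the oldest
    commits into the archive until only min_commits remain active.
    """
    if len(active) > max_commits:
        while len(active) > min_commits:
            archived.append(active.popleft())


# Simulate a long-lived table: one commit per iteration.
active, archived = deque(), []
for commit_id in range(1000):
    active.append(commit_id)
    archive_if_needed(active, archived)

# The active timeline never exceeds max_commits, no matter how many
# commits were made; everything older sits in the archive.
print(len(active), len(archived))
```

However many commits you run through the loop, the active side stays at or below the upper threshold, and only the archive keeps growing.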
On Thu, May 21, 2020 at 8:14 AM tanujdua <[email protected]> wrote:

> Hi,
> I am evaluating Hudi with frequent reads than write so COW suits us.
> What I understood is that we put it in archive folder but as we will move
> on (say after 10 years) we will have very large number of commit files so
> how HUDI will be able to handle that.
>
> PS: We will be using S3 storage for our purpose.
>
