Thanks, Dmitri, for the thoughtful review!

If we rely on Iceberg manifest files to transactionally track files, we would need additional APIs or engines to manage file operations. In contrast, the approach in the proposal lets users add, delete, or update files without any new API dependencies. For example, users can still use existing tools like the AWS S3 CLI to add files directly.

If the idea is to replace the Directory Table (backed by a standard Iceberg table, as proposed) with manifest files, that doesn’t seem advantageous. A standard Iceberg table is compatible with and accessible from far more tools than manifest files are, which makes it the more versatile and user-friendly choice.

Also, adopting manifest files for this purpose would require Iceberg spec changes. While there are some informal discussions about this in the community, it’s unclear how or when such changes might materialize.

Yufei

On Fri, Dec 6, 2024 at 9:03 AM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:

> Hi Yufei,
>
> Interesting proposal. I commented in the doc.
>
> WDYT about using Iceberg metadata to list the stage files (in manifests)?
>
> Thanks,
> Dmitri.
>
> On Thu, Dec 5, 2024 at 6:21 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Polaris has become a cornerstone for managing structured data across
> > diverse processing engines, ensuring high performance and reliability. To
> > further enhance its capabilities, we propose extending Polaris to support
> > unstructured data. This will enable it to handle a broader range of data
> > types efficiently, meeting the growing demands of AI/ML and other
> > data-intensive applications.
> >
> > You can find the proposal here: Proposal: Unstructured Data Support in
> > Polaris
> > <https://docs.google.com/document/d/1ofljkrtiXRWc-v6hfkg_laKlYltepTPX7zsg44Tb-BY/edit?usp=sharing>
> >
> > We welcome your feedback and insights. Please take a look and share your
> > thoughts. At the same time, we will make a POC pretty soon to test the
> > idea and gather more feedback.
> >
> > Yufei