Well, the question/use case has to do with the legacy of the Hive Metastore (HMS). Since HMS tracked only partitions and not individual files, use cases were built around that assumption. In this case, we have an important housekeeping flow which replaces existing files with files of the same name (after massaging the data inside these files).
This brings me to another fundamental question dealing with the legacy of HMS. While I understand that it is clever, and in some cases beneficial, to hide partition info, losing that hierarchical abstraction (partition -> files) is a huge change with wide-reaching impact for existing users of HMS, and arguably one that may not always bear fruit. I would have liked to see this hierarchical abstraction retained. It would have made migration much easier for existing HMS users, while retaining a useful abstraction.

Have other people on this forum experienced similar pain?

Arvind

From: Ryan Blue <rb...@netflix.com>
Reply-To: "rb...@netflix.com" <rb...@netflix.com>
Date: Tuesday, February 26, 2019 at 9:54 AM
To: Jacques Nadeau <jacq...@dremio.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>, Arvind Pruthi <apru...@linkedin.com>
Subject: Re: Question about replacing files and about Publishing Jars

You could always embed version information in the file location, like S3's @<version> syntax. That's just another way to make it unique.

Why is it necessary to overwrite the original file location though? That's why I don't think I understand the use case.

On Tue, Feb 26, 2019 at 9:50 AM Jacques Nadeau <jacq...@dremio.com> wrote:

We're using etag for better clarity on this at Dremio (for a different use case). I wonder if the same thing should be available in Iceberg.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Feb 26, 2019 at 9:48 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

Hi Arvind,

Iceberg assumes that all file locations are unique. If two snapshots refer to the same location, then whatever data file (or version) is in that location is what is read. What is your use case?

Apache Iceberg has no official releases yet. We still need to do some license work for binaries, get the build set up for Apache publication, finish a few more PRs, and rename packages. In the meantime, you can use JitPack to build binaries for specific commits. That should allow you to easily test the project if you don't want to build it yourself.

On Mon, Feb 25, 2019 at 6:27 PM Arvind Pruthi <apru...@linkedin.com> wrote:

Hello There,

Q1. What happens if a file is deleted and a new file is to be added with the same name, while the snapshot in which the delete was registered is still around? There is no ambiguity from the point of view of listing the manifest entries. However, there will be ambiguity at the HDFS level. How is that resolved? Also, any thoughts on the case where a file needs to be replaced with a different file of the same name (we have a use case for this)?

Q2. Are the Iceberg jars being published anywhere? I couldn't find them in Maven Central.

Thanks,
Arvind

--
Ryan Blue
Software Engineer
Netflix
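
As a rough illustration of the suggestion above to embed version information in the file location, here is a minimal Python sketch. It is not an Iceberg API: the function name `versioned_location`, the `-v<token>` naming scheme, and the example HDFS path are all assumptions made up for this example. The idea is only that a housekeeping job writes the rewritten data to a new, unique location instead of overwriting the old one, so that older snapshots still resolve to the original file.

```python
import posixpath
import uuid
from typing import Optional


def versioned_location(location: str, version: Optional[str] = None) -> str:
    """Return a new file location with a version token before the extension.

    Hypothetical helper for illustration only, not part of Iceberg.
    If no version is given, a random UUID is used so the result is unique.
    """
    token = version if version is not None else uuid.uuid4().hex
    directory, name = posixpath.split(location)
    stem, ext = posixpath.splitext(name)
    return posixpath.join(directory, f"{stem}-v{token}{ext}")


# Example path is an assumption, chosen to look like a Hive-style partition.
old = "hdfs://nn/warehouse/db/tbl/date=2019-02-25/part-00000.parquet"
new = versioned_location(old, version="2")
print(new)
```

The rewritten file would then be committed to the table as a replacement for the old one (removing the old location and adding the new one in the same snapshot); that commit step is not shown here.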