Hi,

Thank you all for the discussion today!

There are questions around whether this localization is *sufficient for most data localization requirements*.
- The proposed solution does not localize stats in the metadata, so some data in PII columns would be exposed there. It was suggested that there are ways to turn off these stats for specific columns.
- Whether centralized computation, as assumed in this proposal, is ever acceptable. If not, this proposal might not be of value for addressing data localization. (Internally, we believe it is sufficient for at least one use case we are working on.)

That brings us to *usefulness outside of data localization*.
- One suggestion sees this as a possible solution to data lifecycle management: the partition value can indicate the age of the data, so data can be written to different storage systems with different cost and latency profiles.
- Others noted that lifecycle policy management could be done from S3 itself, or with a complete rewrite.

There is helpful *feedback on implementations*.
- Whether we are leaking the semantic meaning of "country"/"locality" into the location provider.
- One suggestion is that the location provider could be customizable enough to leave this business logic completely to users, instead of constraining it to a simple string look-up as done in the proposed solution.

I'm happy to take more input from folks. One useful line of discussion would be: if this is going to be an out-of-the-box abstraction,
- how customizable do we want it to be
- what are the use cases, besides data localization and lifecycle management, where this custom data location based on partitioning would be helpful

I'm also happy to discuss how folks are solving specific problems of data localization under different regulatory requirements.

Best,
Mick Jermsurawong

On Mon, Sep 28, 2020 at 7:03 PM Mick Jermsurawong <mickjermsuraw...@stripe.com> wrote:

> Hi Iceberg community,
>
> We are solving data localization following legal requirements to store
> data in designated physical areas. We think that Iceberg can neatly solve
> this problem with the existing interfaces. Here's the current proposal
> <https://docs.google.com/document/d/1ZluOiRZlmsfNnQJLSqTiBQg7-XeSE-gvEOn2e0y6E54/edit#heading=h.cuifpdpzmfqz>
> explaining our motivation and approach.
>
> We would appreciate input on the following:
> - if there are already similar on-going features from the community
> - any non-iceberg approaches that have been considered to solve data
>   localization
> - how much would this feature in our private fork be compatible with
>   future directions
> - general feedback on the proposed solution
>
> Thank you in advance for any feedback here!
>
> Best,
> Mick Jermsurawong
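
P.S. To make the "customizable location provider" idea from the implementation feedback a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the names `PartitionAwareLocationProvider`, `LocationRule`, and `lookup_rule`, and the bucket URLs, are hypothetical and are not Iceberg classes or the code in the proposal. The idea is that the provider only knows about "a rule that maps partition values to a base location", so the meaning of "country" stays in user code, and the simple string look-up becomes just one possible rule.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical types for illustration only; these are not Iceberg classes.
PartitionData = Mapping[str, str]
LocationRule = Callable[[PartitionData], Optional[str]]


def lookup_rule(field: str, table: Mapping[str, str]) -> LocationRule:
    """Simple string look-up: map one partition field's value to a base location."""
    def rule(partition: PartitionData) -> Optional[str]:
        value = partition.get(field)
        # Return None when the field is missing or has no mapping.
        return table.get(value) if value is not None else None
    return rule


@dataclass
class PartitionAwareLocationProvider:
    """Picks a data file location from partition values via a user-supplied rule."""
    default_location: str
    rule: LocationRule

    def new_data_location(self, partition: PartitionData, filename: str) -> str:
        # Fall back to the default location when the rule has no opinion.
        base = self.rule(partition) or self.default_location
        partition_path = "/".join(f"{k}={v}" for k, v in partition.items())
        return f"{base.rstrip('/')}/{partition_path}/{filename}"


# Example: localization by country (bucket names are made up).
provider = PartitionAwareLocationProvider(
    default_location="s3://example-default/data",
    rule=lookup_rule("country", {"DE": "s3://example-eu/data"}),
)
print(provider.new_data_location({"country": "DE"}, "00000-0.parquet"))
print(provider.new_data_location({"country": "US"}, "00000-1.parquet"))
```

A lifecycle use case would plug in a different rule, for example one that inspects a date partition and returns a cheaper storage location for older data, without any change to the provider itself.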