The PR here takes a stab at a more general solution, a dynamically loaded implementation provided by the user: https://github.com/apache/iceberg/pull/1531
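
To make "dynamically loaded impl" concrete, here is a rough, language-neutral sketch of the idea in Python. It is not the code in the PR and not the Iceberg API: the class names, the `new_data_location` method, the `region-root.*` properties, and the `user_plugins` module are all hypothetical, and the property key used to name the implementation is only illustrative. The table names a provider class in its properties, the loader instantiates it by name, and all of the "country" logic lives in the user's class rather than in the library.

```python
import importlib
import sys
import types


class DefaultLocationProvider:
    """Fallback provider: every data file goes under the table location."""

    def __init__(self, table_location, properties):
        self.table_location = table_location.rstrip("/")

    def new_data_location(self, partition, filename):
        return f"{self.table_location}/data/{filename}"


def load_location_provider(table_location, properties):
    """Instantiate the class named in the table properties, else the default."""
    # Illustrative property key; treat the exact name as an assumption.
    impl = properties.get("write.location-provider.impl")
    if impl is None:
        return DefaultLocationProvider(table_location, properties)
    module_name, _, class_name = impl.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(table_location, properties)


class RegionLocationProvider:
    """Hypothetical user-supplied provider: routes files by partition value."""

    def __init__(self, table_location, properties):
        self.default_root = table_location.rstrip("/")
        # Hypothetical "region-root.<value>" properties map a partition
        # value to the storage root that should hold its data.
        prefix = "region-root."
        self.roots = {
            key[len(prefix):]: value.rstrip("/")
            for key, value in properties.items()
            if key.startswith(prefix)
        }

    def new_data_location(self, partition, filename):
        country = partition.get("country")
        root = self.roots.get(country, self.default_root)
        return f"{root}/data/country={country}/{filename}"


# Stand-in for a user package on the import path, so the sketch is
# self-contained; in practice this would be a real installed module.
user_module = types.ModuleType("user_plugins")
user_module.RegionLocationProvider = RegionLocationProvider
sys.modules["user_plugins"] = user_module

props = {
    "write.location-provider.impl": "user_plugins.RegionLocationProvider",
    "region-root.DE": "s3://bucket-eu/tbl",
}
provider = load_location_provider("s3://bucket-us/tbl", props)
print(provider.new_data_location({"country": "DE"}, "f1.parquet"))
print(provider.new_data_location({"country": "US"}, "f2.parquet"))
```

Because the library only sees a class name and a property map, a user could swap in a different routing rule (for example, by data age for lifecycle management) without any change on the library side.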

On Wed, Sep 30, 2020 at 10:54 AM Mick Jermsurawong <mickjermsuraw...@stripe.com> wrote:

> Hi, thank you all for the discussion today!
>
> There are questions around whether this localization is *sufficient for
> most data localization requirements*.
> - The proposed solution here does not localize stats in the metadata, so
>   for PII columns some data will be exposed. It was suggested that there
>   are ways to turn off these stats for specific columns.
> - Whether centralized computation, as assumed in this proposal, is ever
>   acceptable. If not, this proposal might not be of value for addressing
>   data localization. (Internally, we believe it is sufficient for at least
>   one use case we are working on.)
>
> That brings us to *usefulness outside of data localization*.
> - One suggestion sees this data localization as a possible solution to
>   lifecycle data management: the partition value can indicate the age of
>   the data, so that it is written to different storage systems with
>   different cost and latency profiles.
> - Others said that lifecycle policy management could be done from S3
>   itself, or with a complete rewrite.
>
> There is helpful *feedback on implementations*:
> - Whether we are leaking the semantic meaning of "country"/"locality" into
>   the location provider.
> - One suggestion is that this location provider can be customizable enough
>   that we leave this business logic completely to users, instead of
>   constraining it to a simple string look-up as done in the proposed
>   solution.
>
> I'm happy to take more input from folks. One line of useful discussion
> would be: if this is going to be an out-of-the-box abstraction,
> - how customizable do we want it to be?
> - what are the use cases where this custom data location based on
>   partitioning would be helpful, besides data localization and lifecycle
>   management?
>
> I'm also happy to discuss how folks are solving specific problems of
> data localization under different regulatory requirements.
>
> Best,
> Mick Jermsurawong
>
>
> On Mon, Sep 28, 2020 at 7:03 PM Mick Jermsurawong <mickjermsuraw...@stripe.com> wrote:
>
>> Hi Iceberg community,
>>
>> We are solving for data localization, following legal requirements to
>> store data in designated physical areas. We think that Iceberg can neatly
>> solve this problem with the existing interfaces. Here's the current proposal
>> <https://docs.google.com/document/d/1ZluOiRZlmsfNnQJLSqTiBQg7-XeSE-gvEOn2e0y6E54/edit#heading=h.cuifpdpzmfqz>
>> explaining our motivation and approach.
>>
>> We would appreciate input on the following:
>> - whether there are already similar ongoing features from the community
>> - any non-Iceberg approaches that have been considered to solve data
>>   localization
>> - how compatible this feature in our private fork would be with future
>>   directions
>> - general feedback on the proposed solution
>>
>> Thank you in advance for any feedback here!
>>
>> Best,
>> Mick Jermsurawong
>>