The PR here takes a stab at a more general solution: a dynamically loaded
implementation provided by the user.
https://github.com/apache/iceberg/pull/1531
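To make the idea concrete, here is a minimal Java sketch of what a user-provided, dynamically loaded location provider could look like. Everything in it is hypothetical and not the PR's actual code. `PartitionLocationProvider` is a simplified stand-in for Iceberg's `LocationProvider`, with partition values passed as a plain map rather than a `StructLike`. The `location.country.<value>` property scheme and the `(String tableLocation, Map<String, String> properties)` constructor convention are also assumptions for illustration. The business logic (here, a country look-up) lives entirely in the user's class, which is loaded by name through reflection.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

public class LocationProviderSketch {

  // Hypothetical, simplified stand-in for Iceberg's LocationProvider:
  // partition values are a plain map instead of a StructLike.
  public interface PartitionLocationProvider {
    String newDataLocation(Map<String, String> partition, String filename);
  }

  // Hypothetical user implementation: picks a storage root from the value of
  // a "country" partition column, falling back to the table location.
  public static class CountryLocationProvider implements PartitionLocationProvider {
    private final String tableLocation;
    private final Map<String, String> properties;

    public CountryLocationProvider(String tableLocation, Map<String, String> properties) {
      this.tableLocation = tableLocation;
      this.properties = properties;
    }

    @Override
    public String newDataLocation(Map<String, String> partition, String filename) {
      String country = partition.get("country");
      // Assumed property scheme: "location.country.<value>" -> storage root.
      String root = country == null ? null : properties.get("location.country." + country);
      if (root == null) {
        root = tableLocation; // no mapping configured: use the default location
      }
      return root + "/data/" + filename;
    }
  }

  // Loads the user-supplied class by name, assuming it exposes a
  // (String tableLocation, Map<String, String> properties) constructor.
  public static PartitionLocationProvider load(
      String className, String tableLocation, Map<String, String> properties)
      throws Exception {
    Class<?> clazz = Class.forName(className);
    Constructor<?> ctor = clazz.getConstructor(String.class, Map.class);
    return (PartitionLocationProvider) ctor.newInstance(tableLocation, properties);
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> props = Map.of("location.country.DE", "s3://eu-central-bucket/tbl");
    PartitionLocationProvider provider = load(
        CountryLocationProvider.class.getName(), "s3://default-bucket/tbl", props);
    System.out.println(provider.newDataLocation(Map.of("country", "DE"), "f1.parquet"));
    System.out.println(provider.newDataLocation(Map.of("country", "US"), "f2.parquet"));
  }
}
```

Running `main` prints `s3://eu-central-bucket/tbl/data/f1.parquet` followed by `s3://default-bucket/tbl/data/f2.parquet`, because no root is configured for `US`. Keeping the look-up inside the user class is what lets the table format stay agnostic about "country" or any other business meaning.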


On Wed, Sep 30, 2020 at 10:54 AM Mick Jermsurawong <
mickjermsuraw...@stripe.com> wrote:

> Hi, thank you all for the discussion today!
>
> There are questions around whether this localization is *sufficient for
> most data localization requirements*.
> - The proposed solution does not localize the column stats in the
> metadata, so some values from PII columns will be exposed there. It was
> suggested that there are ways to turn off these stats for specific columns.
> - Whether centralized computation, as assumed in this proposal, is ever
> acceptable. If it is not, this proposal may not help with data
> localization. (Internally, we believe it is sufficient for at least one use
> case we are working on.)
>
> That brings us to *usefulness outside of data localization.*
> - One suggestion is that the same mechanism could support data lifecycle
> management: a partition value can indicate the age of the data, so files
> could be written to different storage systems with different cost and
> latency profiles.
>   - Others pointed out that lifecycle policies could be managed by S3
> itself, or handled with a complete rewrite of the data.
>
> There was also helpful *feedback on the implementation*:
> - Whether we are leaking the semantic meaning of "country"/"locality" into
> the location provider.
> - One suggestion is to make the location provider customizable enough to
> leave this business logic entirely to users, instead of constraining it to
> the simple string look-up used in the proposed solution.
>
> I'm happy to take more input from folks. One useful line of discussion
> would be: if this is going to be an out-of-the-box abstraction,
> - how customizable do we want it to be?
> - what use cases, besides data localization and lifecycle management,
> would benefit from custom data locations based on partitioning?
>
> I'm also happy to discuss how folks are solving specific data localization
> problems under different regulatory requirements.
>
> Best,
> Mick Jermsurawong
>
>
> On Mon, Sep 28, 2020 at 7:03 PM Mick Jermsurawong <
> mickjermsuraw...@stripe.com> wrote:
>
>> Hi Iceberg community,
>>
>> We are working on data localization, driven by legal requirements to store
>> data in designated physical locations. We think that Iceberg can solve this
>> problem neatly with its existing interfaces. Here's the current proposal
>> <https://docs.google.com/document/d/1ZluOiRZlmsfNnQJLSqTiBQg7-XeSE-gvEOn2e0y6E54/edit#heading=h.cuifpdpzmfqz>
>> explaining our motivation and approach.
>>
>> We would appreciate input on the following:
>> - whether similar features are already in progress in the community
>> - any non-Iceberg approaches that have been considered for data
>> localization
>> - how compatible this feature, currently in our private fork, would be
>> with the project's future direction
>> - general feedback on the proposed solution
>>
>> Thank you in advance for any feedback here!
>>
>> Best,
>> Mick Jermsurawong
>>
>
