[I] [Python] A "personal data" boolean in metadata [arrow]

via GitHub Thu, 22 Jan 2026 23:08:59 -0800


simonaubertbd opened a new issue, #48959:
URL: https://github.com/apache/arrow/issues/48959


   ### Describe the enhancement requested
   
   Hello,
   
   As data increasingly moves across organizational and regulatory boundaries, 
data sensitivity is becoming just as important as data type. Today, teams often 
need to answer questions like:
   - Does this dataset contain personal data?
   - Which specific fields are subject to GDPR/CCPA or internal governance rules?
   - Can this column be safely logged, cached, or shared downstream?
   
   In practice, this information is either:
   - stored out-of-band (data catalogs, documentation), or
   - embedded in ad-hoc metadata conventions that vary by organization and tool.
   
   A simple, standardized personal_data boolean at the field level would 
provide a lightweight, interoperable signal that many tools could immediately 
benefit from.
   
   Field-level granularity is essential: most real datasets mix personal and 
non-personal columns. A boolean keeps the signal intentionally minimal.
   
   This would enable:
   - Automatic detection and propagation of personal data flags across Arrow-compatible systems
   - Safer defaults in query engines, serializers, and exporters
   - Easier integration with data catalogs, lineage tools, and privacy audits
   - Consistent behavior across Arrow consumers
   
   Crucially, this does not enforce semantics or compliance; it simply 
provides a common language.
   
   This would be entirely optional and backward-compatible.
   It does not preclude richer classifications in external systems.
   It aligns with Arrow’s existing use of key/value metadata without 
introducing new core types.
   
   Best regards,
   
   Simon
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
