prasannarajaperumal commented on code in PR #6268:
URL: https://github.com/apache/hudi/pull/6268#discussion_r937989954

##########
website/src/pages/tech-specs.md:
##########
@@ -0,0 +1,350 @@
+# Apache Hudi Storage Format Specification [DRAFT]
+
+
+
+This document is a specification for the Hudi Storage Format which transforms immutable cloud/file storage systems into transactional data lakes.
+
+## Overview
+
+Hudi Storage Format enables the following features over very large collection of files/objects
+
+- streaming primitives like incremental merges, change stream etc
+- database primitives like tables, transactions, mutability, indexes and query performance optimizations
+
+Apache Hudi is an open source data lake platform that is built on top of the Hudi Storage Format and it unlocks the following features
+
+- **Unified Computation model** - an unified way to combine large batch style operations and frequent near real time streaming operations over a single unified dataset
+- **Self Optimized Storage** - Automatically handle all the table storage maintenance such as compaction, clustering, vacuuming asynchronously and non-blocking to actual data changes
+- **Cloud Native Database** - abstracts Table/Schema from actual storage and ensures up-to-date metadata and indexes unlocking multi-fold read and write performance optimizations
+- **Data processing engine neutral** - designed to be neutral and not having a preferred computation engine. Apache Hudi will manage metadata, provide common abstractions and pluggable interfaces to most/all common computational engines.

Review Comment:
Made it as Engine neutral

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
