prasannarajaperumal commented on code in PR #6268:
URL: https://github.com/apache/hudi/pull/6268#discussion_r940912690


##########
website/src/pages/tech-specs.md:
##########
@@ -0,0 +1,371 @@
+# Apache Hudi Storage Format Specification [DRAFT]
+
+
+
+This document is a specification for the Hudi Storage Format, which transforms 
immutable cloud/file storage systems into transactional data lakes. 
+
+## Overview
+
+The Hudi Storage Format enables the following features over very large 
collections of files/objects:
+
+- streaming primitives such as incremental merges and change streams
+- database primitives like tables, transactions, mutability, indexes and query 
performance optimizations 
+
+Apache Hudi is an open source data lake platform built on top of the 
Hudi Storage Format. It unlocks the following features:
+
+- **Unified Computation model** - a unified way to combine large batch-style 
operations and frequent near-real-time streaming operations over a single 
dataset
+- **Self-Optimized Storage** - automatically handles table storage 
maintenance such as compaction, clustering, and vacuuming asynchronously, 
without blocking actual data changes
+- **Cloud Native Database** - abstracts tables and schemas from the actual 
storage and keeps metadata and indexes up to date, unlocking multi-fold read 
and write performance optimizations
+- **Engine neutrality** - designed to be neutral, with no preferred 
computation engine. Apache Hudi manages metadata and provides common 
abstractions and pluggable interfaces for most/all common computation engines.
+
+
+
+## Storage Format
+
+### Layout Hierarchy
+
+At a high level, Hudi organizes data into a directory structure under the 
base path (the root directory of the Hudi table). The directory structure is 
based on coarse-grained partitioning values set for the dataset. 
Non-partitioned datasets store all the data files under the base path. The 
Hudi storage format reserves a special *.hoodie* directory under the base path 
to store transaction logs and metadata.
+
+```
+/data/hudi_trips/                                      <== BASE PATH

Review Comment:
   Hmm, looks like the tabs have different space lengths. Correcting it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

