[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1260: [WIP] [HUDI-510] Update site documentation in sync with cWiki

GitBox Mon, 20 Jan 2020 18:17:35 -0800

vinothchandar commented on a change in pull request #1260: [WIP] [HUDI-510] Update site documentation in sync with cWiki
URL: https://github.com/apache/incubator-hudi/pull/1260#discussion_r368784018


 ##########
 File path: docs/_docs/2_1_concepts.md
 ##########
 @@ -53,69 +53,70 @@ With the help of the timeline, an incremental query attempting to get all new da
 only the changed files without say scanning all the time buckets > 07:00.
 
 ## File management
-Hudi organizes a datasets into a directory structure under a `basepath` on DFS. Dataset is broken up into partitions, which are folders containing data files for that partition,
+Hudi organizes a table into a directory structure under a `basepath` on DFS. Table is broken up into partitions, which are folders containing data files for that partition,
 very similar to Hive tables. Each partition is uniquely identified by its `partitionpath`, which is relative to the basepath.
 
 Within each partition, files are organized into `file groups`, uniquely identified by a `file id`. Each file group contains several
-`file slices`, where each slice contains a base columnar file (`*.parquet`) produced at a certain commit/compaction instant time,
+`file slices`, where each slice contains a base file (`*.parquet`) produced at a certain commit/compaction instant time,
  along with set of log files (`*.log.*`) that contain inserts/updates to the base file since the base file was produced. 
 Hudi adopts a MVCC design, where compaction action merges logs and base files to produce new file slices and cleaning action gets rid of 
 unused/older file slices to reclaim space on DFS. 
 
-Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file group, via an indexing mechanism. 
+## Index
+Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism. 
 This mapping between record key and file group/file id, never changes once the first version of a record has been written to a file. In short, the 
 mapped file group contains all versions of a group of records.
 
-## Storage Types & Views
-Hudi storage types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities are implemented on top of such organization (i.e how data is written). 
-In turn, `views` define how the underlying data is exposed to the queries (i.e how data is read). 
+## Table Types & Querying
 
 Review comment:
   and Queries (instead of Querying)? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
