[incubator-hudi] branch asf-site updated: [HUDI-403] Adds guidelines on deployment/upgrading

vbalaji Mon, 20 Jan 2020 21:08:04 -0800

This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 39db1ae  [HUDI-403] Adds guidelines on deployment/upgrading
39db1ae is described below

commit 39db1aedbb9cb4533c58f869d5a940fbc1a3e5d2
Author: vinothchandar <[email protected]>
AuthorDate: Mon Jan 20 18:01:19 2020 -0800

    [HUDI-403] Adds guidelines on deployment/upgrading
    
     - Moved "Administering" page to "Deployment"
     - Still need to add information on deltastreamer modes/compaction
---
 docs/_data/navigation.yml                          |   6 +-
 docs/_docs/2_2_writing_data.md                     |   2 +-
 ...{2_6_admin_guide.cn.md => 2_6_deployment.cn.md} |   2 +-
 .../{2_6_admin_guide.md => 2_6_deployment.md}      | 173 ++++++++++++---------
 4 files changed, 107 insertions(+), 76 deletions(-)

diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
index d2826dd..a3510f4 100644
--- a/docs/_data/navigation.yml
+++ b/docs/_data/navigation.yml
@@ -38,8 +38,8 @@ docs:
         url: /docs/configurations.html
       - title: "Performance"
         url: /docs/performance.html
-      - title: "Administering"
-        url: /docs/admin_guide.html
+      - title: "Deployment"
+        url: /docs/deployment.html
   - title: INFO
     children:
       - title: "Docs Versions"
@@ -86,7 +86,7 @@ cn_docs:
       - title: "性能"
         url: /cn/docs/performance.html
       - title: "管理"
-        url: /cn/docs/admin_guide.html
+        url: /cn/docs/deployment.html
   - title: 其他信息
     children:
       - title: "文档版本"
diff --git a/docs/_docs/2_2_writing_data.md b/docs/_docs/2_2_writing_data.md
index c0f5184..832daa6 100644
--- a/docs/_docs/2_2_writing_data.md
+++ b/docs/_docs/2_2_writing_data.md
@@ -8,7 +8,7 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
 In this section, we will cover ways to ingest new changes from external sources or even other Hudi datasets using the [DeltaStreamer](#deltastreamer) tool, as well as 
-speeding up large Spark jobs via upserts using the [Hudi datasource](#datasource-writer). Such datasets can then be [queried](querying_data.html) using various query engines.
+speeding up large Spark jobs via upserts using the [Hudi datasource](#datasource-writer). Such datasets can then be [queried](/docs/querying_data.html) using various query engines.
 
 
 ## Write Operations
diff --git a/docs/_docs/2_6_admin_guide.cn.md b/docs/_docs/2_6_deployment.cn.md
similarity index 99%
rename from docs/_docs/2_6_admin_guide.cn.md
rename to docs/_docs/2_6_deployment.cn.md
index ce055f0..c555b54 100644
--- a/docs/_docs/2_6_admin_guide.cn.md
+++ b/docs/_docs/2_6_deployment.cn.md
@@ -1,7 +1,7 @@
 ---
 title: 管理 Hudi Pipelines
 keywords: hudi, administration, operation, devops
-permalink: /cn/docs/admin_guide.html
+permalink: /cn/docs/deployment.html
 summary: This section offers an overview of tools available to operate an ecosystem of Hudi datasets
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
diff --git a/docs/_docs/2_6_admin_guide.md b/docs/_docs/2_6_deployment.md
similarity index 73%
rename from docs/_docs/2_6_admin_guide.md
rename to docs/_docs/2_6_deployment.md
index 6990f50..295f8e8 100644
--- a/docs/_docs/2_6_admin_guide.md
+++ b/docs/_docs/2_6_deployment.md
@@ -1,51 +1,87 @@
 ---
-title: Administering Hudi Pipelines
-keywords: hudi, administration, operation, devops
-permalink: /docs/admin_guide.html
-summary: This section offers an overview of tools available to operate an ecosystem of Hudi datasets
+title: Deployment Guide
+keywords: hudi, administration, operation, devops, deployment
+permalink: /docs/deployment.html
+summary: This section offers an overview of tools available to operate an ecosystem of Hudi tables
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-Admins/ops can gain visibility into Hudi datasets/pipelines in the following ways
+This section provides guidance on deploying and operating Hudi tables at scale. 
+Specifically, we will cover the following aspects.
 
- - [Administering via the Admin CLI](#admin-cli)
- - [Graphite metrics](#metrics)
- - [Spark UI of the Hudi Application](#spark-ui)
+ - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
+ - [Upgrading Versions](#upgrading) : Picking up new Hudi releases: guidelines and general best practices.
+ - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
+ - [Interacting via CLI](#cli) : Using the CLI to perform maintenance or deeper introspection.
+ - [Monitoring](#monitoring) : Tracking metrics from your Hudi tables using popular tools.
+ - [Troubleshooting](#troubleshooting) : Uncovering, triaging and resolving issues in production.
+ 
+## Deploying
 
-This section provides a glimpse into each of these, with some general guidance on [troubleshooting](#troubleshooting)
+Hudi deploys with no long-running servers and no additional infrastructure cost to your data lake. In fact, Hudi pioneered this model of building a transactional distributed storage layer
+using existing infrastructure, and it's heartening to see other systems adopting similar approaches as well. Writing to Hudi is done via Spark jobs (DeltaStreamer or custom Spark datasource jobs), deployed per standard Apache Spark [recommendations](https://spark.apache.org/docs/latest/cluster-overview.html).
+Querying Hudi tables happens via libraries installed into Apache Hive, Apache Spark or Presto, and hence no additional infrastructure is necessary. 
 
-## Admin CLI
 
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && ./hudi-cli.sh`.
-A hudi dataset resides on DFS, in a location referred to as the **basePath** and we would need this location in order to connect to a Hudi dataset.
-Hudi library effectively manages this dataset internally, using .hoodie subfolder to track all metadata
+## Upgrading 
+
+New Hudi releases are listed on the [releases page](/releases), with detailed notes that list all the changes, along with the highlights of each release. 
+At the end of the day, Hudi is a storage system, and with that comes a great deal of responsibility, which we take seriously. 
+
+As general guidelines:
+
+ - We strive to keep all changes backwards compatible (i.e. new code can read old data/timeline files), and where we cannot, we will provide upgrade/downgrade tools via the CLI.
+ - We cannot always guarantee forward compatibility (i.e. old code being able to read data/timeline files written by a newer version). This is generally the norm, since no new features could be built otherwise.
+   However, any such large changes will be turned off by default, for a smooth transition to the newer release. After a few releases, and once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade the writers (deltastreamer, spark jobs using datasource). This often provides the best experience, and it's easy to fix 
+   any issues by rolling the writer code forward or back (which you typically have more control over).
+ - With large, feature-rich releases, we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading Hudi is no different than upgrading any database system.
+
+Note that release notes can override this information with specific instructions, applicable on a case-by-case basis.
+
+## Migrating
+
+Currently, migrating to Hudi can be done using two approaches:
+
+- **Convert newer partitions to Hudi** : This model is suitable for large event tables (e.g. clickstreams, ad impressions), which typically receive writes only for the last few days. You can convert the last 
+   N partitions to Hudi and continue writing as if it were a Hudi table to begin with. The Hudi query side code is able to correctly handle both Hudi and non-Hudi data partitions.
+- **Full conversion to Hudi** : This model is suitable if you are currently bulk/full loading the table a few times a day (e.g. database ingestion). The full conversion to Hudi is simply a one-time step (akin to one run of your existing job),
+   which moves all of the data into the Hudi format and provides the ability to incrementally update for future writes.
+
+For more details, refer to the [migration guide](/docs/migration_guide.html). In the future, we will support seamless zero-copy bootstrap of existing tables, with all the upsert/incremental query capabilities fully supported.
+
+## CLI
+
+Once Hudi has been built, the shell can be launched via `cd hudi-cli && ./hudi-cli.sh`. A Hudi table resides on DFS, in a location referred to as the `basePath`, and 
+we need this location in order to connect to a Hudi table. The Hudi library manages this table internally, using the `.hoodie` subfolder to track all metadata.
 
 To initialize a hudi table, use the following command.
 
 ```java
-18/09/06 15:56:52 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
-============================================
-*                                          *
-*     _    _           _   _               *
-*    | |  | |         | | (_)              *
-*    | |__| |       __| |  -               *
-*    |  __  ||   | / _` | ||               *
-*    | |  | ||   || (_| | ||               *
-*    |_|  |_|\___/ \____/ ||               *
-*                                          *
-============================================
-
-Welcome to Hoodie CLI. Please type help if you are looking for help.
+===================================================================
+*         ___                          ___                        *
+*        /\__\          ___           /\  \           ___         *
+*       / /  /         /\__\         /  \  \         /\  \        *
+*      / /__/         / /  /        / /\ \  \        \ \  \       *
+*     /  \  \ ___    / /  /        / /  \ \__\       /  \__\      *
+*    / /\ \  /\__\  / /__/  ___   / /__/ \ |__|     / /\/__/      *
+*    \/  \ \/ /  /  \ \  \ /\__\  \ \  \ / /  /  /\/ /  /         *
+*         \  /  /    \ \  / /  /   \ \  / /  /   \  /__/          *
+*         / /  /      \ \/ /  /     \ \/ /  /     \ \__\          *
+*        / /  /        \  /  /       \  /  /       \/__/          *
+*        \/__/          \/__/         \/__/    Apache Hudi CLI    *
+*                                                                 *
+===================================================================
+
 hudi->create --path /user/hive/warehouse/table1 --tableName hoodie_table_1 --tableType COPY_ON_WRITE
 .....
-18/09/06 15:57:15 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE from ...
 ```
 
 To see the description of hudi table, use the command:
 
 ```java
-hoodie:hoodie_table_1->desc
+hudi:hoodie_table_1->desc
 18/09/06 15:57:19 INFO timeline.HoodieActiveTimeline: Loaded instants []
     _________________________________________________________
     | Property                | Value                        |
@@ -58,37 +94,34 @@ hoodie:hoodie_table_1->desc
     | hoodie.archivelog.folder|                              |
 ```
 
-Following is a sample command to connect to a Hudi dataset contains uber trips.
+Following is a sample command to connect to a Hudi table containing uber trips.
 
 ```java
-hoodie:trips->connect --path /app/uber/trips
+hudi:trips->connect --path /app/uber/trips
 
-16/10/05 23:20:37 INFO model.HoodieTableMetadata: Attempting to load the commits under /app/uber/trips/.hoodie with suffix .commit
-16/10/05 23:20:37 INFO model.HoodieTableMetadata: Attempting to load the commits under /app/uber/trips/.hoodie with suffix .inflight
 16/10/05 23:20:37 INFO model.HoodieTableMetadata: All commits :HoodieCommits{commitList=[20161002045850, 20161002052915, 20161002055918, 20161002065317, 20161002075932, 20161002082904, 20161002085949, 20161002092936, 20161002105903, 20161002112938, 20161002123005, 20161002133002, 20161002155940, 20161002165924, 20161002172907, 20161002175905, 20161002190016, 20161002192954, 20161002195925, 20161002205935, 20161002215928, 20161002222938, 20161002225915, 20161002232906, 20161003003028, 201 [...]
 Metadata for table trips loaded
-hoodie:trips->
 ```
 
-Once connected to the dataset, a lot of other commands become available. The shell has contextual autocomplete help (press TAB) and below is a list of all commands, few of which are reviewed in this section
+Once connected to the table, many other commands become available. The shell has contextual autocomplete help (press TAB), and below is a list of all commands, a few of which are reviewed in this section
 are reviewed
 
 ```java
-hoodie:trips->help
+hudi:trips->help
 * ! - Allows execution of operating system (OS) commands
 * // - Inline comment markers (start of line only)
 * ; - Inline comment markers (start of line only)
-* addpartitionmeta - Add partition metadata to a dataset, if not present
+* addpartitionmeta - Add partition metadata to a table, if not present
 * clear - Clears the console
 * cls - Clears the console
 * commit rollback - Rollback a commit
-* commits compare - Compare commits with another Hoodie dataset
+* commits compare - Compare commits with another Hoodie table
 * commit showfiles - Show file level details of a commit
 * commit showpartitions - Show partition level details of a commit
 * commits refresh - Refresh the commits
 * commits show - Show the commits
-* commits sync - Compare commits with another Hoodie dataset
-* connect - Connect to a hoodie dataset
+* commits sync - Compare commits with another Hoodie table
+* connect - Connect to a hoodie table
 * date - Displays the local date and time
 * exit - Exits the shell
 * help - List all commands usage
@@ -102,27 +135,26 @@ hoodie:trips->help
 * utils loadClass - Load a class
 * version - Displays shell version
 
-hoodie:trips->
+hudi:trips->
 ```
 
 
 ### Inspecting Commits
 
-The task of upserting or inserting a batch of incoming records is known as a **commit** in Hudi. A commit provides basic atomicity guarantees such that only commited data is available for querying.
+The task of upserting or inserting a batch of incoming records is known as a **commit** in Hudi. A commit provides basic atomicity guarantees such that only committed data is available for querying.
 Each commit has a monotonically increasing string/number called the **commit number**. Typically, this is the time at which we started the commit.
 
 To view some basic information about the last 10 commits,
 
 
 ```java
-hoodie:trips->commits show --sortBy "Total Bytes Written" --desc true --limit 10
+hudi:trips->commits show --sortBy "Total Bytes Written" --desc true --limit 10
     ________________________________________________________________________________________________________________________________________________________________________
     | CommitTime    | Total Bytes Written| Total Files Added| Total Files Updated| Total Partitions Written| Total Records Written| Total Update Records Written| Total Errors|
     |=======================================================================================================================================================================|
     ....
     ....
     ....
-hoodie:trips->
 ```
 
 At the start of each write, Hudi also writes a .inflight commit to the .hoodie folder. You can use the timestamp there to estimate how long the commit has been inflight
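The commit numbers shown above (e.g. 20161005165855) read as `yyyyMMddHHmmss` timestamps, which fits the note that a commit number is typically the time the commit started. Assuming that layout and that you know the writer's time zone (both assumptions, not guarantees from this page), the minimal Java sketch below estimates how long an `.inflight` commit has been running and shows why such fixed-width instants also sort chronologically as plain strings. The `InflightAge` class and its methods are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class InflightAge {
    // Assumed instant layout, inferred from sample commit numbers such as 20161005165855.
    private static final DateTimeFormatter INSTANT_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Parses a 14-digit commit number into a date-time with no zone attached. */
    static LocalDateTime parseInstant(String instant) {
        return LocalDateTime.parse(instant, INSTANT_FORMAT);
    }

    /** Time elapsed from the commit's start instant to {@code now}, reading the instant in now's zone. */
    static Duration inflightFor(String instant, ZonedDateTime now) {
        ZonedDateTime started = parseInstant(instant).atZone(now.getZone());
        return Duration.between(started, now);
    }

    /** Fixed-width, most-significant-field-first instants compare chronologically as strings. */
    static boolean isLater(String a, String b) {
        return a.compareTo(b) > 0;
    }

    public static void main(String[] args) {
        // Hypothetical: an .inflight file stamped 20161005165855, checked at 18:00:00 the same day.
        ZonedDateTime now = ZonedDateTime.of(2016, 10, 5, 18, 0, 0, 0, ZoneId.of("UTC"));
        System.out.println(inflightFor("20161005165855", now)); // PT1H1M5S
        System.out.println(isLater("20161005165855", "20161002045850")); // true
    }
}
```

In practice you would pass `ZonedDateTime.now(zone)` with the zone your writers run in; if that zone is wrong, the estimate is off by the zone offset.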
@@ -140,7 +172,7 @@ To understand how the writes spread across specific partiions,
 
 
 ```java
-hoodie:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
+hudi:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
     __________________________________________________________________________________________________________________________________________
     | Partition Path| Total Files Added| Total Files Updated| Total Records Inserted| Total Records Updated| Total Bytes Written| Total Errors|
     |=========================================================================================================================================|
@@ -152,7 +184,7 @@ If you need file level granularity , we can do the following
 
 
 ```java
-hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
+hudi:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
     ________________________________________________________________________________________________________________________________________________________
     | Partition Path| File ID                             | Previous Commit| Total Records Updated| Total Records Written| Total Bytes Written| Total Errors|
     |=======================================================================================================================================================|
@@ -163,11 +195,11 @@ hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
 
 ### FileSystem View
 
-Hudi views each partition as a collection of file-groups with each file-group containing a list of file-slices in commit
-order (See Concepts). The below commands allow users to view the file-slices for a data-set.
+Hudi views each partition as a collection of file-groups with each file-group containing a list of file-slices in commit order (See [concepts](/docs/concepts.html)). 
+The commands below allow users to view the file-slices for a table.
 
 ```java
- hoodie:stock_ticks_mor->show fsview all
+hudi:stock_ticks_mor->show fsview all
  ....
   
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
  | Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta 
Files| Total Delta File Size| Delta Files |
@@ -176,24 +208,23 @@ order (See Concepts). The below commands allow users to 
view the file-slices for
 
 
 
- hoodie:stock_ticks_mor->show fsview latest --partitionPath "2018/08/31"
+hudi:stock_ticks_mor->show fsview latest --partitionPath "2018/08/31"
  ......
  
___________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
 [...]
  | Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta 
Files| Total Delta Size| Delta Size - compaction scheduled| Delta Size - 
compaction unscheduled| Delta To Base Ratio - compaction scheduled| Delta To 
Base Ratio - compaction unscheduled| Delta Files - compaction scheduled | Delta 
Files - compaction unscheduled|
  
|==========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================
 [...]
  | 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| 
hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet|
 432.5 KB | 1 | 20.8 KB | 20.8 KB | 0.0 B | 0.0 B | 0.0 B | [HoodieLogFile 
{hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]|
 [] |
 
- hoodie:stock_ticks_mor->
 ```
 
 
 ### Statistics
 
-Since Hudi directly manages file sizes for DFS dataset, it might be good to get an overall picture
+Since Hudi directly manages file sizes for DFS tables, it can be useful to get an overall picture.
 
 
 ```java
-hoodie:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
+hudi:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
     ________________________________________________________________________________________________
     | CommitTime    | Min     | 10th    | 50th    | avg     | 95th    | Max     | NumFiles| StdDev  |
     |===============================================================================================|
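For intuition about the columns in the `stats filesizes` output, here is a minimal Java sketch that derives the same kinds of summary figures (min, 10th/50th/95th percentiles, average, max, file count, standard deviation) from a list of file sizes. It uses nearest-rank percentiles and the population standard deviation; the CLI's own estimator may differ, so treat this as an illustration rather than a reimplementation. `FileSizeStats` is a hypothetical name.

```java
import java.util.Arrays;

final class FileSizeStats {
    final int numFiles;
    final long min, p10, p50, p95, max;
    final double avg, stdDev;

    FileSizeStats(long[] sizesInBytes) {
        if (sizesInBytes.length == 0) {
            throw new IllegalArgumentException("need at least one file size");
        }
        long[] sorted = sizesInBytes.clone();
        Arrays.sort(sorted);
        numFiles = sorted.length;
        min = sorted[0];
        max = sorted[numFiles - 1];
        p10 = nearestRank(sorted, 10);
        p50 = nearestRank(sorted, 50);
        p95 = nearestRank(sorted, 95);

        double sum = 0;
        for (long size : sorted) sum += size;
        avg = sum / numFiles;

        double squaredDeviations = 0;
        for (long size : sorted) squaredDeviations += (size - avg) * (size - avg);
        stdDev = Math.sqrt(squaredDeviations / numFiles); // population standard deviation
    }

    /** Nearest-rank percentile: smallest value with at least pct% of samples at or below it. */
    private static long nearestRank(long[] sorted, int pct) {
        int rank = (pct * sorted.length + 99) / 100; // integer ceil(pct * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical file sizes (in MB) written by one commit.
        FileSizeStats s = new FileSizeStats(new long[] {120, 118, 64, 125, 119, 12});
        System.out.printf("min=%d p50=%d avg=%.1f p95=%d max=%d files=%d%n",
                s.min, s.p50, s.avg, s.p95, s.max, s.numFiles);
    }
}
```

A large gap between the 10th percentile and the median, as in the hypothetical input, is the kind of small-file signal this view helps surface.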
@@ -206,7 +237,7 @@ In case of Hudi write taking much longer, it might be good to see the write ampl
 
 
 ```java
-hoodie:trips->stats wa
+hudi:trips->stats wa
     __________________________________________________________________________
     | CommitTime    | Total Upserted| Total Written| Write Amplifiation Factor|
     |=========================================================================|
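The write amplification factor in the `stats wa` output compares records written with records actually upserted. Assuming it is simply Total Written divided by Total Upserted (our reading of the column names, not confirmed on this page), a minimal sketch looks like this. On a copy-on-write table, updating a handful of records rewrites the whole files that contain them, which is what drives this ratio up. `WriteAmplification` is a hypothetical name.

```java
final class WriteAmplification {
    /**
     * Records written divided by records upserted for one commit.
     * Returns 0 when nothing was upserted (an assumed convention to avoid dividing by zero).
     */
    static double factor(long totalUpserted, long totalWritten) {
        if (totalUpserted <= 0) {
            return 0.0;
        }
        return (double) totalWritten / totalUpserted;
    }

    public static void main(String[] args) {
        // Hypothetical commit: 1,000 upserted records landed in files holding 250,000 records,
        // all of which were rewritten.
        System.out.println(factor(1_000, 250_000)); // 250.0
    }
}
```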
@@ -227,7 +258,7 @@ To get an idea of the lag between compaction and writer applications, use the be
 pending compactions.
 
 ```java
-hoodie:trips->compactions show all
+hudi:trips->compactions show all
      ___________________________________________________________________
     | Compaction Instant Time| State    | Total FileIds to be Compacted|
     |==================================================================|
@@ -238,7 +269,7 @@ hoodie:trips->compactions show all
 To inspect a specific compaction plan, use
 
 ```java
-hoodie:trips->compaction show --instant <INSTANT_1>
+hudi:trips->compaction show --instant <INSTANT_1>
     _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
     | Partition Path| File Id | Base Instant  | Data File Path                                    | Total Delta Files| getMetrics                                                                                                                    |
     |================================================================================================================================================================================================================================================
@@ -249,11 +280,11 @@ hoodie:trips->compaction show --instant <INSTANT_1>
 To manually schedule or run a compaction, use the below command. This command uses spark launcher to perform compaction
 operations. 
 
-**NOTE:** Make sure no other application is scheduling compaction for this dataset concurrently
+**NOTE:** Make sure no other application is scheduling compaction for this table concurrently
 {: .notice--info}
 
 ```java
-hoodie:trips->help compaction schedule
+hudi:trips->help compaction schedule
 Keyword:                   compaction schedule
 Description:               Schedule Compaction
  Keyword:                  sparkMemory
@@ -266,7 +297,7 @@ Description:               Schedule Compaction
 ```
 
 ```java
-hoodie:trips->help compaction run
+hudi:trips->help compaction run
 Keyword:                   compaction run
 Description:               Run Compaction for given instant time
  Keyword:                  tableName
@@ -300,7 +331,7 @@ Description:               Run Compaction for given instant time
    Default if unspecified: '__NULL__'
 
  Keyword:                  compactionInstant
-   Help:                   Base path for the target hoodie dataset
+   Help:                   Base path for the target hoodie table
    Mandatory:              true
    Default if specified:   '__NULL__'
    Default if unspecified: '__NULL__'
@@ -313,7 +344,7 @@ Description:               Run Compaction for given instant time
 Validating a compaction plan : Check if all the files necessary for compactions are present and are valid
 
 ```java
-hoodie:stock_ticks_mor->compaction validate --instant 20181005222611
+hudi:stock_ticks_mor->compaction validate --instant 20181005222611
 ...
 
    COMPACTION PLAN VALID
@@ -325,7 +356,7 @@ hoodie:stock_ticks_mor->compaction validate --instant 20181005222611
 
 
 
-hoodie:stock_ticks_mor->compaction validate --instant 20181005222601
+hudi:stock_ticks_mor->compaction validate --instant 20181005222601
 
    COMPACTION PLAN INVALID
 
@@ -343,10 +374,10 @@ operation. Any new log-files that happened on this file after the compaction got
 so that are preserved. Hudi provides the following CLI to support it
 
 
-### UnScheduling Compaction
+### Unscheduling Compaction
 
 ```java
-hoodie:trips->compaction unscheduleFileId --fileId <FileUUID>
+hudi:trips->compaction unscheduleFileId --fileId <FileUUID>
 ....
 No File renames needed to unschedule file from pending compaction. Operation successful.
 ```
@@ -354,7 +385,7 @@ No File renames needed to unschedule file from pending compaction. Operation suc
 In other cases, an entire compaction plan needs to be reverted. This is supported by the following CLI
 
 ```java
-hoodie:trips->compaction unschedule --compactionInstant <compactionInstant>
+hudi:trips->compaction unschedule --compactionInstant <compactionInstant>
 .....
 No File renames needed to unschedule pending compaction. Operation successful.
 ```
@@ -368,16 +399,16 @@ command comes to the rescue, it will rearrange the file-slices so that there is
 consistent with the compaction plan
 
 ```java
-hoodie:stock_ticks_mor->compaction repair --instant 20181005222611
+hudi:stock_ticks_mor->compaction repair --instant 20181005222611
 ......
 Compaction successfully repaired
 .....
 ```
 
 
-## Metrics {#metrics}
+## Monitoring
 
-Once the Hudi Client is configured with the right datasetname and environment for metrics, it produces the following graphite metrics, that aid in debugging hudi datasets
+Once the Hudi writer is configured with the right table and environment for metrics, it produces the following Graphite metrics, which aid in debugging Hudi tables
 
  - **Commit Duration** - This is amount of time it took to successfully commit a batch of records
  - **Rollback Duration** - Similarly, amount of time taken to undo partial data left over by a failed commit (happens everytime automatically after a failing write)
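The Graphite reporting described above is switched on through writer properties. The sketch below assembles them with `java.util.Properties`; the key names are as we recall them from Hudi's metrics configuration around the 0.5.x releases and should be checked against the configurations page for your version, while the table name, host, port and prefix values are placeholders. Using an environment name as the metric prefix is one possible reading of "environment" above, not a documented requirement. `GraphiteMetricsProps` is a hypothetical name.

```java
import java.util.Properties;

final class GraphiteMetricsProps {
    /**
     * Writer properties that turn on Graphite reporting.
     * Key names are assumed (recalled from Hudi's metrics config); values are placeholders.
     */
    static Properties graphiteMetrics(String tableName, String host, int port, String prefix) {
        Properties props = new Properties();
        props.setProperty("hoodie.table.name", tableName); // assumed to appear in reported metric names
        props.setProperty("hoodie.metrics.on", "true");
        props.setProperty("hoodie.metrics.reporter.type", "GRAPHITE");
        props.setProperty("hoodie.metrics.graphite.host", host);
        props.setProperty("hoodie.metrics.graphite.port", Integer.toString(port));
        props.setProperty("hoodie.metrics.graphite.metric.prefix", prefix); // e.g. an environment name
        return props;
    }

    public static void main(String[] args) {
        Properties p = graphiteMetrics("trips", "graphite.example.com", 2003, "prod");
        p.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```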
@@ -392,7 +423,7 @@ These metrics can then be plotted on a standard tool like grafana. Below is a sa
 </figure>
 
 
-## Troubleshooting Failures {#troubleshooting}
+## Troubleshooting
 
 Section below generally aids in debugging Hudi failures. Off the bat, the following metadata is added to every record to help triage  issues easily using standard Hadoop SQL engines (Hive/Presto/Spark)
 
@@ -400,9 +431,9 @@ Section below generally aids in debugging Hudi failures. Off the bat, the follow
  - **_hoodie_commit_time** - Last commit that touched this record
  - **_hoodie_file_name** - Actual file name containing the record (super useful to triage duplicates)
  - **_hoodie_partition_path** - Path from basePath that identifies the partition containing this record
+ 
+ For performance-related issues, please refer to the [tuning guide](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide).
 
-**NOTE:** As of now, Hudi assumes the application passes in the same deterministic partitionpath for a given recordKey. i.e the uniqueness of record key is only enforced within each partition.
-{: .notice--info}
 
 ### Missing records
 
@@ -411,7 +442,7 @@ If you do find errors, then the record was not actually written by Hudi, but han
 
 ### Duplicates
 
-First of all, please confirm if you do indeed have duplicates **AFTER** ensuring the query is accessing the Hudi datasets [properly](sql_queries.html) .
+First, please confirm that you do indeed have duplicates **AFTER** ensuring the query is accessing the Hudi table [properly](/docs/sql_queries.html).
 
  - If confirmed, please use the metadata fields above, to identify the physical files & partition files containing the records .
  - If duplicates span files across partitionpath, then this means your application is generating different partitionPaths for same recordKey, Please fix your app
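The metadata fields listed earlier turn duplicate triage into a grouping exercise. Given rows of `_hoodie_record_key`, `_hoodie_partition_path` and `_hoodie_file_name` (for example, selected with any of the SQL engines mentioned above), the Java sketch below reports record keys that appear under more than one partition path, which is the application-side problem described in the last bullet. The `DuplicateTriage` class and its `Row` type are hypothetical and work on rows already loaded into memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DuplicateTriage {
    /** One row of (_hoodie_record_key, _hoodie_partition_path, _hoodie_file_name). */
    static final class Row {
        final String recordKey;
        final String partitionPath;
        final String fileName; // kept so offending files can be listed once a key is flagged

        Row(String recordKey, String partitionPath, String fileName) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
            this.fileName = fileName;
        }
    }

    /** Record keys found under two or more partition paths, mapped to those paths. */
    static Map<String, Set<String>> keysSpanningPartitions(List<Row> rows) {
        Map<String, Set<String>> pathsByKey = new TreeMap<>();
        for (Row row : rows) {
            pathsByKey.computeIfAbsent(row.recordKey, k -> new TreeSet<>()).add(row.partitionPath);
        }
        // Keep only keys whose records were written under more than one partition path.
        pathsByKey.values().removeIf(paths -> paths.size() < 2);
        return pathsByKey;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("trip-1", "2016/09/01", "f1.parquet"),
                new Row("trip-1", "2016/09/02", "f2.parquet"),
                new Row("trip-2", "2016/09/01", "f1.parquet"));
        System.out.println(keysSpanningPartitions(rows)); // {trip-1=[2016/09/01, 2016/09/02]}
    }
}
```

Keys that show up here point at the application generating different partition paths for the same record key; keys duplicated within a single partition need a different line of investigation.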
