[GitHub] [hudi] nsivabalan commented on a change in pull request #2245: [WIP] Adding Hudi indexing mechanisms blog

GitBox Sun, 15 Nov 2020 07:47:03 -0800


nsivabalan commented on a change in pull request #2245:
URL: https://github.com/apache/hudi/pull/2245#discussion_r523751401




##########
File path: docs/_posts/2020-11-11-hudi-indexing-mechanisms.mb
##########
@@ -0,0 +1,92 @@
+---
+title: "Apache Hudi Indexing mechanisms"
+excerpt: "Detailing different indexing mechanisms in Hudi and when to use each of them"
+author: sivabalan
+category: blog
+---
+
+
+## 1. Introduction
+Hudi employs an index to find and update the location of incoming records during write operations. The index is a critical piece of Hudi because it provides record-level lookups, which make write operations efficient. This blog covers the different index types and when to use each one.
+
+Hudi datasets are either partitioned or non-partitioned, so most index types come in two implementations: a regular index for partitioned datasets, and a global index, used for non-partitioned datasets, that looks up record keys across all partitions.
+
+Hudi currently supports the following index types:
+
+- InMemory
+- Bloom
+- Simple
+- HBase
+
+Use the `hoodie.index.type` config to choose one of these indices.
+
+### 1.1 Motivation
+Different workloads have different access patterns, so Hudi supports several indexing schemes to meet their needs. The right indexing scheme depends on your use case.
+
+For example: ……. 

Review comment:
       to be filled. 
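The draft's introduction describes two ideas: record-level lookup, where an index maps a record key to its location, and the difference between a regular (partition-scoped) index and a global one. Both can be sketched with a toy in-memory model. This is a hedged illustration in Python, not Hudi code. `ToyIndex`, `lookup`, and `tag` are hypothetical names, and real Hudi indexes (Bloom, Simple, HBase) store and query record locations very differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, NOT Hudi's API: a location is (partition_path, file_id).
RecordLocation = Tuple[str, str]


class ToyIndex:
    """Illustrative index mapping record keys to file locations.

    global_index=False: a key is looked up only in the incoming record's partition.
    global_index=True:  a key is looked up across all partitions.
    """

    def __init__(self, global_index: bool) -> None:
        self.global_index = global_index
        # partition_path -> {record_key: file_id}
        self._by_partition: Dict[str, Dict[str, str]] = {}

    def add(self, partition_path: str, record_key: str, file_id: str) -> None:
        self._by_partition.setdefault(partition_path, {})[record_key] = file_id

    def lookup(self, partition_path: str, record_key: str) -> Optional[RecordLocation]:
        if not self.global_index:
            # Regular index: search only the record's own partition.
            file_id = self._by_partition.get(partition_path, {}).get(record_key)
            return (partition_path, file_id) if file_id is not None else None
        # Global index: search every partition for the key.
        for path, keys in self._by_partition.items():
            if record_key in keys:
                return (path, keys[record_key])
        return None


def tag(index: ToyIndex, incoming: List[Tuple[str, str]]) -> Dict[str, str]:
    """Tag each incoming (partition_path, record_key) as 'update' if the index
    knows a location for it, otherwise as 'insert'."""
    return {
        key: "update" if index.lookup(path, key) is not None else "insert"
        for path, key in incoming
    }


local_idx = ToyIndex(global_index=False)
global_idx = ToyIndex(global_index=True)
for idx in (local_idx, global_idx):
    idx.add("2020/11/10", "uuid-1", "file-a")

# uuid-1 arrives again, but under a different partition path.
incoming = [("2020/11/11", "uuid-1"), ("2020/11/11", "uuid-2")]
print(tag(local_idx, incoming))   # {'uuid-1': 'insert', 'uuid-2': 'insert'}
print(tag(global_idx, incoming))  # {'uuid-1': 'update', 'uuid-2': 'insert'}
```

The usage shows why the distinction matters. When a known key arrives under a new partition path, the partition-scoped index treats it as a new insert, while the global index finds the existing copy. The global index pays for this by searching every partition on each lookup.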




