[hudi] branch asf-site updated: [DOCS] Edit quickstart (#7120)

bhavanisudha Wed, 02 Nov 2022 12:04:43 -0700

This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new e258bfe787 [DOCS] Edit quickstart (#7120)
e258bfe787 is described below

commit e258bfe7878a823b5c490b788f453c87a0f10649
Author: nfarah86 <nfara...@gmail.com>
AuthorDate: Wed Nov 2 12:04:25 2022 -0700

    [DOCS] Edit quickstart (#7120)
    
    * updated python time travel query to fix error
    
    * updated the tabs to see if it helps with the copying- there is an error on how the copy command is working
    
    * fixed python hard and soft deletes copying- there was weird behavior occurring.
    
    * made some stylistic changes
    
    * updated code overview to be under subsection
    
    * fixed the misalignment of text in hard and soft deletes
    
    Co-authored-by: nadine <nfarah@nadines-MacBook-Pro.local>
---
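
Note on the time travel fix: the old Python snippet passed `"2021-07-28 14: 11: 08"`, with stray spaces after the colons, as `as.of.instant`. Below is a minimal sketch, assuming the `yyyy-MM-dd HH:mm:ss.SSS` layout of the corrected value in this diff, of building such a string from a `datetime` instead of typing it by hand. The helper name `to_as_of_instant` is hypothetical and is not part of Hudi or PySpark.

```python
from datetime import datetime


def to_as_of_instant(moment: datetime) -> str:
    """Format a datetime as 'yyyy-MM-dd HH:mm:ss.SSS' (hypothetical helper)."""
    # %f yields six microsecond digits; dropping the last three leaves milliseconds.
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


instant = to_as_of_instant(datetime(2021, 7, 28, 14, 11, 8))
print(instant)
```

The resulting string could then be passed as `option("as.of.instant", instant)` in the read shown in the first hunk.
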
 website/docs/quick-start-guide.md | 45 ++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index c610964f6c..e00f6cdb25 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -635,7 +635,7 @@ spark.read. \
 
 spark.read. \
   format("hudi"). \
-  option("as.of.instant", "2021-07-28 14: 11: 08"). \
+  option("as.of.instant", "2021-07-28 14:11:08.000"). \
   load(basePath)
 
 # It is equal to "as.of.instant = 2021-07-28 00:00:00"
@@ -959,13 +959,15 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hud
 
 ## Delete data {#deletes}
 
-Apache Hudi supports two types of deletes: (1) **Soft Deletes**: retaining the record key and just nulling out the values
-for all the other fields (records with nulls in soft deletes are always persisted in storage and never removed);
-(2) **Hard Deletes**: physically removing any trace of the record from the table.  See the
-[deletion section](/docs/writing_data#deletes) of the writing data page for more details.
+Apache Hudi supports two types of deletes: <br/> 
+1.  **Soft Deletes**: This retains the record key and just nulls out the values for all the other fields. The records with nulls in soft deletes are always persisted in storage and never removed.
+2. **Hard Deletes**: This physically removes any trace of the record from the table. Check out the
+[deletion section](/docs/writing_data#deletes) for more details.
 
 ### Soft Deletes
 
+Soft deletes retain the record key and null out the values for all the other fields. For example, records with nulls in soft deletes are always persisted in storage and never removed.<br/><br/>
+
 <Tabs
 defaultValue="scala"
 values={[
@@ -1028,6 +1030,10 @@ Notice that the save mode is `Append`.
 </TabItem>
 <TabItem value="python">
 
+:::note
+Notice that the save mode is `Append`.
+:::
+
 ```python
 # pyspark
 from pyspark.sql.functions import lit
@@ -1036,9 +1042,11 @@ from functools import reduce
 spark.read.format("hudi"). \
   load(basePath). \
   createOrReplaceTempView("hudi_trips_snapshot")
+
 # fetch total records count
 spark.sql("select uuid, partitionpath from hudi_trips_snapshot").count()
 spark.sql("select uuid, partitionpath from hudi_trips_snapshot where rider is not null").count()
+
 # fetch two records for soft deletes
 soft_delete_ds = spark.sql("select * from hudi_trips_snapshot").limit(2)
 
@@ -1046,6 +1054,8 @@ soft_delete_ds = spark.sql("select * from hudi_trips_snapshot").limit(2)
 meta_columns = ["_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key", \
   "_hoodie_partition_path", "_hoodie_file_name"]
 excluded_columns = meta_columns + ["ts", "uuid", "partitionpath"]
+```
+```python
 nullify_columns = list(filter(lambda field: field[0] not in excluded_columns, \
   list(map(lambda field: (field.name, field.dataType), soft_delete_ds.schema.fields))))
 
@@ -1079,16 +1089,14 @@ spark.sql("select uuid, partitionpath from hudi_trips_snapshot").count()
 # This should return (total - 2) count as two records are updated with nulls
 spark.sql("select uuid, partitionpath from hudi_trips_snapshot where rider is not null").count()
 ```
-:::note
-Notice that the save mode is `Append`.
-:::
+
 </TabItem>
 
 </Tabs
 >
 
-
 ### Hard Deletes
+Hard deletes physically remove any trace of the record from the table. For example, this deletes records for the HoodieKeys passed in.<br/><br/>
 
 <Tabs
 defaultValue="scala"
@@ -1155,7 +1163,11 @@ delete from hudi_cow_pt_tbl where name = 'a1';
 
 </TabItem>
 <TabItem value="python">
-Delete records for the HoodieKeys passed in.<br/>
+
+
+:::note
+Only `Append` mode is supported for delete operation.
+:::
 
 ```python
 # pyspark
@@ -1188,19 +1200,22 @@ hard_delete_df.write.format("hudi"). \
 roAfterDeleteViewDF = spark. \
   read. \
   format("hudi"). \
-  load(basePath) 
-roAfterDeleteViewDF.createOrReplaceTempView("hudi_trips_snapshot")
+  load(basePath)
+```
+
+```python
+roAfterDeleteViewDF.createOrReplaceTempView("hudi_trips_snapshot") 
+
 # fetch should return (total - 2) records
 spark.sql("select uuid, partitionpath from hudi_trips_snapshot").count()
 ```
-:::note
-Only `Append` mode is supported for delete operation.
-:::
+
 </TabItem>
 
 </Tabs
 >
 
+
 ## Insert Overwrite
 
 Generate some new trips, overwrite the all the partitions that are present in 
the input. This operation can be faster
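
The soft-delete rule described in this diff (keep the record key, null out the other fields) can be illustrated without Spark. Below is a minimal sketch on plain dictionaries, assuming the same excluded columns as the Python tab (`ts`, `uuid`, `partitionpath`, plus the Hudi meta columns). The function name `soft_delete` and the sample record are hypothetical, and this is not how Hudi itself performs the write; unlike the quickstart, which drops the meta columns before writing, this sketch simply leaves any excluded column untouched.

```python
META_COLUMNS = [
    "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
    "_hoodie_partition_path", "_hoodie_file_name",
]
# Columns that keep their values, mirroring excluded_columns in the quickstart.
EXCLUDED_COLUMNS = META_COLUMNS + ["ts", "uuid", "partitionpath"]


def soft_delete(record: dict) -> dict:
    """Return a copy of record with every non-excluded field set to None."""
    return {
        name: (value if name in EXCLUDED_COLUMNS else None)
        for name, value in record.items()
    }


# Hypothetical sample record, for illustration only.
trip = {
    "uuid": "abc-123",
    "partitionpath": "americas/brazil/sao_paulo",
    "ts": 1695046462179,
    "rider": "rider-213",
    "fare": 27.79,
}
deleted = soft_delete(trip)
print(deleted)
```

The key fields survive while `rider` and `fare` become `None`, which is why the quickstart's `where rider is not null` count drops by the number of soft-deleted records.
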
