akashrn5 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r419451876



##########
File path: docs/dml-of-carbondata.md
##########
@@ -316,12 +311,12 @@ CarbonData DML statements are documented here,which 
includes:
   INSERT OVERWRITE TABLE table1 SELECT * FROM TABLE2
   ```
 
-### INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
+## INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
 
  Stage input files are data files written by an external application (such as Flink). These files
  are committed but not yet loaded into the table.
   
-  You can use this command to insert them into the table, so that making them 
visible for query.
+  Users can use this command to insert them into the table, making them visible for queries.
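+
+  A hedged example of the statement might look as follows (assuming CarbonData's `INSERT INTO ... STAGE` form; `table1` is a placeholder table name):
+
+  ```
+  INSERT INTO table1 STAGE
+  ```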

Review comment:
       done

##########
File path: docs/dml-of-carbondata.md
##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which 
includes:
     OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 
'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 
'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
     ```
 
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records 
FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data 
loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to 
a separate CSV file. However, this file must not be used for subsequent data 
loading because the content may not exactly match the source record. You are 
advised to cleanse the original source record for further data ingestion. This 
option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the 
bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor 
written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is 
invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are 
more than 32000 characters in a column, please refer to *String longer than 
32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon 
properties. 
-    Therefore, value specified in load will have the highest priority, and 
value specified in carbon properties will have the least priority.
+    **NOTE:**
+    * The BAD_RECORDS_ACTION property supports four actions for bad records: FORCE, REDIRECT, IGNORE, and FAIL.
+    * FAIL is the default value. If the FAIL option is used, data loading fails when any bad records are found.
+    * If the REDIRECT option is used, CarbonData adds all bad records to a separate CSV file. However, this file must not be used for subsequent data loading because its content may not exactly match the source records. You are advised to cleanse the source records before further data ingestion; this option exists to show you which records are bad.
+    * If the FORCE option is used, the bad records are auto-converted by storing them as NULL before the data is loaded.
+    * If the IGNORE option is used, bad records are neither loaded nor written to the separate CSV file.
+    * If all records in the load are bad records, the BAD_RECORDS_ACTION is not applied and the load operation fails.
+    * The default maximum number of characters per column is 32000. If a column contains more than 32000 characters, please refer to the *String longer than 32000 characters* section.
+    * The Bad Records Path can be specified in create, load, and carbon properties. The value specified in load has the highest priority, and the value specified in carbon properties has the lowest priority.
 
-  Example:
+    Example:
 
-  ```
-  LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
-  
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
-  'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
-  ```
+    ```
+    LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
+    
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+    'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
+    ```
 
   - ##### GLOBAL_SORT_PARTITIONS:
 
-    If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the 
number of partitions to use while shuffling data for sort using 
GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, 
then it uses the number of map task as reduce task. It is recommended that each 
reduce task deal with 512MB-1GB data.
+    If the SORT_SCOPE is defined as GLOBAL_SORT, the user can specify the number of partitions to use while shuffling data for the sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or is configured to a value less than 1, the number of map tasks is used as the number of reduce tasks. It is recommended that each reduce task handle 512 MB to 1 GB of data.
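+
+    A hedged example of setting this option (the file path and table name are placeholders):
+
+    ```
+    LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
+    OPTIONS('GLOBAL_SORT_PARTITIONS'='10')
+    ```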

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
