Repository: carbondata
Updated Branches:
  refs/heads/master 414ea7730 -> d327cb2bd


[CARBONDATA-1252] Updated load section of configuration-parameters.md for 
BAD_RECORD_PATH

This closes #1207


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d327cb2b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d327cb2b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d327cb2b

Branch: refs/heads/master
Commit: d327cb2bd56dd04cecc53988c8f88c4fd9cbe334
Parents: 414ea77
Author: vandana <[email protected]>
Authored: Fri Jul 28 15:43:26 2017 +0530
Committer: Jacky Li <[email protected]>
Committed: Thu Aug 3 00:19:31 2017 +0800

----------------------------------------------------------------------
 docs/configuration-parameters.md    |  5 +++-
 docs/dml-operation-on-carbondata.md | 39 +++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index c85a522..133b75b 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -58,7 +58,10 @@ This section provides the details of all the configurations 
required for CarbonD
 | carbon.merge.sort.prefetch | true | Enable prefetch of data during merge 
sort while reading data from sort temp files in data loading. |  |
 | carbon.update.persist.enable | true | Enabling this parameter considers 
persistent data. Enabling this will reduce the execution time of UPDATE 
operation. |  |
 | carbon.load.global.sort.partitions | 0 | The number of partitions to use 
when shuffling data for sort. If the user does not configure it, or sets it to 
less than 1, the number of map tasks is used as the number of reduce tasks. In 
general, we recommend 2-3 tasks per CPU core in your cluster.
-
+| carbon.options.bad.records.logger.enable | false | Whether to create logs 
with details about bad records. | |
+| carbon.bad.records.action | fail | This property supports four actions for 
bad records: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then 
it auto-corrects the data by storing the bad records as NULL. If set to 
REDIRECT then bad records are written to the raw CSV instead of being loaded. 
If set to IGNORE then bad records are neither loaded nor written to the raw 
CSV. If set to FAIL then data loading fails if any bad records are found. | |
+| carbon.options.is.empty.data.bad.record | false | If false, empty ("" 
or '' or ,,) data is not considered a bad record; if true, it is. | |
+| carbon.options.bad.record.path |  | Specifies the HDFS path where bad 
records are stored. By default the value is Null. This path must be 
configured by the user if the bad record logger is enabled or the bad records 
action is REDIRECT. | |
 
 
 * **Compaction Configuration**

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/dml-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-operation-on-carbondata.md 
b/docs/dml-operation-on-carbondata.md
index e205972..c4c3465 100644
--- a/docs/dml-operation-on-carbondata.md
+++ b/docs/dml-operation-on-carbondata.md
@@ -149,7 +149,7 @@ You can use the following options to load data:
    
    * If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.
    
-### Example:
+  ### Example:
 
 ```
 LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
@@ -164,6 +164,43 @@ options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 )
 ```
 
+- **BAD RECORDS HANDLING:** Methods of handling bad records are as follows:
+
+    * Load all of the data before dealing with the errors.
+
+    * Clean or delete bad records before loading data or stop the loading when 
bad records are found.
+
+    ```
+    OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 
'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 
'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
+    ```
+
+    NOTE:
+
+    * If the REDIRECT option is used, Carbon writes all bad records into a 
separate CSV file. However, this file must not be used for subsequent data 
loading because its content may not exactly match the source records. You are 
advised to cleanse the original source records before further data ingestion. 
This option serves to show you which records are bad records.
+
+    * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION 
is invalid and the load operation fails.
+
+    * The maximum number of characters per column is 100000. If there are more 
than 100000 characters in a column, data loading will fail.
+
+### Example:
+
+```
+LOAD DATA INPATH 'filepath.csv'
+INTO TABLE tablename
+OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true',
+'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+'BAD_RECORDS_ACTION'='REDIRECT',
+'IS_EMPTY_DATA_BAD_RECORD'='false');
+```
+
+ **Bad Records Management Options:**
+
+ | Options | Default Value | Description |
+ |---------------------------|---------------|-------------|
+ | BAD_RECORDS_LOGGER_ENABLE | false | Whether to create logs with details about bad records. |
+ | BAD_RECORDS_ACTION | FAIL | The four types of action for bad records are:  FORCE: Auto-corrects the data by storing the bad records as NULL.  REDIRECT: Bad records are written to the raw CSV instead of being loaded.  IGNORE: Bad records are neither loaded nor written to the raw CSV.  FAIL: Data loading fails if any bad records are found.  NOTE: In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. |
+ | IS_EMPTY_DATA_BAD_RECORD | false | If false, empty ("" or '' or ,,) data is not considered a bad record; if true, it is. |
+ | BAD_RECORD_PATH | - | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must be configured by the user if the bad record logger is enabled or the bad records action is REDIRECT. |
 
 ## INSERT DATA INTO A CARBONDATA TABLE
 

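Note on the BAD_RECORD_PATH rule documented in this commit: the path must be set whenever the bad record logger is enabled or the action is REDIRECT. A minimal client-side sketch of that rule follows. The helper `build_bad_record_options` is hypothetical and not part of CarbonData; it only assembles the OPTIONS clause shown in the diff and rejects combinations that the documentation describes as requiring a path.

```python
# Hypothetical helper (not a CarbonData API): builds the OPTIONS clause
# for bad record handling and checks the documented BAD_RECORD_PATH rule.

VALID_ACTIONS = {"FORCE", "REDIRECT", "IGNORE", "FAIL"}


def build_bad_record_options(logger_enable=False, action="FAIL",
                             is_empty_data_bad_record=False,
                             bad_record_path=None):
    """Return an OPTIONS(...) string for a LOAD DATA statement."""
    action = action.upper()
    if action not in VALID_ACTIONS:
        raise ValueError("unknown BAD_RECORDS_ACTION: " + action)

    # Documented rule: the path must be configured if the logger is
    # enabled or the action is REDIRECT.
    if (logger_enable or action == "REDIRECT") and not bad_record_path:
        raise ValueError("BAD_RECORD_PATH must be set when the logger is "
                         "enabled or the action is REDIRECT")

    options = [("BAD_RECORDS_LOGGER_ENABLE", str(logger_enable).lower())]
    if bad_record_path:
        options.append(("BAD_RECORD_PATH", bad_record_path))
    options.append(("BAD_RECORDS_ACTION", action))
    options.append(("IS_EMPTY_DATA_BAD_RECORD",
                    str(is_empty_data_bad_record).lower()))

    # Render as 'KEY'='value' pairs, matching the style used in the docs.
    return "OPTIONS(" + ", ".join(
        "'{}'='{}'".format(key, value) for key, value in options) + ")"


if __name__ == "__main__":
    print(build_bad_record_options(
        logger_enable=True,
        action="REDIRECT",
        bad_record_path="hdfs://hacluster/tmp/carbon"))
```

The check runs before any statement is submitted, so a missing path is caught locally; how CarbonData itself reports this case is not covered by this sketch.
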