Repository: carbondata
Updated Branches:
  refs/heads/master 414ea7730 -> d327cb2bd

[CARBONDATA-1252]Updated load section of configuration-parameters.md for BAD_RECORD_PATH

This closes #1207

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d327cb2b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d327cb2b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d327cb2b

Branch: refs/heads/master
Commit: d327cb2bd56dd04cecc53988c8f88c4fd9cbe334
Parents: 414ea77
Author: vandana <[email protected]>
Authored: Fri Jul 28 15:43:26 2017 +0530
Committer: Jacky Li <[email protected]>
Committed: Thu Aug 3 00:19:31 2017 +0800

----------------------------------------------------------------------
 docs/configuration-parameters.md    |  5 +++-
 docs/dml-operation-on-carbondata.md | 39 +++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index c85a522..133b75b 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -58,7 +58,10 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.merge.sort.prefetch | true | Enable prefetch of data during merge sort while reading data from sort temp files in data loading. | |
 | carbon.update.persist.enable | true | Enabling this parameter considers persistent data. Enabling this will reduce the execution time of UPDATE operation. | |
 | carbon.load.global.sort.partitions | 0 | The Number of partitions to use when shuffling data for sort. If user don't configurate or configurate it less than 1, it uses the number of map tasks as reduce tasks. In general, we recommend 2-3 tasks per CPU core in your cluster.
-
+| carbon.options.bad.records.logger.enable | false | Whether to create logs with details about bad records. | |
+| carbon.bad.records.action | fail | This property can have four types of actions for bad records FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. | |
+| carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
+| carbon.options.bad.record.path | | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | |

 * **Compaction Configuration**

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/dml-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-operation-on-carbondata.md b/docs/dml-operation-on-carbondata.md
index e205972..c4c3465 100644
--- a/docs/dml-operation-on-carbondata.md
+++ b/docs/dml-operation-on-carbondata.md
@@ -149,7 +149,7 @@ You can use the following options to load data:
 * If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.

-### Example:
+ ### Example:

 ```
 LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
@@ -164,6 +164,43 @@ options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 )
 ```

+- **BAD RECORDS HANDLING:** Methods of handling bad records are as follows:
+
+ * Load all of the data before dealing with the errors.
+
+ * Clean or delete bad records before loading data or stop the loading when bad records are found.
+ + ``` + OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false') + ``` + + NOTE: + + * If the REDIRECT option is used, Carbon will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records. + + * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. + + * The maximum number of characters per column is 100000. If there are more than 100000 characters in a column, data loading will fail. + +### Example: + +``` +LOAD DATA INPATH 'filepath.csv' +INTO TABLE tablename +OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', +'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', +'BAD_RECORDS_ACTION'='REDIRECT', +'IS_EMPTY_DATA_BAD_RECORD'='false'); +``` + + **Bad Records Management Options:** + + | Options | Default Value | Description | + |---------------------------|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| + | BAD_RECORDS_LOGGER_ENABLE | false | Whether to create logs with details about bad records. | + | BAD_RECORDS_ACTION | FAIL | Following are the four types of action for bad records: FORCE: Auto-corrects the data by storing the bad records as NULL. REDIRECT: Bad records are written to the raw CSV instead of being loaded. 
IGNORE: Bad records are neither loaded nor written to the raw CSV. FAIL: Data loading fails if any bad records are found. NOTE: In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. | + | IS_EMPTY_DATA_BAD_RECORD | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | + | BAD_RECORD_PATH | - | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | ## INSERT DATA INTO A CARBONDATA TABLE
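
Note for readers of this change: the four `BAD_RECORDS_ACTION` values documented in the diff can be illustrated with a small Python sketch. This is not CarbonData code and does not call any CarbonData API. The function names `parse_int` and `load`, the two-column row shape, and the rule that a "bad record" is a row whose second field does not parse as an integer are all assumptions made for illustration. FORCE is modeled as storing the unparseable value as `None`, which is one reading of "storing the bad records as NULL". The rule that a load fails when every record is bad is not modeled.

```python
from typing import List, Optional, Tuple

RawRow = Tuple[str, str]
LoadedRow = Tuple[str, Optional[int]]


def parse_int(text: str) -> Optional[int]:
    """Return the integer value of text, or None if it does not parse."""
    try:
        return int(text)
    except ValueError:
        return None


def load(rows: List[RawRow], action: str = "FAIL") -> Tuple[List[LoadedRow], List[RawRow]]:
    """Simulate a load; returns (loaded rows, rows redirected to the raw CSV).

    Hypothetical model: a row is "bad" when its second field is not an integer.
    """
    action = action.upper()
    if action not in {"FORCE", "REDIRECT", "IGNORE", "FAIL"}:
        raise ValueError(f"unknown action: {action}")

    loaded: List[LoadedRow] = []
    redirected: List[RawRow] = []
    for name, raw in rows:
        value = parse_int(raw)
        if value is not None:
            loaded.append((name, value))    # good record: always loaded
        elif action == "FORCE":
            loaded.append((name, None))     # bad value stored as NULL
        elif action == "REDIRECT":
            redirected.append((name, raw))  # written aside, not loaded
        elif action == "IGNORE":
            continue                        # neither loaded nor written
        else:                               # FAIL: abort the whole load
            raise RuntimeError(f"bad record found: {(name, raw)!r}")
    return loaded, redirected


if __name__ == "__main__":
    sample = [("a", "1"), ("b", "x"), ("c", "3")]
    for act in ("FORCE", "REDIRECT", "IGNORE"):
        print(act, load(sample, act))
```

With the default action `FAIL`, the same sample raises on the first bad row, which mirrors the documented default of failing the load when any bad record is found.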
