Please set 'use_kettle'='false' and try again.

Regards,
Ravindra
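For reference, here is a minimal spark-shell sketch of the suggested retry. It uses the same session setup and load as the quoted messages below, with `'use_kettle'='false'` instead of `'true'`. As Ravindra advised, it sets the bad record location before the Carbon session is created.

It assumes spark-shell with the CarbonData 1.0.0 jars on the classpath, so `sc` is already defined. The HDFS paths come from the thread. The `src` value is a hypothetical placeholder for the input directory.

```scala
// Assumes spark-shell with the CarbonData 1.0.0 jars on the classpath, so `sc` is already defined.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.carbondata.core.util.CarbonProperties

// Register the bad record location before the Carbon session is created.
CarbonProperties.getInstance().addProperty("carbon.badRecords.location",
  "hdfs://master:9000/data/carbondata/badrecords/")

// Trailing dot keeps the REPL reading the next line as part of the same expression.
val carbon = SparkSession.builder().config(sc.getConf).
  getOrCreateCarbonSession("hdfs://master:9000/opt/carbonStore")

// Hypothetical placeholder: the directory that contains web_sales.csv.
val src = "hdfs://master:9000/path/to/input"

// Same load as before, but with the kettle-based load flow turned off.
carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales " +
  "OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','use_kettle'='false')")
```

With `bad_records_logger_enable` set, rejected rows should then appear under the configured `carbon.badRecords.location`, which is where Ravindra suggested looking.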
On 16 February 2017 at 08:44, Yinwei Li <251469...@qq.com> wrote:
> Thanks, Ravindra.
>
> I ran the script as follows:
>
> scala> import org.apache.carbondata.core.util.CarbonProperties
> scala> CarbonProperties.getInstance().addProperty("carbon.badRecords.location","hdfs://master:9000/data/carbondata/badrecords/")
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://master:9000/opt/carbonStore")
> scala> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','use_kettle'='true')")
>
> but it threw an exception: java.lang.RuntimeException: carbon.kettle.home is not set
>
> The configuration in my carbon.properties is carbon.kettle.home=/opt/spark-2.1.0/carbonlib/carbonplugins, but it does not seem to take effect.
>
> How can I solve this problem?
>
> ------
>
> Hi Liang Chen,
>
> Could you add a more detailed document about bad records that shows how to use them? Thanks!
>
> ------------------ Original Message ------------------
> From: "Ravindra Pesala";<ravi.pes...@gmail.com>;
> Sent: Wednesday, 15 February 2017, 11:36 AM
> To: "dev"<dev@carbondata.incubator.apache.org>;
> Subject: Re: data lost when loading data from csv file to carbon table
>
> Hi,
>
> I guess you are using spark-shell, so it is better to set the bad record location in the CarbonProperties class before creating the carbon session, as below:
>
> CarbonProperties.getInstance().addProperty("carbon.badRecords.location","<bad record location>")
>
> 1. While loading data, you need to enable bad record logging as below:
>
> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','use_kettle'='true')")
>
> Then check the bad records written to that bad record location.
>
> 2. Alternatively, you can verify by ignoring the bad records with the following command:
>
> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','bad_records_action'='ignore')")
>
> Regards,
> Ravindra.
>
> On 15 February 2017 at 07:37, Yinwei Li <251469...@qq.com> wrote:
> > Hi,
> >
> > I've set the property as:
> >
> > carbon.badRecords.location=hdfs://localhost:9000/data/carbondata/badrecords
> >
> > and added 'bad_records_action'='force' when loading data:
> >
> > carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_action'='force')")
> >
> > but the configuration does not seem to work: no directory or file is created under hdfs://localhost:9000/data/carbondata/badrecords.
> >
> > Here is how I created the CarbonSession:
> >
> > import org.apache.spark.sql.SparkSession
> > import org.apache.spark.sql.CarbonSession._
> > import org.apache.spark.sql.catalyst.util._
> > val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://master:9000/opt/carbonStore")
> >
> > And these are the bad record log entries:
> >
> > INFO 15-02 09:43:24,393 - [Executor task launch worker-0][partitionID:_1g_web_sales_d59af854-773c-429c-b7e6-031d602fe2be] Total copy time (ms) to copy file /tmp/1039730591739247/0/_1g/web_sales/Fact/Part0/Segment_0/0/0-0-1487122995007.carbonindex is 65
> > ERROR 15-02 09:43:24,393 - [Executor task launch worker-0][partitionID:_1g_web_sales_d59af854-773c-429c-b7e6-031d602fe2be] Data Load is partially success for table web_sales
> > INFO 15-02 09:43:24,393 - Bad Record Found
> >
> > ------------------ Original Message ------------------
> > From: "Ravindra Pesala";<ravi.pes...@gmail.com>;
> > Sent: Tuesday, 14 February 2017, 10:41 PM
> > To: "dev"<dev@carbondata.incubator.apache.org>;
> > Subject: Re: data lost when loading data from csv file to carbon table
> >
> > Hi,
> >
> > Please set carbon.badRecords.location in carbon.properties and check whether any bad records are added to that location.
> >
> > Regards,
> > Ravindra.
> >
> > On 14 February 2017 at 15:24, Yinwei Li <251469...@qq.com> wrote:
> > > Hi all,
> > >
> > > I ran into a data loss problem when loading data from a CSV file into a carbon table. Here are some details:
> > >
> > > Env: Spark 2.1.0 + Hadoop 2.7.2 + CarbonData 1.0.0
> > > Total records: 719,384
> > > Loaded records: 606,305 (SQL: select count(1) from table)
> > >
> > > My attempts:
> > >
> > > Attempt 1: Added the option bad_records_action='force' when loading data. It didn't help either; the count is still 606,305.
> > > Attempt 2: Cut lines 1 to 300,000 into a CSV file and loaded it. The result is correct: 300,000.
> > > Attempt 3: Cut lines 1 to 350,000 into a CSV file and loaded it. The result is wrong: 305,631.
> > > Attempt 4: Cut lines 300,000 to 350,000 into a CSV file and loaded it. The result is correct: 50,000.
> > > Attempt 5: Counted the separator '|' in my CSV file. The count equals lines * columns, so the source data seems to be in the correct format.
> > >
> > > In the Spark log, every attempt logs "Bad Record Found".
> > >
> > > Does anyone have any ideas?
> >
> > --
> > Thanks & Regards,
> > Ravi
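Regarding Attempt 5: a total count of '|' equal to lines * columns does not by itself prove that every row is well formed. For example, a row with one extra separator and another row with one missing separator still add up to the same total.

A per-line check narrows this down. The sketch below is plain Scala and needs no Spark or CarbonData. The object name `DelimiterCheck` and its command-line arguments are hypothetical. It assumes that every well-formed line contains the same number of '|' separators, which is what the total count suggests.

```scala
import scala.io.Source

// Hypothetical helper: flags lines whose separator count differs from the expected value.
object DelimiterCheck {
  // Returns (1-based line number, separator count) for every line whose count of
  // `delim` differs from `expected`.
  def badLines(lines: Iterator[String], delim: Char, expected: Int): List[(Int, Int)] =
    lines.zipWithIndex
      .map { case (line, idx) => (idx + 1, line.count(_ == delim)) }
      .filter { case (_, n) => n != expected }
      .toList

  // Usage: DelimiterCheck <path-to-csv> <expected-separators-per-line>
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0), "UTF-8")
    try {
      val bad = badLines(source.getLines(), '|', args(1).toInt)
      // Print the first few offending lines so they can be inspected by hand.
      bad.take(20).foreach { case (lineNo, count) => println(s"line $lineNo: $count separators") }
      println(s"${bad.size} line(s) with an unexpected separator count")
    } finally source.close()
  }
}
```

If every line passes, the separator count is not the cause. In that case, the bad record files that Ravindra suggested enabling are the next place to look.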