Re: data lost when loading data from csv file to carbon table

Ravindra Pesala Wed, 15 Feb 2017 19:45:06 -0800

Please set 'use_kettle'='false' in the load options and run the load again.
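
A minimal sketch of that change, assuming the same table and options quoted below in this thread. `LoadCommand` and `buildLoadSql` are hypothetical helper names, not CarbonData APIs, and the CSV path is a placeholder:

```scala
// Sketch only: builds the LOAD DATA statement as a plain string so the
// options can be inspected before the statement is sent to CarbonData.
object LoadCommand {
  // Hypothetical helper (not part of CarbonData): renders each option as
  // 'key'='value' and joins them inside OPTIONS(...).
  def buildLoadSql(csvPath: String, table: String, options: Seq[(String, String)]): String = {
    val opts = options.map { case (k, v) => s"'$k'='$v'" }.mkString(",")
    s"load data inpath '$csvPath' into table $table OPTIONS($opts)"
  }

  def main(args: Array[String]): Unit = {
    val sql = buildLoadSql(
      "/path/to/web_sales.csv", // placeholder path
      "_1g.web_sales",
      Seq(
        "DELIMITER" -> "|",
        "bad_records_logger_enable" -> "true",
        "use_kettle" -> "false" // the only option changed from the earlier command
      )
    )
    println(sql)
    // In spark-shell, with the `carbon` session from this thread:
    // carbon.sql(sql)
  }
}
```

Building the string separately makes it easy to confirm that no option key was split or misspelled before the load runs.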

Regards,
Ravindra


On 16 February 2017 at 08:44, Yinwei Li <251469...@qq.com> wrote:

> Thanks, Ravindra.
>
>
> I've run the script as:
>
>
> scala> import org.apache.carbondata.core.util.CarbonProperties
> scala> CarbonProperties.getInstance().addProperty("carbon.badRecords.location","hdfs://master:9000/data/carbondata/badrecords/")
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://master:9000/opt/carbonStore")
> scala> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','use_kettle'='true')")
>
>
>
> but it threw an exception: java.lang.RuntimeException:
> carbon.kettle.home is not set
>
>
> My carbon.properties contains
> carbon.kettle.home=/opt/spark-2.1.0/carbonlib/carbonplugins, but the
> setting does not seem to take effect.
>
>
> How can I solve this problem?
>
>
> ------
>
>
> Hi Liang Chen,
>
>
>     Could you add more detailed documentation on bad records that shows
> how to use the feature? Thanks.
>
>
>
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ravindra Pesala";<ravi.pes...@gmail.com>;
> Sent: Wednesday, 15 February 2017, 11:36 AM
> To: "dev"<dev@carbondata.incubator.apache.org>;
>
> Subject: Re: data lost when loading data from csv file to carbon table
>
>
>
> Hi,
>
> I guess you are using spark-shell, so it is better to set the bad record
> location through the CarbonProperties class before creating the carbon
> session, as below.
>
> CarbonProperties.getInstance().addProperty("carbon.badRecords.location","<bad record location>")
>
>
> 1. While loading data, enable bad record logging as below.
>
> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','use_kettle'='true')")
>
> Then check the bad records that are written to the bad record location.
>
>
> 2. Alternatively, you can verify by ignoring the bad records with the
> following command:
> carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_logger_enable'='true','bad_records_action'='ignore')")
>
> Regards,
> Ravindra.
>
> On 15 February 2017 at 07:37, Yinwei Li <251469...@qq.com> wrote:
>
> > Hi,
> >
> >
> >     I've set the properties as:
> >
> >
> >     carbon.badRecords.location=hdfs://localhost:9000/data/carbondata/badrecords
> >
> >
> >     and added 'bad_records_action'='force' when loading data:
> >
> >
> >     carbon.sql(s"load data inpath '$src/web_sales.csv' into table _1g.web_sales OPTIONS('DELIMITER'='|','bad_records_action'='force')")
> >
> >
> >     but the configuration does not seem to work, as no directory or file
> > is created under the path hdfs://localhost:9000/data/carbondata/badrecords.
> >
> >
> >     Here is how I created the carbon session:
> >
> >
> >     import org.apache.spark.sql.SparkSession
> >     import org.apache.spark.sql.CarbonSession._
> >     import org.apache.spark.sql.catalyst.util._
> >     val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://master:9000/opt/carbonStore")
> >
> >
> >
> >
> >     The following are the bad record logs:
> >
> >
> >     INFO  15-02 09:43:24,393 - [Executor task launch worker-0][partitionID:_1g_web_sales_d59af854-773c-429c-b7e6-031d602fe2be] Total copy time (ms) to copy file /tmp/1039730591739247/0/_1g/web_sales/Fact/Part0/Segment_0/0/0-0-1487122995007.carbonindex is 65
> >     ERROR 15-02 09:43:24,393 - [Executor task launch worker-0][partitionID:_1g_web_sales_d59af854-773c-429c-b7e6-031d602fe2be] Data Load is partially success for table web_sales
> >     INFO  15-02 09:43:24,393 - Bad Record Found
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ravindra Pesala";<ravi.pes...@gmail.com>;
> > Sent: Tuesday, 14 February 2017, 10:41 PM
> > To: "dev"<dev@carbondata.incubator.apache.org>;
> >
> > Subject: Re: data lost when loading data from csv file to carbon table
> >
> >
> >
> > Hi,
> >
> > Please set carbon.badRecords.location in carbon.properties and check
> > whether any bad records are written to that location.
> >
> >
> > Regards,
> > Ravindra.
> >
> > On 14 February 2017 at 15:24, Yinwei Li <251469...@qq.com> wrote:
> >
> > > Hi all,
> > >
> > >
> > >   I ran into a data loss problem when loading data from a csv file into
> > > a carbon table. Here are some details:
> > >
> > >
> > >   Env: Spark 2.1.0 + Hadoop 2.7.2 + CarbonData 1.0.0
> > >   Total Records:719,384
> > >   Loaded Records:606,305 (SQL: select count(1) from table)
> > >
> > >
> > >   My Attempts:
> > >
> > >
> > >     Attempt 1: Added the option bad_records_action='force' when loading
> > > data. It doesn't help either; the count is still 606,305.
> > >     Attempt 2: Cut lines 1 to 300,000 into a csv file and loaded it. The
> > > result is right: 300,000.
> > >     Attempt 3: Cut lines 1 to 350,000 into a csv file and loaded it. The
> > > result is wrong: 305,631.
> > >     Attempt 4: Cut lines 300,000 to 350,000 into a csv file and loaded
> > > it. The result is right: 50,000.
> > >     Attempt 5: Counted the separator '|' in my csv file. The count equals
> > > lines * columns, so the source data may be in the correct format.
> > >
> > >
> > >     In the Spark log, each attempt logs "Bad Record Found".
> > >
> > >
> > >     Does anyone have any ideas?
> >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Ravi
> >
>
>
>
> --
> Thanks & Regards,
> Ravi
>
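
On Attempt 5 in the original report: a matching total separator count does not rule out individual malformed rows, since one row with too many separators can offset another with too few. A rough sketch of a per-line check, assuming every row should carry the same number of '|' characters. `DelimiterCheck` and `findSuspectLines` are hypothetical names, not CarbonData APIs, and the file is assumed to be readable from the local filesystem:

```scala
import scala.io.Source

object DelimiterCheck {
  // Returns (1-based line number, separator count) for every line whose
  // separator count differs from `expected`.
  def findSuspectLines(lines: Iterator[String], sep: Char, expected: Int): List[(Int, Int)] =
    lines.zipWithIndex
      .map { case (line, idx) => (idx + 1, line.count(_ == sep)) }
      .filter { case (_, n) => n != expected }
      .toList

  def main(args: Array[String]): Unit = {
    // args(0): local CSV path; args(1): expected separators per line.
    val source = Source.fromFile(args(0))
    try findSuspectLines(source.getLines(), '|', args(1).toInt).foreach(println)
    finally source.close()
  }
}
```

This only counts characters per physical line. It does not handle quoted fields that contain the separator or embedded newlines, so a reported line is a candidate to inspect rather than a confirmed bad record.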



-- 
Thanks & Regards,
Ravi
