If "rowkey_stats" isn't found, Kylin should throw an exception and exit instead of silently falling back to one region. I'm going to change this; please let me know if you disagree.
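A rough sketch of the fail-fast behavior proposed above, under stated assumptions: it is Python rather than Kylin's Java, and `load_split_keys`, `MissingRowkeyStatsError`, and the one-key-per-line local file are hypothetical names and layout made up for illustration, not Kylin's actual code or the real rowkey_stats format.

```python
import os


class MissingRowkeyStatsError(RuntimeError):
    """Raised when the rowkey_stats file cannot be found (hypothetical)."""


def load_split_keys(stats_path):
    """Read region split keys from stats_path, one key per line.

    Hypothetical stand-in: Kylin itself reads rowkey_stats from HDFS;
    a local text file keeps this sketch self-contained.
    """
    if not os.path.isfile(stats_path):
        # Fail fast instead of silently creating a one-region HTable.
        raise MissingRowkeyStatsError(
            "rowkey_stats not found at %s; refusing to create the HTable "
            "with a single region" % stats_path
        )
    with open(stats_path) as f:
        # Skip blank lines; each remaining line is treated as one split key.
        return [line.strip() for line in f if line.strip()]
```

With this shape, a wrong rowkey_stats path stops the job at HTable creation time, rather than surfacing later as a single overloaded reducer.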
2015-09-11 10:17 GMT+08:00 Yerui Sun <[email protected]>:

> Hi, yu feng,
> Let me guess the reason for your problem.
>
> The number of reducers in the convert-to-HFile job depends on the number
> of regions of the corresponding HTable.
>
> For now, all HTables are created with only one region, caused by the
> wrong path of rowkey_stats. I've opened a JIRA for this issue:
> https://issues.apache.org/jira/browse/KYLIN-968. The patch has been
> available since last night.
>
> Here are some clues to confirm my guess:
> 1. You can find the corresponding HTable name in the log; check its
> regions, it should have only one region.
> 2. Check your Kylin working directory on HDFS; there should be a path
> like '../kylin-null/../rowkey_stats'.
> 3. Grep kylin.log in your Tomcat dir; you should find the log contains
> 'no region split, HTable will be one region'.
>
> If you hit all three clues, I think KYLIN-968 could resolve your
> problem.
>
>
>
> On Sep 11, 2015, at 00:54, yu feng <[email protected]> wrote:
> >
> > OK, I found another problem (I am a problem maker, ^_^). Today I built
> > this cube, which has 15 dimensions (one mandatory dimension, two
> > hierarchy dimensions, and the others are normal dimensions). I find the
> > cuboid files are 1.9TB, and the step of converting cuboids to HFiles is
> > too slow. I checked the log of this job and found there are 9000+
> > mappers and only one reducer.
> >
> > I discarded this job when our Hadoop administrator told me the node
> > which ran this reducer was out of disk space. I had to stop it. I
> > wonder why there is only one reducer (I did not check the source code
> > of this job). By the way, my original data is only hundreds of MB. I
> > think this would cause more problems if the original data were bigger
> > or there were more dimensions.
> >
> > 2015-09-10 23:46 GMT+08:00 Luke Han <[email protected]>:
> >
> >> The 2.0 will not come soon; there is a huge refactor and a bunch of new
> >> features, and we have to make sure there are no critical bugs before
> >> release.
> >>
> >> The same function is also available under the v1.x branch; please stay
> >> tuned for update information on that.
> >>
> >> Thanks.
> >>
> >>
> >> Best Regards!
> >> ---------------------
> >>
> >> Luke Han
> >>
> >> On Thu, Sep 10, 2015 at 7:50 PM, yu feng <[email protected]> wrote:
> >>
> >>> What good news! I hope you can release the version as quickly as
> >>> possible. Today I built a cube whose cuboid files are 1.9TB. If we
> >>> merge cubes based on cuboid files, I think it will be very slow.
> >>>
> >>> 2015-09-10 19:34 GMT+08:00 Shi, Shaofeng <[email protected]>:
> >>>
> >>>> We have implemented the merge from HTable directly in Kylin 2.0, which
> >>>> hasn't been released/announced.
> >>>>
> >>>> On 9/10/15, 7:22 PM, "yu feng" <[email protected]> wrote:
> >>>>
> >>>>> I think Kylin could finish merging depending only on the tables in
> >>>>> HBase. This would make merging cubes quicker, wouldn't it?
> >>>>>
> >>>>> 2015-09-10 19:16 GMT+08:00 yu feng <[email protected]>:
> >>>>>
> >>>>>> After checking the source code, I find you are right: cuboid files
> >>>>>> will be used while merging segments. But a new question comes: why
> >>>>>> not have Kylin merge segments based just on HFiles? I cannot find
> >>>>>> how to take an HBase table as the input format of a MapReduce job,
> >>>>>> but Kylin takes HFileOutputFormat as the output format when
> >>>>>> converting cuboids to HFiles.
> >>>>>>
> >>>>>> From this, I find Kylin actually takes more space for a cube: not
> >>>>>> only HFiles but also cuboid files. The former are used for queries
> >>>>>> and the latter are used for merges, and the cuboid files are bigger
> >>>>>> than the HFiles.
> >>>>>>
> >>>>>> I think we could do something to optimize it... I'd like to know
> >>>>>> your opinions about it.
> >>>>>>
> >>>>>> 2015-09-10 18:36 GMT+08:00 Yerui Sun <[email protected]>:
> >>>>>>
> >>>>>>> Hi, yu feng,
> >>>>>>> I've also noticed these files and opened a JIRA:
> >>>>>>> https://issues.apache.org/jira/browse/KYLIN-978, and I'll post a
> >>>>>>> patch tonight.
> >>>>>>>
> >>>>>>> Here are my opinions on your three questions; feel free to correct
> >>>>>>> me:
> >>>>>>>
> >>>>>>> First, the data path of the intermediate Hive table should be
> >>>>>>> deleted after building; I agree with that.
> >>>>>>>
> >>>>>>> Second, the cuboid files will be used for merging and will be
> >>>>>>> deleted when the merge job completes, so we must leave them on
> >>>>>>> HDFS. The fact_distinct_columns path should be deleted. In
> >>>>>>> addition, the paths of rowkey_stats and hfile should also be
> >>>>>>> deleted.
> >>>>>>>
> >>>>>>> Third, there are no garbage collection steps if a job is
> >>>>>>> discarded; maybe we need a patch for this.
> >>>>>>>
> >>>>>>>
> >>>>>>> Short answer:
> >>>>>>> KYLIN-978 will clean up all HDFS paths except cuboid files after
> >>>>>>> buildJob and mergeJob complete.
> >>>>>>> The HDFS paths will not be cleaned up if a job is discarded; we
> >>>>>>> need an improvement on this.
> >>>>>>>
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Yerui Sun
> >>>>>>> [email protected]
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Sep 10, 2015, at 18:20, yu feng <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> I see this core improvement in release 1.0, JIRA URL:
> >>>>>>>> https://issues.apache.org/jira/browse/KYLIN-926
> >>>>>>>>
> >>>>>>>> However, after testing and checking the source code, I find some
> >>>>>>>> rubbish (I am not sure) files in HDFS.
> >>>>>>>>
> >>>>>>>> First, Kylin only drops the intermediate table in Hive, but the
> >>>>>>>> table is an EXTERNAL table, so the files still exist in the Kylin
> >>>>>>>> tmp directory in HDFS (I checked that).
> >>>>>>>>
> >>>>>>>> Second, the cuboid files take a large amount of space in HDFS,
> >>>>>>>> and Kylin does not delete them after the cube build
> >>>>>>>> (fact_distinct_columns files exist too). I am not sure whether
> >>>>>>>> those files have other uses; please remind me if they do.
> >>>>>>>>
> >>>>>>>> Third, after I discard a job, I think Kylin should delete the
> >>>>>>>> intermediate files and drop the intermediate Hive table, even if
> >>>>>>>> it deletes them asynchronously. I think that data has no further
> >>>>>>>> use; please remind me if it does.
> >>>>>>>>
> >>>>>>>> This rubbish data still exists in the current version
> >>>>>>>> (kylin-1.0); please check, thanks.
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >
