BTW, you can actually treat v1.0 as 0.7.3, since it is compatible with 0.7.x. I suggest you upgrade directly to v1.1. It will be released soon and includes several bug fixes and performance enhancements.

2015-09-30 9:05 GMT+08:00 Shi, Shaofeng <[email protected]>:
> For v1.0 or earlier, please refer to this doc to do a manual cleanup:
> https://kylin.incubator.apache.org/docs/howto/howto_cleanup_storage.html

On 9/30/15, 9:00 AM, "Luke Han" <[email protected]> wrote:
> Hi Abhilash,
>
> I would recommend upgrading to v1.0 or v1.1 (which is in the release process now).
>
> Thanks.
> Luke
>
> Best Regards!
> ---------------------
> Luke Han

On Wed, Sep 30, 2015 at 12:46 AM, Abhilash L L <[email protected]> wrote:
> Hello,
>
> We observed that purging and dropping a cube does not delete the dictionaries / snapshots and also does not drop the table in HBase.
>
> It also leaves a lot of temporary data in HDFS.
>
> We are on 0.7.2. I hope this gets fixed soon and as a priority.
>
> I saw that the ticket has been fixed in v1.1 and v2. Can the fix be backported to 0.7.2?
>
> Regards,
> Abhilash

On Fri, Sep 18, 2015 at 11:02 AM, yu feng <[email protected]> wrote:
> After building another cube successfully, I rechecked this bug and found the reason. Thanks to all of you...

2015-09-11 11:17 GMT+08:00 ShaoFeng Shi <[email protected]>:
> If "rowkey_stats" isn't found, Kylin should throw an exception and exit instead of silently using 1 region. I'm going to change this; please let me know if you don't agree.

2015-09-11 10:17 GMT+08:00 Yerui Sun <[email protected]>:
> Hi, yu feng,
> Let me guess the reason for your problem.
>
> The number of reducers in the convert-to-HFile job depends on the number of regions of the corresponding HTable.
>
> For now, all HTables were created with only one region, caused by the wrong path of rowkey_stats.
> I've opened a JIRA for this issue: https://issues.apache.org/jira/browse/KYLIN-968. The patch has been available since last night.
>
> Here are some clues to confirm my guess:
> 1. Find the corresponding HTable name in the log and check its regions; it should have only one region.
> 2. Check your Kylin working directory on HDFS; there should be a path like '../kylin-null/../rowkey_stats'.
> 3. Grep kylin.log in the Tomcat dir; you should find that the log contains 'no region split, HTable will be one region'.
>
> If you hit all three clues, I think KYLIN-968 could resolve your problem.

On Sep 11, 2015, at 00:54, yu feng <[email protected]> wrote:
> OK, I found another problem (I am a problem maker, ^_^). Today I built this cube, which has 15 dimensions (one mandatory dimension, two hierarchy dimensions, and the rest are normal dimensions). The cuboid files are 1.9 TB, and the step that converts cuboids to HFiles is too slow. I checked the log of this job and found 9000+ mappers and only one reducer.
>
> I discarded the job when our Hadoop administrator told me the node running this reducer was out of disk space. I had to stop it. I wonder why there is only one reducer (I have not checked the source code of this job). By the way, my original data is only a few hundred MB. I think this would cause more problems if the original data were bigger or there were many more dimensions.
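
Yerui's diagnosis (reducer count follows the HTable region count) and ShaoFeng's proposed change (fail instead of silently using one region) can be illustrated with a small Python sketch. This is not Kylin's code: the (row_key, size_in_bytes) format assumed for rowkey_stats, the region-size threshold, and the names compute_split_keys, num_reducers and MissingRowkeyStatsError are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


class MissingRowkeyStatsError(RuntimeError):
    """Raised instead of silently creating a single-region HTable."""


def compute_split_keys(
    stats: Optional[Sequence[Tuple[bytes, int]]],
    region_size_bytes: int,
) -> List[bytes]:
    """Return HTable split keys from (row_key, size_in_bytes) samples.

    `stats` is a hypothetical stand-in for the contents of rowkey_stats;
    None means the file could not be found (e.g. a wrong path).
    """
    if stats is None:
        # Fail fast, as proposed in the thread, rather than falling back to 1 region.
        raise MissingRowkeyStatsError(
            "rowkey_stats not found; refusing to create a single-region HTable")
    if region_size_bytes <= 0:
        raise ValueError("region_size_bytes must be positive")

    split_keys: List[bytes] = []
    accumulated = 0
    for row_key, size in sorted(stats):
        # Once the current region is full, start a new region at this key.
        if accumulated >= region_size_bytes:
            split_keys.append(row_key)
            accumulated = 0
        accumulated += size
    return split_keys


def num_reducers(split_keys: Sequence[bytes]) -> int:
    """One reducer per region: n split keys give n + 1 regions."""
    return len(split_keys) + 1
```

For example, `compute_split_keys([(b"a", 5), (b"b", 5), (b"c", 5), (b"d", 5)], 10)` returns `[b"c"]`, i.e. two regions and two reducers, while an empty statistics list yields a single region. Raising on a missing file turns a silent 1.9 TB single-reducer job into an immediate, diagnosable error.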

2015-09-10 23:46 GMT+08:00 Luke Han <[email protected]>:
> 2.0 will not come soon. There is a huge refactor and a bunch of new features, and we have to make sure there are no critical bugs before release.
>
> The same function will also be available in the v1.x branch; please stay tuned for updates on that.
>
> Thanks.
>
> Best Regards!
> ---------------------
> Luke Han

On Thu, Sep 10, 2015 at 7:50 PM, yu feng <[email protected]> wrote:
> What good news! I hope you can release that version as quickly as possible. Today I built a cube whose cuboid files are 1.9 TB. If we merge the cube based on cuboid files, I think it will be very slow...

2015-09-10 19:34 GMT+08:00 Shi, Shaofeng <[email protected]>:
> We have implemented merging directly from the HTable in Kylin 2.0, which hasn't been released/announced yet.

On 9/10/15, 7:22 PM, "yu feng" <[email protected]> wrote:
> I think Kylin could finish merging using only the tables in HBase. This would make merging cubes quicker, wouldn't it?
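
To make the merge discussion concrete, here is a minimal Python sketch of what merging segments involves, whichever source the records are read from (cuboid files or HTables). It assumes each segment yields (row key, value) pairs already sorted by row key and that every measure is a simple SUM; real Kylin measures such as COUNT DISTINCT need their own merge logic, and merge_segments is a hypothetical name.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

# (cuboid row key, SUM measure): a simplifying assumption for illustration.
Record = Tuple[str, int]


def merge_segments(*segments: Iterable[Record]) -> Iterator[Record]:
    """Merge key-sorted segment records, summing measures for equal keys."""
    # heapq.merge streams the inputs lazily; each input must already be
    # sorted by row key for the output to be sorted.
    merged = heapq.merge(*segments, key=itemgetter(0))
    # Adjacent records with the same row key are combined into one.
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in group)
```

For instance, merging `[("a", 1), ("c", 2)]` with `[("a", 3), ("b", 4)]` yields `("a", 4), ("b", 4), ("c", 2)`. Because the merge is a streaming pass over sorted inputs, its cost grows with the total size of the segments being read, which is why the size of the input (1.9 TB of cuboid files versus the smaller HFiles) matters in this thread.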

2015-09-10 19:16 GMT+08:00 yu feng <[email protected]>:
> After checking the source code, I found you are right: cuboid files are used when merging segments. But a new question comes up: why doesn't Kylin merge segments based only on the HFiles? I cannot find where an HBase table is used as the input format of a MapReduce job, although Kylin uses HFileOutputFormat as the output format when converting cuboids to HFiles.
>
> From this, I find that Kylin actually takes more space for a cube: not only the HFiles but also the cuboid files. The former are used for queries and the latter for merging, and the cuboid files are bigger than the HFiles.
>
> I think we could do something to optimize this... I would like to know your opinions about it.

2015-09-10 18:36 GMT+08:00 Yerui Sun <[email protected]>:
> Hi, yu feng,
> I've also noticed these files and opened a JIRA: https://issues.apache.org/jira/browse/KYLIN-978, and I'll post a patch tonight.
>
> Here are my opinions on your three questions; feel free to correct me:
>
> First, the data path of the intermediate Hive table should be deleted after building; I agree with that.
>
> Second, the cuboid files are used for merging and will be deleted when the merge job completes, so we need to (and must) leave them on HDFS. The fact_distinct_columns should be deleted. In addition, the rowkey_stats and hfile paths should also be deleted.
>
> Third, there is no garbage collection step if a job is discarded; maybe we need a patch for this.
>
> Short answer:
> KYLIN-978 will clean up all HDFS paths except the cuboid files after buildJob and mergeJob complete.
> The HDFS paths will not be cleaned up if a job is discarded; we need an improvement for this.
>
> Best Regards,
> Yerui Sun
> [email protected]

On Sep 10, 2015, at 18:20, yu feng <[email protected]> wrote:
> I see this core improvement in release 1.0, JIRA URL: https://issues.apache.org/jira/browse/KYLIN-926
>
> However, after testing and checking the source code, I found some files in HDFS that may be rubbish (I am not sure).
>
> First, Kylin only drops the intermediate table in Hive, but the table is an EXTERNAL table, so its files still exist in the Kylin tmp directory in HDFS (I checked that).
>
> Second, the cuboid files take up a lot of space in HDFS, and Kylin does not delete them after the cube build (the fact_distinct_columns files remain too). I am not sure whether they serve any other purpose; please tell me if they do.
>
> Third, after I discard a job, I think Kylin should delete the intermediate files and drop the intermediate Hive table, even if it does so asynchronously. I don't think that data serves any purpose; please tell me if it does.
>
> This rubbish data still exists in the current version (kylin-1.0); please check, thanks.
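
The cleanup rules discussed above (delete everything except cuboid files after a build or merge, delete the merged source segments' cuboid files after a merge, and, as proposed but not implemented at the time, delete everything for a discarded job) can be sketched as a small planning function in Python. It only computes which paths to delete and does not touch HDFS; the path layout, the "cuboid" directory name, and the names JobOutcome and cleanup_plan are hypothetical, not Kylin's actual implementation. Dropping the intermediate Hive table itself is a separate step not modelled here.

```python
from enum import Enum
from typing import Iterable, Set

# Hypothetical name of the sub-directory holding a job's cuboid files.
CUBOID_DIR = "cuboid"


class JobOutcome(Enum):
    BUILD_SUCCEEDED = "build_succeeded"
    MERGE_SUCCEEDED = "merge_succeeded"
    DISCARDED = "discarded"


def is_cuboid_path(path: str) -> bool:
    """True if the last path component is the cuboid directory."""
    return path.rstrip("/").rsplit("/", 1)[-1] == CUBOID_DIR


def cleanup_plan(outcome: JobOutcome,
                 job_paths: Iterable[str],
                 source_cuboid_paths: Iterable[str] = ()) -> Set[str]:
    """Return the HDFS paths to delete for a finished job (plan only)."""
    paths = set(job_paths)
    if outcome is JobOutcome.DISCARDED:
        # Proposed in the thread: nothing a discarded job wrote is needed.
        return paths
    # Keep this job's cuboid files, since a later merge reads them.
    plan = {p for p in paths if not is_cuboid_path(p)}
    if outcome is JobOutcome.MERGE_SUCCEEDED:
        # The source segments' cuboid files are no longer needed after a merge.
        plan |= set(source_cuboid_paths)
    return plan
```

With hypothetical job paths such as `/kylin/job1/cuboid`, `/kylin/job1/fact_distinct_columns`, `/kylin/job1/rowkey_stats` and `/kylin/job1/hfile`, a successful build plans to delete everything except `/kylin/job1/cuboid`, while a discarded job plans to delete all four. Separating the plan from the deletion makes the policy easy to review before anything is removed.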
