[ 
https://issues.apache.org/jira/browse/KYLIN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122999#comment-16122999
 ] 

Alexander Sterligov commented on KYLIN-2749:
--------------------------------------------

No, my core nodes are on-demand. 
My configuration was:
  - master is on-demand m4.2xlarge, I rised kylin Xmx to 16Gb.
  - 5 on-demand core nodes m4.xlarge
  - from 0 to 50 task nodes of spot instances m4.xlarge with auto-scaling by 
ContainerPendingRatio

Amazon starts regions servers on task nodes and if it's spot instances it work 
really bad. I had a lot of issues with lost information. Then I fixed number of 
task nodes to 20 and it worked much better.

Finally I moved hbase to separate cluster with on-demand nodes only and I use 
another cluster with spot-instances for cube build. It works good and I don's 
see such issues anymore. (The only issue is that hfiles are not placed well to 
s3).

I've seen an email at user mailing list about best practices running kylin in 
clouds. It looks like on-demand instances for hbase regionservers is a must 
have.

I think this ticket may be closed as not a bug of kylin. Or you can replace 
closeQuietly by some other helper with logging. That's the only place where 
exceptions may be swallowed.


> Merge may remove old segments without saving merged segment
> -----------------------------------------------------------
>
>                 Key: KYLIN-2749
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2749
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.0.0
>            Reporter: Alexander Sterligov
>         Attachments: kylin.log.2017-07-22.tar
>
>
> Merge started to work on last 7 segments.
> During the process hbase had a failure because of spot-instances shutdown in 
> Amazon. Data was not lost, because it is at S3.
> I stopped kylin and did hbase hbck --repair. During the report of repair I 
> didn't see any information about lost data, just redistribution of regions.
> Then after kylin was started I cannot query data from the last  7 segments:
> {quote}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hbase.TableNotFoundException: Table 'KYLIN_7MMHCHKVVB' was 
> not found, got: KYLIN_7H3WSPX1UJ.
>         at com.google.common.base.Throwables.propagate(Throwables.java:160)
>         at 
> org.apache.kylin.storage.hbase.cube.v2.ExpectedSizeIterator.next(ExpectedSizeIterator.java:67)
>         at 
> org.apache.kylin.storage.hbase.cube.v2.ExpectedSizeIterator.next(ExpectedSizeIterator.java:31)
>         at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>         at com.google.common.collect.Iterators$6.hasNext(Iterators.java:583)
>         at 
> org.apache.kylin.storage.gtrecord.SegmentCubeTupleIterator$2.hasNext(SegmentCubeTupleIterator.java:116)
>         at 
> org.apache.kylin.storage.gtrecord.SegmentCubeTupleIterator.hasNext(SegmentCubeTupleIterator.java:149)
>         at com.google.common.collect.Iterators$6.hasNext(Iterators.java:582)
>         at 
> org.apache.kylin.storage.gtrecord.SequentialCubeTupleIterator.hasNext(SequentialCubeTupleIterator.java:129)
>         at 
> org.apache.kylin.query.enumerator.OLAPEnumerator.moveNext(OLAPEnumerator.java:67)
>         at Baz$1$1.moveNext(Unknown Source)
>         at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:826)
>         at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:761)
>         at 
> org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:302)
>         at Baz.bind(Unknown Source)
>         at 
> org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:331)
>         at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(CalciteConnectionImpl.java:294)
>         at 
> org.apache.calcite.jdbc.CalciteMetaImpl._createIterable(CalciteMetaImpl.java:553)
>         at 
> org.apache.calcite.jdbc.CalciteMetaImpl.createIterable(CalciteMetaImpl.java:544)
>         at 
> org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:193)
>         at 
> org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:67)
>         at 
> org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:44)
>         at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:607)
>         at 
> org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:600)
>         at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:615)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:148)
>         ... 77 more
> Caused by: org.apache.hadoop.hbase.TableNotFoundException: Table 
> 'KYLIN_7MMHCHKVVB' was not found, got: KYLIN_7H3WSPX1UJ.
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1310)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1189)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1173)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1130)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:965)
>         at 
> org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>         at 
> org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:505)
>         at 
> org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:721)
>         at 
> org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:691)
>         at 
> org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1796)
>         at 
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1751)
>         at 
> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$1.run(CubeHBaseEndpointRPC.java:182)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {quote}
> hbase really doesn't contain tables for the last 7 segments and I didn't call 
> any cleanup jobs.
> It looks like Merge removed old segment tables before merged segment was 
> saved.
> I'm going to continue to investigate this problem and will post more details 
> next week.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to