[
https://issues.apache.org/jira/browse/KYLIN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092245#comment-17092245
]
ASF GitHub Bot commented on KYLIN-4385:
---------------------------------------
shaofengshi commented on a change in pull request #1173:
URL: https://github.com/apache/kylin/pull/1173#discussion_r415068769
##########
File path:
metrics-reporter-hive/src/main/java/org/apache/kylin/metrics/lib/impl/hive/HiveProducer.java
##########
@@ -97,7 +105,10 @@ public void onRemoval(RemovalNotification<Pair<String, String>, Pair<String, Lis
} catch (UnknownHostException e) {
hostName = "UNKNOWN";
}
- CONTENT_FILE_NAME = hostName + "-part-0000";
+ CONTENT_FILE_NAME = hostName + "-part-";
+ String fsUri = fileSystem.getUri().toString();
+ supportAppend = !fsUri.startsWith("s3") && !fsUri.startsWith("wasb"); // AWS EMR and Azure HDInsight
Review comment:
This may need to cover more cloud storage schemes:
- adls (Azure Data Lake)
- gs (Google Cloud Storage)
- oss (Aliyun OSS)
Or can we just treat "hdfs" as the append case and all others as non-append?
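
To make the second option concrete, here is a minimal sketch of the allowlist idea: only the "hdfs" scheme is treated as append-capable and every other scheme is not. The class and method names (`AppendSupport`, `supportsAppend`) are hypothetical and not taken from the pull request, and the sketch uses `java.net.URI` instead of the Hadoop `FileSystem` object so that it runs on its own.

```java
import java.net.URI;

// Hypothetical helper, not part of HiveProducer: decides append support
// from the URI scheme alone.
class AppendSupport {

    // Allowlist check: only "hdfs" counts as append-capable. Any other
    // scheme (s3, wasb, gs, oss, ...) is treated as non-append, so newly
    // added object stores need no code change.
    static boolean supportsAppend(URI fsUri) {
        String scheme = fsUri.getScheme();
        return scheme != null && scheme.equalsIgnoreCase("hdfs");
    }

    public static void main(String[] args) {
        String[] uris = {
            "hdfs://namenode:8020/",
            "s3://my-bucket/",
            "wasb://container@account/",
            "gs://bucket/",
            "oss://bucket/"
        };
        for (String u : uris) {
            System.out.println(u + " -> " + supportsAppend(URI.create(u)));
        }
    }
}
```

Compared with the denylist in the diff, this fails safe: an unrecognized file system is written without append instead of hitting an unsupported `append()` call at runtime.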
----------------------------------------------------------------
> KYLIN system cube failing to update table when run on EMR with S3 as storage
> and EMRFS
> --------------------------------------------------------------------------------------
>
> Key: KYLIN-4385
> URL: https://issues.apache.org/jira/browse/KYLIN-4385
> Project: Kylin
> Issue Type: Bug
> Reporter: raghu ram reddy
> Assignee: Xiaoxiang Yu
> Priority: Major
> Fix For: v3.1.0, v3.0.2, v2.6.6
>
>
>
> 2020-02-24T15:35:46,548 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter - Try to write 113 records
> 2020-02-24T15:35:46,566 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/etc/hive/conf.dist/hive-site.xml
> 2020-02-24T15:35:47,097 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-1-1.ec2.internal:9083
> 2020-02-24T15:35:47,216 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:47,216 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:47,433 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Closed a connection to metastore, current connections: 0
> 2020-02-24T15:35:47,824 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Try to use new partition content path: hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/system_cube/hive_metrics_query_cube_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558547056-part-0000 for metric: METRICS_QUERY_CUBE_QA
> 2020-02-24T15:35:47,959 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Success to write 37 metrics (METRICS_QUERY_CUBE_QA) to file hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/system_cube/hive_metrics_query_cube_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558547056-part-0000
> 2020-02-24T15:35:48,275 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-2-1.ec2.internal:9083
> 2020-02-24T15:35:48,288 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:48,289 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:48,711 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Closed a connection to metastore, current connections: 0
> 2020-02-24T15:35:50,223 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/kylin/3f98a154-e471-40fc-9829-4c4283266d46
> 2020-02-24T15:35:50,224 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /usr/local/kylin/tomcat/temp/kylin/3f98a154-e471-40fc-9829-4c4283266d46
> 2020-02-24T15:35:50,232 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/kylin/3f98a154-e471-40fc-9829-4c4283266d46/_tmp_space.db
> 2020-02-24T15:35:50,291 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.TezSessionState - User of session id 3f98a154-e471-40fc-9829-4c4283266d46 is kylin
> 2020-02-24T15:35:50,389 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.DagUtils - Jar dir is null / directory doesn't exist. Choosing HIVE_INSTALL_DIR - /user/kylin/.hiveJars
> 2020-02-24T15:35:50,933 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.DagUtils - Resource modification time: 1581024148854 for hdfs://ip-1-1-2-1.ec2.internal:8020/user/kylin/.hiveJars/hive-exec-2.3.6-amzn-0-9f4c4d2a9ab8330bfec9b3ce23e40355288cc5c08a20165b20aca86b2b6c2c95.jar
> 2020-02-24T15:35:51,066 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=3f98a154-e471-40fc-9829-4c4283266d46, clientType=HIVECLI]
> 2020-02-24T15:35:51,073 WARN [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 2020-02-24T15:35:51,646 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-2-1.ec2.internal:9083
> 2020-02-24T15:35:51,662 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:51,662 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:51,992 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.9.2, revision=9566b9ed1d86bc2697f1622e4e9825da6c011583, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2019-10-28T16:32:03Z ]
> 2020-02-24T15:35:51,992 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.TezSessionState - Opening new Tez Session (id: 3f98a154-e471-40fc-9829-4c4283266d46, scratch dir: hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/hive/kylin/_tez_session_dir/3f98a154-e471-40fc-9829-4c4283266d46)
> 2020-02-24T15:35:52,578 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-1-1-2-1.ec2.internal/10.127.2.141:8032
> 2020-02-24T15:35:52,767 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Session mode. Starting session.
> 2020-02-24T15:35:52,839 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: hdfs:///apps/tez/tez.tar.gz
> 2020-02-24T15:35:52,839 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClientUtils - Using tez.lib.uris.classpath value from configuration: null
> 2020-02-24T15:35:52,871 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 856 for kylin on 10.127.2.141:8020
> 2020-02-24T15:35:53,280 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.common.security.TokenCache - Got dt for hdfs://ip-1-1-2-1.ec2.internal:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.127.2.141:8020, Ident: (HDFS_DELEGATION_TOKEN token 856 for kylin)
> 2020-02-24T15:35:53,280 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.common.security.TokenCache - Got dt for hdfs://ip-1-1-2-1.ec2.internal:8020; Kind: kms-dt, Service: 10.127.2.141:9700, Ident: (owner=kylin, renewer=yarn, realUser=, issueDate=1582558553105, maxDate=1583163353105, sequenceNumber=853, masterKeyId=53)
> 2020-02-24T15:35:53,310 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Tez system stage directory hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/hive/kylin/_tez_session_dir/3f98a154-e471-40fc-9829-4c4283266d46/.tez/application_1578089000827_0674 doesn't exist and is created
> 2020-02-24T15:35:54,257 INFO [BadQueryDetector] org.apache.kylin.rest.service.BadQueryDetector - Detect bad query.
> 2020-02-24T15:35:54,620 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://ip-1-1-2-1.ec2.internal:8188/ws/v1/timeline/
> 2020-02-24T15:35:55,040 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1578089000827_0674
> 2020-02-24T15:35:55,041 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - The url to track the Tez Session: http://ip-1-1-2-1.ec2.internal:20888/proxy/application_1578089000827_0674/
> 2020-02-24T15:35:57,000 INFO [FetcherRunner 1354629870-25] org.apache.kylin.job.impl.threadpool.DefaultFetcherRunner - Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 20 already succeed, 1 error, 0 discarded, 0 others
> 2020-02-24T15:35:59,829 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Compiling command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661): ALTER TABLE KYLIN.HIVE_METRICS_QUERY_QA ADD IF NOT EXISTS PARTITION (kday_date='2020-02-24')
> 2020-02-24T15:36:01,467 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
> 2020-02-24T15:36:01,471 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Completed compiling command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661); Time taken: 1.708 seconds
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Executing command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661): ALTER TABLE KYLIN.HIVE_METRICS_QUERY_QA ADD IF NOT EXISTS PARTITION (kday_date='2020-02-24')
> 2020-02-24T15:36:01,506 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] hive.ql.metadata.Hive - Dumping metastore api call timing information for : execution phase
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] hive.ql.metadata.Hive - Total time spent in this metastore function was greater than 1000ms : add_partitions_(List, boolean, boolean, )=1191
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Completed executing command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661); Time taken: 1.467 seconds
> OK
> 2020-02-24T15:36:02,953 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - OK
> 2020-02-24T15:36:02,954 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Try to use new partition content path: s3://my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000 for metric: METRICS_QUERY_QA
> 2020-02-24T15:36:03,322 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.cse.CSEMultipartUploadOutputStream - close closed:false s3://my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000
> 2020-02-24T15:36:03,847 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.DefaultMultipartUploadDispatcher - Completed multipart upload of 1 parts 0 bytes
> 2020-02-24T15:36:04,203 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.cse.CSEMultipartUploadOutputStream - Finished uploading my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000. Elapsed seconds: 0.
> 2020-02-24T15:36:04,284 ERROR [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter - null
> java.lang.UnsupportedOperationException
>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>     at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181) ~[hadoop-common-2.8.5-amzn-5.jar:?]
>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.write(HiveProducer.java:204) ~[kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.send(HiveProducer.java:134) ~[kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter$HiveReservoirListener.onRecordUpdate(HiveReservoirReporter.java:144) [kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.notifyListenerOfUpdatedRecord(BlockingReservoir.java:117) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.onRecordUpdate(BlockingReservoir.java:105) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.access$300(BlockingReservoir.java:37) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir$ReporterRunnable.run(BlockingReservoir.java:171) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> 2020-02-24T15:36:04,290 WARN [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.BlockingReservoir - It fails to notify listener org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter$HiveReservoirListener@1d460286 of updated record size 1132
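
The trace above ends in `append()` being rejected by the S3-backed file system. One way to avoid calling `append()` on such stores is to give every write its own part file. The sketch below illustrates only the naming side of that idea. `PartFileNamer`, its fields, and the four-digit zero-padded suffix are assumptions made for illustration (the padding mirrors the `-part-0000` name seen in the log) and do not reproduce the actual change in the pull request.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: picks the part-file name for each write.
class PartFileNamer {
    private final String prefix;            // e.g. hostName + "-part-"
    private final boolean supportAppend;    // true only for append-capable stores
    private final AtomicInteger nextId = new AtomicInteger();

    PartFileNamer(String prefix, boolean supportAppend) {
        this.prefix = prefix;
        this.supportAppend = supportAppend;
    }

    // With append support every write targets the same file (suffix 0000).
    // Without it each write gets the next suffix, so the caller can always
    // create a new file and never needs append().
    String nextFileName() {
        int id = supportAppend ? 0 : nextId.getAndIncrement();
        return prefix + String.format(Locale.ROOT, "%04d", id);
    }

    public static void main(String[] args) {
        PartFileNamer objectStore = new PartFileNamer("host-part-", false);
        System.out.println(objectStore.nextFileName());
        System.out.println(objectStore.nextFileName());
    }
}
```

The trade-off is more, smaller files per partition on object stores, which may call for periodic compaction.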