[GitHub] [kylin] hit-lacus edited a comment on issue #902: KYLIN-3832 Query pushdown support postgresql

GitBox Tue, 29 Oct 2019 10:14:18 -0700

hit-lacus edited a comment on issue #902: KYLIN-3832 Query pushdown support 
postgresql
URL: https://github.com/apache/kylin/pull/902#issuecomment-547393013
 
 
   # Test Build
   
   ### Prepare data
   
![image](https://user-images.githubusercontent.com/14030549/67766878-6c3eda00-fa8a-11e9-88dc-35c940226f16.png)
   
   ### Set project level config
   <img width="1128" alt="image" 
src="https://user-images.githubusercontent.com/14030549/67767483-a197f780-fa8b-11e9-90c4-1040df59e2b5.png";>
   
   ### Import table from pg datasource
   <img width="596" alt="image" 
src="https://user-images.githubusercontent.com/14030549/67767578-ce4c0f00-fa8b-11e9-84e2-f1e07ac1a348.png";>
   
   
   ### Create Model and Cube
   <img width="1062" alt="image" 
src="https://user-images.githubusercontent.com/14030549/67789595-3102d200-faaf-11e9-84f0-a666dadbd2ca.png";>
   
   ### Build Job Success
   <img width="1395" alt="image" 
src="https://user-images.githubusercontent.com/14030549/67789653-48da5600-faaf-11e9-8ec8-154d43114881.png";>
   
   ### Log of sqoop step
   
   ```
   exe cmd:/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/sqoop import 
-Dorg.apache.sqoop.splitter.allow_text_splitter=true  
-Dmapreduce.job.queuename=default --connect 
"jdbc:postgresql://10.1.2.42:54321/test_db5" --driver "org.postgresql.Driver" 
--username "postgres" --password "password" --query "SELECT 
\"film_play\".\"audience_id\" AS \"FILM_PLAY_AUDIENCE_ID\", 
\"film_play\".\"film_id\" AS \"FILM_PLAY_FILM_ID\", 
\"film_play\".\"watch_time\", \"film_play\".\"payment\" AS \"FILM_PLAY_PAYMENT\"
   FROM \"sc1\".\"film_play\" AS \"film_play\"
       LEFT JOIN \"public\".\"film\" AS \"film\" ON \"film_play\".\"film_id\" = 
\"film\".\"film_id\"
       LEFT JOIN \"sc2\".\"audience\" AS \"audience\" ON 
\"film_play\".\"audience_id\" = \"audience\".\"audience_id\"
   WHERE 1 = 1 AND (\"film_play\".\"watch_time\" >= '2017-04-01 00:00:00' AND 
\"film_play\".\"watch_time\" < '2017-08-01 00:00:00') AND \$CONDITIONS" 
--target-dir 
"hdfs://cdh-master:8020/kylin/postgresql_metadata/kylin-c517a543-b212-4a6f-f220-5eee731dea56/kylin_intermediate_playstatcube_a702211d_5192_902e_b4bd_fef7f7985ebe"
 --split-by "FILM_PLAY.WATCH_TIME" --boundary-query "SELECT 
MIN(\"film_play\".\"watch_time\"), MAX(\"film_play\".\"watch_time\")
   FROM \"sc1\".\"film_play\" AS \"film_play\"" --null-string '' 
--fields-terminated-by '|' --num-mappers 4
   Warning: 
/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/sqoop/../accumulo 
does not exist! Accumulo imports will fail.
   Please set $ACCUMULO_HOME to the root of your Accumulo installation.
   19/10/30 00:44:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.6
   19/10/30 00:44:47 WARN tool.BaseSqoopTool: Setting your password on the 
command-line is insecure. Consider using -P instead.
   19/10/30 00:44:47 WARN sqoop.ConnFactory: Parameter --driver is set to an 
explicit driver however appropriate connection manager is not being set (via 
--connection-manager). Sqoop is going to fall back to 
org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which 
connection manager should be used next time.
   19/10/30 00:44:47 INFO manager.SqlManager: Using default fetchSize of 1000
   19/10/30 00:44:47 INFO tool.CodeGenTool: Beginning code generation
   19/10/30 00:44:48 INFO manager.SqlManager: Executing SQL statement: SELECT 
"film_play"."audience_id" AS "FILM_PLAY_AUDIENCE_ID", "film_play"."film_id" AS 
"FILM_PLAY_FILM_ID", "film_play"."watch_time", "film_play"."payment" AS 
"FILM_PLAY_PAYMENT"
   FROM "sc1"."film_play" AS "film_play"
       LEFT JOIN "public"."film" AS "film" ON "film_play"."film_id" = 
"film"."film_id"
       LEFT JOIN "sc2"."audience" AS "audience" ON "film_play"."audience_id" = 
"audience"."audience_id"
   WHERE 1 = 1 AND ("film_play"."watch_time" >= '2017-04-01 00:00:00' AND 
"film_play"."watch_time" < '2017-08-01 00:00:00') AND  (1 = 0) 
   19/10/30 00:44:48 INFO manager.SqlManager: Executing SQL statement: SELECT 
"film_play"."audience_id" AS "FILM_PLAY_AUDIENCE_ID", "film_play"."film_id" AS 
"FILM_PLAY_FILM_ID", "film_play"."watch_time", "film_play"."payment" AS 
"FILM_PLAY_PAYMENT"
   FROM "sc1"."film_play" AS "film_play"
       LEFT JOIN "public"."film" AS "film" ON "film_play"."film_id" = 
"film"."film_id"
       LEFT JOIN "sc2"."audience" AS "audience" ON "film_play"."audience_id" = 
"audience"."audience_id"
   WHERE 1 = 1 AND ("film_play"."watch_time" >= '2017-04-01 00:00:00' AND 
"film_play"."watch_time" < '2017-08-01 00:00:00') AND  (1 = 0) 
   19/10/30 00:44:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
   注: 
/tmp/sqoop-root/compile/d59f0d854a853a628b5823a7a5192262/QueryResult.java使用或覆盖了已过时的
 API。
   注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
   19/10/30 00:44:49 INFO orm.CompilationManager: Writing jar file: 
/tmp/sqoop-root/compile/d59f0d854a853a628b5823a7a5192262/QueryResult.jar
   19/10/30 00:44:49 INFO mapreduce.ImportJobBase: Beginning query import.
   19/10/30 00:44:49 INFO Configuration.deprecation: mapred.jar is deprecated. 
Instead, use mapreduce.job.jar
   19/10/30 00:44:50 INFO Configuration.deprecation: mapred.map.tasks is 
deprecated. Instead, use mapreduce.job.maps
   19/10/30 00:44:50 INFO client.RMProxy: Connecting to ResourceManager at 
cdh-master/10.1.3.90:8032
   19/10/30 00:44:53 INFO db.DBInputFormat: Using read commited transaction 
isolation
   19/10/30 00:44:53 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT 
MIN("film_play"."watch_time"), MAX("film_play"."watch_time")
   FROM "sc1"."film_play" AS "film_play"
   19/10/30 00:44:53 INFO db.IntegerSplitter: Split size: 3909600000; Num 
splits: 4 from: 1484474703000 to: 1500113103000
   19/10/30 00:44:53 INFO mapreduce.JobSubmitter: number of splits:4
   19/10/30 00:44:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1571054284123_0700
   19/10/30 00:44:54 INFO impl.YarnClientImpl: Submitted application 
application_1571054284123_0700
   19/10/30 00:44:54 INFO mapreduce.Job: The url to track the job: 
http://cdh-master:8088/proxy/application_1571054284123_0700/
   19/10/30 00:44:54 INFO mapreduce.Job: Running job: job_1571054284123_0700
   19/10/30 00:45:00 INFO mapreduce.Job: Job job_1571054284123_0700 running in 
uber mode : false
   19/10/30 00:45:00 INFO mapreduce.Job:  map 0% reduce 0%
   19/10/30 00:45:05 INFO mapreduce.Job:  map 100% reduce 0%
   19/10/30 00:45:05 INFO mapreduce.Job: Job job_1571054284123_0700 completed 
successfully
   19/10/30 00:45:05 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=580620
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=697
                HDFS: Number of bytes written=812
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters 
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=39411
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=13137
                Total vcore-seconds taken by all map tasks=13137
                Total megabyte-seconds taken by all map tasks=40356864
        Map-Reduce Framework
                Map input records=28
                Map output records=28
                Input split bytes=697
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=250
                CPU time spent (ms)=3830
                Physical memory (bytes) snapshot=1270038528
                Virtual memory (bytes) snapshot=18745585664
                Total committed heap usage (bytes)=3234332672
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=812
   19/10/30 00:45:05 INFO mapreduce.ImportJobBase: Transferred 812 bytes in 
14.9212 seconds (54.4191 bytes/sec)
   19/10/30 00:45:05 INFO mapreduce.ImportJobBase: Retrieved 28 records.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [kylin] hit-lacus edited a comment on issue #902: KYLIN-3832 Query pushdown support postgresql

Reply via email to