rakeshramakrishnan opened a new issue #2439: URL: https://github.com/apache/hudi/issues/2439
- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? Yes

**Describe the problem you faced**

Unable to sync to an external Hive metastore via the thrift protocol. Instead, the sync appears to happen against the local Hive store.

**To Reproduce**

Run the pyspark file below, which does the following:
- Connects to the Hive metastore via `hive.metastore.uris` using the thrift protocol and prints the existing tables, to show that the existing setup can reach the metastore without any issues
- Generates a sample df using the Hudi quickstart generator and writes it to a Hudi table with hive sync enabled
- Reconnects to the Hive metastore and prints the tables. The newly synced table _does not show up_
- On opening a new pyspark shell, the expected table shows up in the local Spark warehouse dir (`spark.catalog.listTables()`)
- The log below shows `HiveMetastoreConnection version 1.2.1 using Spark classes`. I have tried connecting to the Hive metastore using Spark `3.0.1` and Hive `2.3.7` jars and am able to list the tables in the external metastore. However, that combination does not work with Hudi `0.6.0`, hence Spark `2.4.7` is used in the example below.
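As a quick sanity check independent of Spark and Hudi, the thrift endpoint can be probed with a plain TCP connection. This is only a sketch (the helper names are my own) and it verifies reachability of the port, not metastore behaviour:

```python
import socket
from urllib.parse import urlparse


def parse_thrift_uri(uri):
    """Split a thrift://host:port metastore URI into (host, port).

    Falls back to 9083, the conventional Hive metastore port,
    when no port is given.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError("expected a thrift:// URI, got %r" % uri)
    return parsed.hostname, parsed.port or 9083


def metastore_reachable(uri, timeout=3.0):
    """Return True if a TCP connection to the metastore endpoint succeeds."""
    host, port = parse_thrift_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(metastore_reachable("thrift://localhost:9083"))
```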
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

metastore_uri = "thrift://localhost:9083"
spark = SparkSession.builder \
    .appName("test-hudi-hive-sync") \
    .enableHiveSupport() \
    .config("hive.metastore.uris", metastore_uri) \
    .getOrCreate()

print("Before {}".format(spark.catalog.listTables()))

tableName = "hive_hudi_sync"
basePath = "file:///tmp/hive_hudi_sync"

sc = spark.sparkContext
dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(dataGen.generateInserts(10))
df = spark.read.json(spark.sparkContext.parallelize(inserts, 2)) \
    .withColumn("partitionpath", lit("partitionval"))
df.show()

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.partitionpath.field': 'partitionpath',
    'hoodie.datasource.write.table.name': tableName,
    'hoodie.datasource.write.operation': 'insert',
    'hoodie.datasource.write.precombine.field': 'ts',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'hoodie.datasource.hive_sync.enable': True,
    'hoodie.datasource.hive_sync.use_jdbc': False,
    'hoodie.datasource.hive_sync.jdbcurl': metastore_uri,
    'hoodie.datasource.hive_sync.partition_fields': 'partitionpath',
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
}

df.write.format("hudi") \
    .options(**hudi_options) \
    .mode("overwrite") \
    .save(basePath)

print("After {}".format(spark.catalog.listTables()))
```

**Expected behavior**

- Expecting the table `hive_hudi_sync` to show up in the external Hive metastore after hive sync
- The hive sync succeeds according to the logs, but the new table is not visible in the metastore
- Only the pre-existing tables are visible in the Hive metastore
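One detail worth calling out in the options used above: `hoodie.datasource.hive_sync.jdbcurl` normally carries a HiveServer2 JDBC URL (e.g. `jdbc:hive2://host:10000`), while a `thrift://` URI is what `hive.metastore.uris` expects. The illustrative checker below (the function and warning strings are my own, not a Hudi API) makes that mismatch explicit:

```python
def check_hive_sync_options(opts):
    """Flag suspicious combinations in a Hudi hive-sync options dict.

    Purely illustrative; Hudi does not expose such a helper.
    """
    warnings = []
    url = opts.get('hoodie.datasource.hive_sync.jdbcurl', '')
    use_jdbc = opts.get('hoodie.datasource.hive_sync.use_jdbc', True)
    if use_jdbc and not url.startswith('jdbc:hive2://'):
        warnings.append("use_jdbc is enabled but jdbcurl is not a jdbc:hive2:// URL")
    if not use_jdbc and url.startswith('thrift://'):
        warnings.append("use_jdbc is disabled, so jdbcurl (a thrift URI here) is "
                        "likely ignored; the metastore client presumably reads "
                        "hive.metastore.uris from the Hadoop/Hive configuration instead")
    return warnings


opts = {
    'hoodie.datasource.hive_sync.use_jdbc': False,
    'hoodie.datasource.hive_sync.jdbcurl': 'thrift://localhost:9083',
}
for w in check_hive_sync_options(opts):
    print(w)
```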
**Environment Description**

* Hudi version : 0.6.0
* Spark version : 2.4.7
* Hive version : metastore uses Hive 3.1.0.3.1.0.0-78
* Storage (HDFS/S3/GCS..) : S3, but same for local too
* Running on Docker? (yes/no) : No

**Additional context**

Have attached the run logs:
- The native Spark connection to the Hive metastore works; I am able to see the tables from the external Hive metastore.
- However, in the Hudi hive-sync run, no connection is made to the external Hive metastore; the local Spark warehouse dir is used instead.
- Have removed the logs from `org.apache.spark` where they were adding noise. If I need to attach them, do let me know.

```
.venv ❯ bin/spark-submit --master local[2] --deploy-mode client --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.4 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' hive-metastore-pyspark.py
Ivy Default Cache set to: /Users/rakeshramakrishnan/.ivy2/cache
The jars for the packages stored in: /Users/rakeshramakrishnan/.ivy2/jars
:: loading settings :: url = jar:file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1ea4440b-ae8a-49c2-b638-7765bc189b84;1.0
	confs: [default]
	found org.apache.hudi#hudi-spark-bundle_2.11;0.6.0 in central
	found org.apache.spark#spark-avro_2.11;2.4.4 in central
	found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 300ms :: artifacts dl 6ms
	:: modules in use:
	org.apache.hudi#hudi-spark-bundle_2.11;0.6.0 from central in [default]
	org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
	org.spark-project.spark#unused;1.0.0 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1ea4440b-ae8a-49c2-b638-7765bc189b84
	confs: [default]
	0 artifacts copied, 3 already retrieved (0kB/6ms)
294 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1245 [Thread-5] INFO org.apache.spark.SparkContext - Running Spark version 2.4.7
1268 [Thread-5] INFO org.apache.spark.SparkContext - Submitted application: test-hudi-hive-sync
1949 [Thread-5] INFO org.apache.spark.ui.SparkUI - Bound SparkUI to 0.0.0.0, and started at http://192.168.0.104:4040
1966 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar at spark://192.168.0.104:62151/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar with timestamp 1610556549207
1967 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar at spark://192.168.0.104:62151/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar with timestamp 1610556549208
1967 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://192.168.0.104:62151/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1610556549208
1992 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar with timestamp 1610556549232
1994 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar
2118 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar with timestamp 1610556549359
2118 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.apache.spark_spark-avro_2.11-2.4.4.jar
2126 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1610556549367
2126 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.spark-project.spark_unused-1.0.0.jar
2180 [Thread-5] INFO org.apache.spark.executor.Executor - Starting executor ID driver on host localhost
2237 [Thread-5] INFO org.apache.spark.util.Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62152.
2238 [Thread-5] INFO org.apache.spark.network.netty.NettyBlockTransferService - Server created on 192.168.0.104:62152
2584 [Thread-5] INFO org.apache.spark.sql.internal.SharedState - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse').
2585 [Thread-5] INFO org.apache.spark.sql.internal.SharedState - Warehouse path is 'file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse'.
3140 [Thread-5] INFO org.apache.spark.sql.hive.HiveUtils - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
3642 [Thread-5] INFO hive.metastore - Trying to connect to metastore with URI thrift://localhost:9083
4753 [Thread-5] INFO hive.metastore - Connected to metastore.
5590 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/46498653-2d37-4043-aa85-93083a524fc0_resources
5597 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0
5605 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0
5615 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0/_tmp_space.db
5618 [Thread-5] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.2.2) is file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse
15852 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 1 finished: hasNext at NativeMethodAccessorImpl.java:0, took 0.041704 s
Before [Table(name='****', database='default', description=None, tableType='MANAGED', isTemporary=False), .... tables in hive metastore]
17183 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 4 finished: showString at NativeMethodAccessorImpl.java:0, took 0.035620 s
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
|          begin_lat|          begin_lon|    driver|            end_lat|            end_lon|              fare|partitionpath|    rider| ts|                uuid|
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
| 0.4726905879569653|0.46157858450465483|driver-213|  0.754803407008858| 0.9671159942018241|34.158284716382845| partitionval|rider-213|0.0|f0476ada-9d26-4a6...|
| 0.6100070562136587| 0.8779402295427752|driver-213| 0.3407870505929602| 0.5030798142293655|  43.4923811219014| partitionval|rider-213|0.0|2507bfa1-01ec-471...|
| 0.5731835407930634| 0.4923479652912024|driver-213|0.08988581780930216|0.42520899698713666| 64.27696295884016| partitionval|rider-213|0.0|f3951634-256a-46f...|
|0.21624150367601136|0.14285051259466197|driver-213| 0.5890949624813784| 0.0966823831927115| 93.56018115236618| partitionval|rider-213|0.0|f0e3fdc7-685d-45d...|
|   0.40613510977307| 0.5644092139040959|driver-213|  0.798706304941517|0.02698359227182834|17.851135255091155| partitionval|rider-213|0.0|92233d5f-f684-43e...|
| 0.8742041526408587| 0.7528268153249502|driver-213| 0.9197827128888302|  0.362464770874404|19.179139106643607| partitionval|rider-213|0.0|f683850c-2940-4e0...|
| 0.1856488085068272| 0.9694586417848392|driver-213|0.38186367037201974|0.25252652214479043| 33.92216483948643| partitionval|rider-213|0.0|47af2a09-264b-4bd...|
| 0.0750588760043035|0.03844104444445928|driver-213|0.04376353354538354| 0.6346040067610669| 66.62084366450246| partitionval|rider-213|0.0|43223a73-70e6-4ec...|
|  0.651058505660742| 0.8192868687714224|driver-213|0.20714896002914462|0.06224031095826987| 41.06290929046368| partitionval|rider-213|0.0|851b928f-f368-49c...|
|0.11488393157088261| 0.6273212202489661|driver-213| 0.7454678537511295| 0.3954939864908973| 27.79478688582596| partitionval|rider-213|0.0|2683968f-4b48-477...|
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
17293 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Initializing file:///tmp/hive_hudi_sync as hoodie table file:///tmp/hive_hudi_sync
17297 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17323 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17325 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17330 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17336 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17336 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished initializing Table of type COPY_ON_WRITE from file:///tmp/hive_hudi_sync
17365 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Registered avro schema : { "type" : "record", "name" : "hive_hudi_sync_record", "namespace" : "hoodie.hive_hudi_sync", "fields" : [ { "name" : "begin_lat", "type" : [ "double", "null" ] }, { "name" : "begin_lon", "type" : [ "double", "null" ] }, { "name" : "driver", "type" : [ "string", "null" ] }, { "name" : "end_lat", "type" : [ "double", "null" ] }, { "name" : "end_lon", "type" : [ "double", "null" ] }, { "name" : "fare", "type" : [ "double", "null" ] }, { "name" : "partitionpath", "type" : "string" }, { "name" : "rider", "type" : [ "string", "null" ] }, { "name" : "ts", "type" : [ "double", "null" ] }, { "name" : "uuid", "type" : [ "string", "null" ] } ] }
17458 [Thread-5] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 14.138735 ms
17521 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17521 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieClient - Starting Timeline service !!
17522 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Overriding hostIp to (192.168.0.104) found in spark-conf. It was null
17524 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :MEMORY
17525 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating in-memory based Table View
17537 [Thread-5] INFO org.eclipse.jetty.util.log - Logging initialized @18965ms to org.eclipse.jetty.util.log.Slf4jLog
17646 [Thread-5] INFO io.javalin.Javalin - [Javalin ASCII banner] https://javalin.io/documentation
17647 [Thread-5] INFO io.javalin.Javalin - Starting Javalin ...
17768 [Thread-5] INFO io.javalin.Javalin - Listening on http://localhost:62161/
17768 [Thread-5] INFO io.javalin.Javalin - Javalin started in 125ms \o/
17768 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Starting Timeline server on port :62161
17768 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Started embedded timeline server at 192.168.0.104:62161
17782 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: isEmpty at HoodieSparkSqlWriter.scala:164
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Got job 5 (isEmpty at HoodieSparkSqlWriter.scala:164) with 1 output partitions
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Final stage: ResultStage 5 (isEmpty at HoodieSparkSqlWriter.scala:164)
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
17784 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
17784 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Submitting ResultStage 5 (MapPartitionsRDD[24] at map at HoodieSparkSqlWriter.scala:139), which has no missing parents
17789 [dag-scheduler-event-loop] INFO org.apache.spark.storage.memory.MemoryStore - Block broadcast_5 stored as values in memory (estimated size 28.8 KB, free 366.2 MB)
17795 [dag-scheduler-event-loop] INFO org.apache.spark.storage.memory.MemoryStore - Block broadcast_5_piece0 stored as bytes in memory (estimated size 13.3 KB, free 366.2 MB)
17796 [dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo - Added broadcast_5_piece0 in memory on 192.168.0.104:62152 (size: 13.3 KB, free: 366.3 MB)
17796 [dag-scheduler-event-loop] INFO org.apache.spark.SparkContext - Created broadcast 5 from broadcast at DAGScheduler.scala:1184
17797 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Submitting 1 missing tasks from ResultStage 5 (MapPartitionsRDD[24] at map at HoodieSparkSqlWriter.scala:139) (first 15 tasks are for partitions Vector(0))
17797 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 5.0 with 1 tasks
17802 [dispatcher-event-loop-1] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 5.0 (TID 6, localhost, executor driver, partition 0, PROCESS_LOCAL, 9327 bytes)
17803 [Executor task launch worker for task 6] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 5.0 (TID 6)
17847 [Executor task launch worker for task 6] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 14.283395 ms
17856 [Executor task launch worker for task 6] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 5.0 (TID 6). 2049 bytes result sent to driver
17863 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 5.0 (TID 6) in 65 ms on localhost (executor driver) (1/1)
17863 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 5.0, whose tasks have all completed, from pool
17865 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 5 (isEmpty at HoodieSparkSqlWriter.scala:164) finished in 0.079 s
17865 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 5 finished: isEmpty at HoodieSparkSqlWriter.scala:164, took 0.083123 s
17871 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17872 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17873 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17874 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17874 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17885 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
17886 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
17886 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
17890 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Generate a new instant time 20210113221924
17890 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17891 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17894 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
17897 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new instant [==>20210113221924__commit__REQUESTED]
17919 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17921 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17922 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17923 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17923 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17926 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__REQUESTED]]
17930 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
17930 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
17933 [Thread-5] INFO org.apache.hudi.client.AsyncCleanerService - Auto cleaning is not enabled. Not running cleaner now
17984 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: countByKey at WorkloadProfile.java:73
18237 [Thread-5] INFO org.apache.hudi.table.action.commit.BaseCommitActionExecutor - Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=10, numUpdates=0}, partitionStat={partitionval=WorkloadStat {numInserts=10, numUpdates=0}}}
18278 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.commit.requested
18291 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.inflight
18293 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - AvgRecordSize => 1024
18432 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: collectAsMap at UpsertPartitioner.java:216
18523 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 7 finished: collectAsMap at UpsertPartitioner.java:216, took 0.089976 s
18525 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - For partitionPath : partitionval Small Files => []
18525 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - After small file assignment: unassignedInserts => 10, totalInsertBuckets => 1, recordsPerBucket => 122880
18526 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - Total insert buckets for partition path partitionval => [InsertBucket {bucketNumber=0, weight=1.0}]
18526 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - Total Buckets :1, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=114dfaba-3a25-4278-9e7b-f2784642f76c, partitionPath=partitionval}}, Partition to insert buckets => {partitionval=[InsertBucket {bucketNumber=0, weight=1.0}]}, UpdateLocations mapped to buckets =>{}
18585 [Thread-5] INFO org.apache.hudi.table.action.commit.BaseCommitActionExecutor - Auto commit disabled for 20210113221924
18796 [pool-18-thread-1] INFO org.apache.hudi.common.util.queue.IteratorBasedQueueProducer - starting to buffer records
18797 [pool-18-thread-2] INFO org.apache.hudi.common.util.queue.BoundedInMemoryExecutor - starting consumer thread
18806 [pool-18-thread-1] INFO org.apache.hudi.common.util.queue.IteratorBasedQueueProducer - finished buffering records
18811 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18838 [pool-18-thread-2] INFO org.apache.hudi.table.MarkerFiles - Creating Marker Path=file:/tmp/hive_hudi_sync/.hoodie/.temp/20210113221924/partitionval/114dfaba-3a25-4278-9e7b-f2784642f76c-0_0-10-14_20210113221924.parquet.marker.CREATE
18897 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18900 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18901 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19008 [pool-18-thread-2] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.gz]
19434 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19434 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19434 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - New CreateHandle for partition :partitionval with fileId 114dfaba-3a25-4278-9e7b-f2784642f76c-0
19445 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - Closing the file 114dfaba-3a25-4278-9e7b-f2784642f76c-0 as we are done with all the records 10
19445 [pool-18-thread-2] INFO org.apache.parquet.hadoop.InternalParquetRecordWriter - Flushing mem columnStore to file. allocated memory: 2179
19559 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - CreateHandle for partitionPath partitionval fileID 114dfaba-3a25-4278-9e7b-f2784642f76c-0, took 747 ms.
19559 [pool-18-thread-2] INFO org.apache.hudi.common.util.queue.BoundedInMemoryExecutor - Queue Consumption is done; notifying producer threads
19573 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 8 finished: count at HoodieSparkSqlWriter.scala:389, took 0.980784 s
19574 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - No errors. Proceeding to commit the write.
19654 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: collect at AbstractHoodieWriteClient.java:98
19729 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committing 20210113221924
19729 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
19731 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19731 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
19732 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
19732 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
19733 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19741 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
19742 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
19742 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
19743 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__INFLIGHT]]
19744 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
19744 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
19864 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant complete [==>20210113221924__commit__INFLIGHT]
19864 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.inflight
19887 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.commit
19887 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed [==>20210113221924__commit__INFLIGHT]
19945 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: foreach at MarkerFiles.java:97
20000 [Thread-5] INFO org.apache.hudi.table.MarkerFiles - Removing marker directory at file:/tmp/hive_hudi_sync/.hoodie/.temp/20210113221924
20005 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
20006 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20007 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20008 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
20008 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
20010 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20011 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
20011 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
20019 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__REQUESTED], [==>20210113221924__commit__INFLIGHT], [20210113221924__commit__COMPLETED]]
20020 [Thread-5] INFO org.apache.hudi.table.HoodieTimelineArchiveLog - No Instants to archive
20021 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running cleaner now
20021 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Cleaner started
20021 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
20022 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20022 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20023 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
20023 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
20025 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20025 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
20025 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
20032 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote view for basePath file:///tmp/hive_hudi_sync.
```
Server=192.168.0.104:62161 20033 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating InMemory based view for basePath file:///tmp/hive_hudi_sync 20066 [Thread-5] INFO org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView - Sending request : (http://192.168.0.104:62161/v1/hoodie/view/compactions/pending/?basepath=file%3A%2F%2F%2Ftmp%2Fhive_hudi_sync&lastinstantts=20210113221924&timelinehash=40aa81825cab43b9fe13e7d01121c08f8868e61fb6d6794c1fe9d0d7f43e449e) 20362 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync 20363 [qtp1464312295-98] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f] 20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties 20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync 20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating InMemory based view for basePath file:///tmp/hive_hudi_sync 20366 [qtp1464312295-98] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]] 20374 [qtp1464312295-98] INFO org.apache.hudi.timeline.service.FileSystemViewHandler - TimeTakenMillis[Total=13, Refresh=11, handle=2, Check=0], Success=true, Query=basepath=file%3A%2F%2F%2Ftmp%2Fhive_hudi_sync&lastinstantts=20210113221924&timelinehash=40aa81825cab43b9fe13e7d01121c08f8868e61fb6d6794c1fe9d0d7f43e449e, 
Host=192.168.0.104:62161, synced=false 20404 [Thread-5] INFO org.apache.hudi.table.action.clean.CleanPlanner - No earliest commit to retain. No need to scan partitions !! 20404 [Thread-5] INFO org.apache.hudi.table.action.clean.CleanActionExecutor - Nothing to clean here. It is already clean 20418 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20210113221924 20418 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Commit 20210113221924 successful! 20418 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Config.isInlineCompaction ? false 20419 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Compaction Scheduled is Option{val=null} 20420 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Syncing to Hive Metastore (URL: thrift://localhost:9083) 20547 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:/tmp/hive_hudi_sync 20547 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f] 20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties 20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:/tmp/hive_hudi_sync 20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:/tmp/hive_hudi_sync 20550 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]] 20681 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore 20712 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called 20850 [Thread-5] INFO DataNucleus.Persistence - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 20850 [Thread-5] INFO DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored 21860 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 22725 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 22726 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 22901 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 22901 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 
22975 [Thread-5] INFO DataNucleus.Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
22978 [Thread-5] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
22981 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
23194 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
23196 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
23238 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
23333 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_all_databases
23334 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_all_databases
23355 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_functions: db=default pat=*
23355 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_functions: db=default pat=*
23357 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
23408 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table hive_hudi_sync with base path file:/tmp/hive_hudi_sync of type COPY_ON_WRITE
23408 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
23408 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
23478 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
23485 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
23492 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
23500 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
23513 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 85 ms
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
23556 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
23559 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: create database if not exists default
24543 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
24545 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556570797 end=1610556571786 duration=989 from=org.apache.hadoop.hive.ql.Driver>
24548 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
24609 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
24609 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556571789 end=1610556571850 duration=61 from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556570758 end=1610556571860 duration=1102 from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221930_0ed9aaa9-b2ee-4824-a8f4-178fda3cdd72): create database if not exists default
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556570758 end=1610556571894 duration=1136 from=org.apache.hadoop.hive.ql.Driver>
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
24658 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
24665 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: create_database: Database(name:default, description:null, locationUri:null, parameters:null, ownerName:rakeshramakrishnan, ownerType:USER)
24665 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=create_database: Database(name:default, description:null, locationUri:null, parameters:null, ownerName:rakeshramakrishnan, ownerType:USER)
24671 [Thread-5] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - AlreadyExistsException(message:Database default already exists)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy35.create_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy36.createDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
	at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:3895)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:271)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
24671 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556571894 end=1610556571912 duration=18 from=org.apache.hadoop.hive.ql.Driver>
24671 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556571860 end=1610556571912 duration=52 from=org.apache.hadoop.hive.ql.Driver>
OK
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556571913 end=1610556571913 duration=0 from=org.apache.hadoop.hive.ql.Driver>
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556570758 end=1610556571913 duration=1155 from=org.apache.hadoop.hive.ql.Driver>
24673 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [create database if not exists default]: 1159 ms
24691 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Hive table hive_hudi_sync is not found. Creating it
24712 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24728 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
24734 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
24741 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
24747 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 35 ms
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
24748 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
24748 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24756 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
24756 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556571989 end=1610556571997 duration=8 from=org.apache.hadoop.hive.ql.Driver>
24756 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
24793 [Thread-5] INFO org.apache.hadoop.hive.ql.parse.CalcitePlanner - Starting Semantic Analysis
24802 [Thread-5] INFO org.apache.hadoop.hive.ql.parse.CalcitePlanner - Creating table default.hive_hudi_sync position=37
24812 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
24812 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
24813 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_database: default
24814 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_database: default
24832 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556571997 end=1610556572074 duration=77 from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556571988 end=1610556572074 duration=86 from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221931_93adf670-a860-4ed6-b873-35027fee5f4e): CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556571988 end=1610556572075 duration=87 from=org.apache.hadoop.hive.ql.Driver>
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
24835 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
24892 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: create_table: Table(tableName:hive_hudi_sync, dbName:default, owner:rakeshramakrishnan, createTime:1610556572, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:_hoodie_commit_time, type:string, comment:null), FieldSchema(name:_hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:_hoodie_record_key, type:string, comment:null), FieldSchema(name:_hoodie_partition_path, type:string, comment:null), FieldSchema(name:_hoodie_file_name, type:string, comment:null), FieldSchema(name:begin_lat, type:double, comment:null), FieldSchema(name:begin_lon, type:double, comment:null), FieldSchema(name:driver, type:string, comment:null), FieldSchema(name:end_lat, type:double, comment:null), FieldSchema(name:end_lon, type:double, comment:null), FieldSchema(name:fare, type:double, comment:null), FieldSchema(name:rider, type:string, comment:null), FieldSchema(name:ts, type:double, comment:null), FieldSchema(name:uuid, type:string, comment:null)], location:file:/tmp/hive_hudi_sync, inputFormat:org.apache.hudi.hadoop.HoodieParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:partitionpath, type:string, comment:null)], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:false)
24893 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=create_table: Table(tableName:hive_hudi_sync, dbName:default, owner:rakeshramakrishnan, createTime:1610556572, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:_hoodie_commit_time, type:string, comment:null), FieldSchema(name:_hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:_hoodie_record_key, type:string, comment:null), FieldSchema(name:_hoodie_partition_path, type:string, comment:null), FieldSchema(name:_hoodie_file_name, type:string, comment:null), FieldSchema(name:begin_lat, type:double, comment:null), FieldSchema(name:begin_lon, type:double, comment:null), FieldSchema(name:driver, type:string, comment:null), FieldSchema(name:end_lat, type:double, comment:null), FieldSchema(name:end_lon, type:double, comment:null), FieldSchema(name:fare, type:double, comment:null), FieldSchema(name:rider, type:string, comment:null), FieldSchema(name:ts, type:double, comment:null), FieldSchema(name:uuid, type:string, comment:null)], location:file:/tmp/hive_hudi_sync, inputFormat:org.apache.hudi.hadoop.HoodieParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:partitionpath, type:string, comment:null)], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:false)
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556572075 end=1610556572305 duration=230 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556572074 end=1610556572305 duration=231 from=org.apache.hadoop.hive.ql.Driver>
OK
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556572305 end=1610556572305 duration=0 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556571988 end=1610556572305 duration=317 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync']: 317 ms
25065 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for hive_hudi_sync
25065 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
25066 [Thread-5] INFO org.apache.hudi.sync.common.AbstractSyncHoodieClient - Last commit time synced is not known, listing all partitions in file:/tmp/hive_hudi_sync,FS :org.apache.hadoop.fs.LocalFileSystem@19ee471f
25089 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 1
25089 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_partitions : db=default tbl=hive_hudi_sync
25090 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_partitions : db=default tbl=hive_hudi_sync
25122 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - New Partitions [partitionval]
25122 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Adding partitions 1 to table hive_hudi_sync
25138 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
25144 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
25150 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
25156 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
25157 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 35 ms
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval'
25161 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
25161 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556572398 end=1610556572402 duration=4 from=org.apache.hadoop.hive.ql.Driver>
25161 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
25162 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
25162 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556572402 end=1610556572588 duration=186 from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556572398 end=1610556572588 duration=190 from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute
from=org.apache.hadoop.hive.ql.Driver> 25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221932_5b183376-504e-4518-9c26-46d6773384df): ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval' 25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556572398 end=1610556572588 duration=190 from=org.apache.hadoop.hive.ql.Driver> 25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver> 25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver> 25348 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode 25348 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync 25348 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync 25375 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: add_partitions 25375 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=add_partitions 25427 [Thread-5] WARN hive.log - Updating partition stats fast for: hive_hudi_sync 25428 [Thread-5] WARN hive.log - Updated size to 437811 25479 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556572588 end=1610556572720 duration=132 from=org.apache.hadoop.hive.ql.Driver> 25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556572588 end=1610556572721 duration=133 from=org.apache.hadoop.hive.ql.Driver> OK 25480 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK 25480 [Thread-5] INFO 
org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556572721 end=1610556572721 duration=0 from=org.apache.hadoop.hive.ql.Driver> 25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556572398 end=1610556572721 duration=323 from=org.apache.hadoop.hive.ql.Driver> 25480 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval' ]: 323 ms 25482 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Changed Partitions [] 25482 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - No partitions to change for hive_hudi_sync 25482 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync 25482 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync 25497 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: alter_table: db=default tbl=hive_hudi_sync newtbl=hive_hudi_sync 25497 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=alter_table: db=default tbl=hive_hudi_sync newtbl=hive_hudi_sync 25560 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Sync complete for hive_hudi_sync 25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Shutting down the object store... 25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=Shutting down the object store... 25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Metastore shutdown complete. 
25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=Metastore shutdown complete. 25560 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Is Async Compaction Enabled ? false 25560 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieClient - Stopping Timeline service !! 25560 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Closing Timeline server 25560 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Closing Timeline Service 25561 [Thread-5] INFO io.javalin.Javalin - Stopping Javalin ... 25575 [Thread-5] INFO io.javalin.Javalin - Javalin has stopped 25576 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Closed Timeline Service 25576 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Closed Timeline server 31552 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 13 finished: hasNext at NativeMethodAccessorImpl.java:0, took 0.021244 s After [Table(name='****', database='default', description=None, tableType='MANAGED', isTemporary=False), .... tables in hive metastore] 31588 [Thread-1] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook 31597 [Thread-1] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@2239cd56{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 31599 [Thread-1] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://192.168.0.104:4040 31665 [Thread-1] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext ######### New spark shell ################ ~/OSS/spark/spark-2.4.7-bin-hadoop2.7 34s .venv ❯ bin/pyspark Python 3.7.5 (default, Dec 29 2020, 13:08:16) SparkSession available as 'spark'. 
>>> spark.catalog.listTables()
11457 [Thread-3] WARN org.apache.hadoop.hive.metastore.ObjectStore - Failed to get database global_temp, returning NoSuchObjectException
[Table(name='hive_hudi_sync', database='default', description=None, tableType='EXTERNAL', isTemporary=False)]
>>>
```
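A likely direction to investigate (a guess based on the config above, not a confirmed diagnosis): with `hoodie.datasource.hive_sync.use_jdbc` set to `False`, the sync tool talks to the metastore through the Hadoop configuration of the Spark session rather than through `hoodie.datasource.hive_sync.jdbcurl` (which normally expects a `jdbc:hive2://...` URL, not a thrift one). If `hive.metastore.uris` is set only as a plain session config, it may never reach that Hadoop configuration, so the sync falls back to the local embedded metastore, which matches the `HiveMetaStore` lines in the log running in-process. A minimal sketch of the config worth trying; `hive_sync_conf` is a hypothetical helper, not a Hudi or Spark API:

```python
def hive_sync_conf(metastore_uri):
    """Build Spark config entries that point both the session catalog
    and the hive-sync path at the same external metastore."""
    return {
        # read by the Spark session's Hive catalog
        "hive.metastore.uris": metastore_uri,
        # the "spark.hadoop." prefix copies the key into the Hadoop
        # Configuration, which the sync reads when use_jdbc is False
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }

# Applied to the builder from the repro script, roughly:
# builder = SparkSession.builder.appName("test-hudi-hive-sync").enableHiveSupport()
# for key, value in hive_sync_conf("thrift://localhost:9083").items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

If this is the cause, re-running the repro with the prefixed key set should make the `hive_hudi_sync` table appear in the external metastore instead of the local warehouse.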
