tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556
Got a bit further with the command below. The Hudi/Spark job now succeeds, but
the Hive DDL points at the wrong S3 location, so a SELECT from Hive/Presto
fails. If I manually alter the S3 location in the table DDL via HiveServer2, it
works (i.e. change LOCATION 's3a://redact/my2/multpk7' to LOCATION
's3a://redact/my2/multpk7/default'), so I think a code change is needed so the
table is created at the proper S3 location.
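For anyone hitting the same thing, the manual workaround described above is just repointing the table at the partition folder the data actually landed in. Run through HiveServer2 (e.g. via beeline); table and path here are as in this run, adjust for yours. This is a stopgap, not a fix for the sync itself:
```
-- Stopgap: repoint the synced table at the folder Hudi actually wrote to.
ALTER TABLE `redact`.`dmstest_multpk7`
SET LOCATION 's3a://redact/my2/multpk7/default';
```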
```
/home/ec2-user/spark_home/bin/spark-submit --conf
"spark.hadoop.fs.s3a.proxy.host=redact" --conf
"spark.hadoop.fs.s3a.proxy.port=redact" --conf
"spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" --conf
"spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars
"/home/ec2-user/spark-avro_2.11-2.4.6.jar" --master spark://redact:7077
--deploy-mode client /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar
--table-type COPY_ON_WRITE --source-ordering-field TimeCreated --source-class
org.apache.hudi.utilities.sources.ParquetDFSSource --enable-hive-sync
--hoodie-conf hoodie.datasource.hive_sync.database=redact --hoodie-conf
hoodie.datasource.hive_sync.table=dmstest_multpk7 --hoodie-conf
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
--hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false --target-base-path
s3a://redact/my2/multpk7 --target-table dmstest_multpk7 --transformer-class
org.apache.hudi.utilities.transform.AWSDmsTransformer --payload-class
org.apache.hudi.payload.AWSDmsAvroPayload --hoodie-conf
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
--hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company
--hoodie-conf "hoodie.datasource.write.partitionpath.field=" --hoodie-conf
hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
OK
```
cat multpk7.log
```
2020-08-12 12:18:15,375 [main] WARN
org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling
Configs will not be in effect as spark.scheduler.mode is not set to FAIR at
instantiation time. Continuing without scheduling configs
2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to
Spark cluster with app ID app-20200812121816-0086
2020-08-12 12:18:17,199 [main] INFO com.amazonaws.http.AmazonHttpClient -
Configuring Proxy. redact
2020-08-12 12:18:18,154 [main] INFO
org.apache.spark.scheduler.EventLoggingListener - Logging events to
s3a://redact/sparkevents/app-20200812121816-0086
2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted
executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s),
7.9 GB RAM
2020-08-12 12:18:18,195 [main] INFO
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend -
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.0
2020-08-12 12:18:18,427 [main] WARN org.apache.spark.SparkContext - Using
an existing SparkContext; some configuration may not take effect.
2020-08-12 12:18:18,526 [main] ERROR
org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in
properies from dfs
java.io.FileNotFoundException: File
file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
at
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-08-12 12:18:18,528 [main] WARN org.apache.hudi.utilities.UtilHelpers -
Unexpected error read props file at
:file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
java.lang.IllegalArgumentException: Cannot read properties from dfs
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:91)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
at
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File
file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
... 19 more
2020-08-12 12:18:18,528 [main] INFO org.apache.hudi.utilities.UtilHelpers -
Adding overridden properties to file properties.
2020-08-12 12:18:18,529 [main] INFO
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Creating delta
streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false,
hoodie.datasource.write.recordkey.field=version_no,group_company,
hoodie.datasource.write.partitionpath.field=,
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator,
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor,
hoodie.datasource.hive_sync.table=dmstest_multpk7,
hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl,
hoodie.datasource.hive_sync.database=redact}
2020-08-12 12:18:18,533 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Creating delta streamer
with configs : {hoodie.datasource.hive_sync.use_jdbc=false,
hoodie.datasource.write.recordkey.field=version_no,group_company,
hoodie.datasource.write.partitionpath.field=,
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator,
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor,
hoodie.datasource.hive_sync.table=dmstest_multpk7,
hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl,
hoodie.datasource.hive_sync.database=redact}
2020-08-12 12:18:19,798 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write
Client
2020-08-12 12:18:19,799 [main] INFO
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Delta Streamer
running only single round
2020-08-12 12:18:20,218 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:20,222 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Checkpoint to resume from :
Option{val=null}
2020-08-12 12:18:42,136 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write
Client
2020-08-12 12:18:42,156 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Registering Schema
:[{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},
{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]},
{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},
{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}]
2020-08-12 12:18:50,361 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:50,934 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:50,937 [main] INFO
org.apache.hudi.client.HoodieWriteClient - Generate a new instant time
20200812121850
2020-08-12 12:18:51,226 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:51,234 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new
instant [==>20200812121850__commit__REQUESTED]
2020-08-12 12:18:51,415 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Starting commit :
20200812121850
2020-08-12 12:18:51,699 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__REQUESTED]]
2020-08-12 12:18:51,982 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__REQUESTED]]
2020-08-12 12:19:21,501 [main] INFO
org.apache.hudi.index.bloom.HoodieBloomIndex - InputParallelism: ${1500},
IndexParallelism: ${0}
2020-08-12 12:19:32,817 [main] INFO
org.apache.hudi.client.HoodieWriteClient - Workload profile :WorkloadProfile
{globalStat=WorkloadStat {numInserts=103, numUpdates=0},
partitionStat={default=WorkloadStat {numInserts=103, numUpdates=0}}}
2020-08-12 12:19:32,841 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file
exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit.requested
2020-08-12 12:19:33,081 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file
for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
2020-08-12 12:19:33,082 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - AvgRecordSize => 1024
2020-08-12 12:19:33,184 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - For partitionPath : default
Small Files => []
2020-08-12 12:19:33,184 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - After small file assignment:
unassignedInserts => 103, totalInsertBuckets => 1, recordsPerBucket => 122880
2020-08-12 12:19:33,185 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - Total insert buckets for
partition path default => [WorkloadStat {bucketNumber=0, weight=1.0}]
2020-08-12 12:19:33,186 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - Total Buckets :1, buckets info
=> {0=BucketInfo {bucketType=INSERT,
fileIdPrefix=a9ab6f7a-4def-490a-aac0-49e15ee9d742}},
Partition to insert buckets => {default=[WorkloadStat {bucketNumber=0,
weight=1.0}]},
UpdateLocations mapped to buckets =>{}
2020-08-12 12:19:33,206 [main] INFO
org.apache.hudi.client.AbstractHoodieWriteClient - Auto commit disabled for
20200812121850
2020-08-12 12:19:41,179 [main] INFO
org.apache.hudi.client.AbstractHoodieWriteClient - Commiting 20200812121850
2020-08-12 12:19:41,502 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:41,777 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,140 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,479 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,706 [main] INFO org.apache.hudi.table.HoodieTable -
Removing marker directory=s3a://redact/my2/multpk7/.hoodie/.temp/20200812121850
2020-08-12 12:19:43,027 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant
complete [==>20200812121850__commit__INFLIGHT]
2020-08-12 12:19:43,027 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file
exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
2020-08-12 12:19:43,356 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file
for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit
2020-08-12 12:19:43,357 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed
[==>20200812121850__commit__INFLIGHT]
2020-08-12 12:19:43,745 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,010 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,084 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[==>20200812121850__commit__REQUESTED], [==>20200812121850__commit__INFLIGHT],
[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,085 [main] INFO
org.apache.hudi.table.HoodieCommitArchiveLog - No Instants to archive
2020-08-12 12:19:44,086 [main] INFO
org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running
cleaner now
2020-08-12 12:19:44,356 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,629 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,912 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:45,321 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:45,337 [main] INFO org.apache.hudi.table.CleanHelper - No
earliest commit to retain. No need to scan partitions !!
2020-08-12 12:19:45,337 [main] INFO
org.apache.hudi.table.HoodieCopyOnWriteTable - Nothing to clean here. It is
already clean
2020-08-12 12:19:45,374 [main] INFO
org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812121850
2020-08-12 12:19:45,374 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812121850
successful!
2020-08-12 12:19:45,375 [main] INFO
org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table
with hive table(dmstest_multpk7). Hive metastore URL
:jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk7
2020-08-12 12:19:45,636 [main] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants
[[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:46,806 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Trying to sync hoodie table dmstest_multpk7 with base path
s3a://redact/my2/multpk7 of type COPY_ON_WRITE
2020-08-12 12:19:46,864 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
Reading schema from
s3a://redact/my2/multpk7/default/a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
2020-08-12 12:19:47,064 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Hive table dmstest_multpk7 is not found. Creating it
2020-08-12 12:19:47,070 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS
`redact`.`dmstest_multpk7`( `_hoodie_commit_time` string,
`_hoodie_commit_seqno` string, `_hoodie_record_key` string,
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id`
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname`
string, `org_mnem` string, `org_parent` int, `percent_holding` double,
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string,
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string,
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes`
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user`
string, `create_date` bigint, `cntry_of_dom` string, `client` string,
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string,
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW
FORMAT
SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION
's3a://redact/my2/multpk7'
2020-08-12 12:19:47,151 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
Time taken to start SessionState and create Driver: 81 ms
2020-08-12 12:19:47,186 [main] INFO hive.ql.parse.ParseDriver - Parsing
command: CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk7`(
`_hoodie_commit_time` string, `_hoodie_commit_seqno` string,
`_hoodie_record_key` string, `_hoodie_partition_path` string,
`_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int,
`org_id` int, `org_name` string, `org_sname` string, `org_mnem` string,
`org_parent` int, `percent_holding` double, `group_company` string,
`grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string,
`show_shapes` string, `sec_code_pref` string, `alert_org_ref` string,
`swift_bic` string, `exec_breakdown` string, `notes` string, `active` string,
`version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint,
`cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client`
string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated`
bigint, `UserCreated` string) ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION
's3a://redact/my2/multpk7'
2020-08-12 12:19:47,874 [main] INFO hive.ql.parse.ParseDriver - Parse
Completed
2020-08-12 12:19:48,323 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS
`redact`.`dmstest_multpk7`( `_hoodie_commit_time` string,
`_hoodie_commit_seqno` string, `_hoodie_record_key` string,
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id`
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname`
string, `org_mnem` string, `org_parent` int, `percent_holding` double,
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string,
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string,
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes`
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user`
string, `create_date` bigint, `cntry_of_dom` string, `client` string,
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string,
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT
SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED
AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION
's3a://redact/my2/multpk7']: 1171 ms
2020-08-12 12:19:48,329 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Schema sync complete. Syncing partitions for dmstest_multpk7
2020-08-12 12:19:48,329 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Last commit time synced was found to be null
2020-08-12 12:19:48,330 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
Last commit time synced is not known, listing all partitions in
s3a://redact/my2/multpk7,FS :S3AFileSystem{uri=s3a://redact,
workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600,
enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536,
blockSize=33554432, multiPartThreshold=2147483647,
serverSideEncryptionAlgorithm='AES256',
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec,
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405,
available=2405, waiting=0}, activeCount=0},
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running,
pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6],
statistics {445890 bytes read, 4324 bytes written, 172 read ops, 0 large read
ops, 31 write ops}, metrics {{Context=S3AFileSystem}
{FileSystemId=aad8f6ce-2b40-4ddb-9b9b-4e82033cb193-redact}
{fsURI=s3a://redact/sparkevents} {files_created=5} {files_copied=0}
{files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0}
{directories_created=6} {directories_deleted=0} {ignored_errors=4}
{op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=145}
{op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=1}
{op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0}
{object_copy_requests=0} {object_delete_requests=5} {object_list_requests=140}
{object_continue_list_requests=0} {object_metadata_requests=265}
{object_multipart_aborted=0} {object_put_bytes=4324} {object_put_requests=10}
{object_put_requests_completed=10} {stream_write_failures=0}
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0}
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0}
{stream_write_total_data=4324} {object_put_requests_active=0}
{object_put_bytes_pending=0} {stream_write_block_uploads_active=0}
{stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0}
{stream_read_fully_operations=0} {stream_opened=22}
{stream_bytes_skipped_on_seek=0} {stream_closed=22}
{stream_bytes_backwards_on_seek=438082} {stream_bytes_read=445890}
{stream_read_operations_incomplete=71} {stream_bytes_discarded_in_abort=0}
{stream_close_operations=22} {stream_read_operations=2764} {stream_aborted=0}
{stream_forward_seek_operations=0} {stream_backward_seek_operations=1}
{stream_seek_operations=1} {stream_bytes_read_in_close=8}
{stream_read_exceptions=0} }}
2020-08-12 12:19:48,584 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Storage partitions scan complete. Found 1
2020-08-12 12:19:48,613 [main] INFO org.apache.hudi.hive.HiveSyncTool - New
Partitions []
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
No partitions to add for dmstest_multpk7
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Changed Partitions []
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HoodieHiveClient -
No partitions to change for dmstest_multpk7
2020-08-12 12:19:49,002 [main] INFO org.apache.hudi.hive.HiveSyncTool -
Sync complete for dmstest_multpk7
2020-08-12 12:19:49,031 [main] INFO
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down
deltastreamer
2020-08-12 12:19:49,044 [main] INFO
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down
all executors
```
```
aws s3 ls s3://redact/my2/multpk7/
PRE .hoodie/
PRE default/
aws s3 ls s3://redact/my2/multpk7/default/
2020-08-12 12:19:39 93 .hoodie_partition_metadata
2020-08-12 12:19:41 452644
a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
```