fengjian428 opened a new issue #3044:
URL: https://github.com/apache/hudi/issues/3044
I ran the bootstrap with the following command:
`spark-submit --master yarn --deploy-mode cluster --queue nonlive \
--conf spark.yarn.maxAppAttempts=1 \
--driver-memory 20g --driver-cores 2 --executor-memory 15g --executor-cores 2 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+PrintReferenceGC' \
--conf 'spark.executor.extraJavaOptions=-XX:NewSize=1g -XX:SurvivorRatio=2 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+PrintReferenceGC' \
--packages org.apache.hudi:hudi-spark-bundle_2.11:0.8.0 \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
hudi-utilities-bundle_2.11-0.8.0.jar \
--table-type MERGE_ON_READ \
--run-bootstrap \
--target-base-path /projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_bootdb__item_v4_tab \
--target-table shopee_item_v4_bootdb__item_v4_tab \
--hoodie-conf hoodie.bootstrap.base.path=/projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab \
--hoodie-conf hoodie.datasource.write.recordkey.field=itemid \
--source-class org.apache.hudi.utilities.sources.JsonDFSSource \
--source-ordering-field _event.ts \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=/tmp/config/source.avsc \
--hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=/tmp/config/target.avsc \
--initial-checkpoint-provider org.apache.hudi.utilities.checkpointing.InitialCheckpointFromAnotherHoodieTimelineProvider \
--checkpoint /projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab/ \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
--hoodie-conf hoodie.deltastreamer.transformer.sql="Select *,cast(from_unixtime(_event.ts,'YYYY-MM-dd-HH') as string) grass_date from <SRC>" \
--hoodie-conf hoodie.datasource.write.partitionpath.field=grass_region,grass_date \
--hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.ComplexKeyGenerator \
--hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
--hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
--hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD
`
The job failed with the following error:
21/06/07 17:22:49 INFO Client:
  client token: N/A
  diagnostics: User class threw exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path /projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_bootdb__item_v4_tab/.hoodie
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:53)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:110)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:71)
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:581)
    at org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:357)
    at org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:752)
    at org.apache.hudi.utilities.deltastreamer.BootstrapExecutor.initializeTable(BootstrapExecutor.java:180)
    at org.apache.hudi.utilities.deltastreamer.BootstrapExecutor.execute(BootstrapExecutor.java:139)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:153)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:470)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
However, when I checked HDFS, the path /projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_bootdb__item_v4_tab/.hoodie does exist.
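For reference, here is a rough approximation of the check that raises this exception (top frame of the stack trace). It assumes `TableNotFoundException.checkTableValidity` only verifies that the table base path and its `.hoodie` subfolder both exist and are directories; the real Hudi 0.8.0 code may do more than this. `check_hoodie_table` is a hypothetical helper that runs against the local filesystem; on the cluster, the equivalent probe for each path would be `hdfs dfs -test -d <path>`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: approximates the two checks that
# TableNotFoundException.checkTableValidity is ASSUMED to perform.
# Runs on the local filesystem; for HDFS, replace each `[ -d "$p" ]`
# test with `hdfs dfs -test -d "$p"`.
check_hoodie_table() {
  local base_path="${1%/}"            # drop one trailing slash, if present
  local meta_path="$base_path/.hoodie"

  # 1. The table base path must exist and be a directory.
  if [ ! -d "$base_path" ]; then
    echo "Hoodie table not found in path $base_path"
    return 1
  fi

  # 2. The .hoodie metadata folder must exist and be a directory
  #    (a plain file named .hoodie does not count).
  if [ ! -d "$meta_path" ]; then
    echo "Hoodie table not found in path $meta_path"
    return 1
  fi

  echo "valid table: $base_path"
  return 0
}
```

For example, with the `hdfs dfs -test -d` substitution, `check_hoodie_table /projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_bootdb__item_v4_tab` would report which of the two paths the check rejects.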