[
https://issues.apache.org/jira/browse/HUDI-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luning Wang updated HUDI-5541:
------------------------------
Description:
When I run a bootstrap to convert a Hive table to Hudi with version 0.12.2,
it throws the following error. The table is `call_center` from the TPC-DS
benchmark, and it has no `ts` field.
{code:java}
org.apache.hudi.exception.HoodieInsertException: Failed to bulk insert for commit time 00000000000002
...
Caused by: org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[cc_call_center_sk... cc_tax_percentage]
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:542)
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:520)
    at org.apache.hudi.bootstrap.SparkFullBootstrapDataProviderBase.lambda$generateInputRecords$5ff1ef2f$1(SparkFullBootstrapDataProviderBase.java:73)
... {code}
The following is my bootstrap command. I could not find any option that
disables precombine.
{code:java}
bin/spark-submit --master yarn \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
/opt/hudi-utilities-bundle_2.12-0.12.2.jar \
--run-bootstrap \
--target-base-path /tpcds_hudi_3.db/call_center \
--target-table call_center \
--table-type COPY_ON_WRITE \
--hoodie-conf hoodie.bootstrap.base.path=/tpcds_bin_partitioned_parquet_3.db/call_center \
--hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
--hoodie-conf hoodie.datasource.write.recordkey.field=cc_call_center_sk \
--hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
--hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
--hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD \
--hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
{code}
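As a stopgap, it may be possible to point the ordering (precombine) field at a
column that does exist instead of the default `ts`. The sketch below is
untested and rests on assumptions not confirmed in this issue: that the
bootstrap data provider reads the field name from
`hoodie.datasource.write.precombine.field`, and that `cc_call_center_sk`
(taken from the acceptable fields listed in the error) is a tolerable ordering
column for a one-off bootstrap. If the provider instead follows
DeltaStreamer's `--source-ordering-field` option, that option would need the
same value. This does not disable precombine; it only avoids the missing-field
lookup. The script prints the command by default and runs it only when
`DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumption: reuse an existing column as the ordering (precombine) field,
# because the source table has no `ts` column.
PRECOMBINE_FIELD="cc_call_center_sk"

# Collect the spark-submit arguments as positional parameters so they can be
# either printed (dry run) or passed to spark-submit unchanged.
set -- \
  --master yarn \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /opt/hudi-utilities-bundle_2.12-0.12.2.jar \
  --run-bootstrap \
  --target-base-path /tpcds_hudi_3.db/call_center \
  --target-table call_center \
  --table-type COPY_ON_WRITE \
  --hoodie-conf hoodie.bootstrap.base.path=/tpcds_bin_partitioned_parquet_3.db/call_center \
  --hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=cc_call_center_sk \
  --hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
  --hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
  --hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
  --hoodie-conf "hoodie.datasource.write.precombine.field=${PRECOMBINE_FIELD}"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Dry run (default): show the command that would be executed.
  echo "bin/spark-submit $*"
else
  bin/spark-submit "$@"
fi
```

Only the last `--hoodie-conf` line differs from the command above; everything
else is unchanged.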
> Disable precombine in bootstrap
> -------------------------------
>
> Key: HUDI-5541
> URL: https://issues.apache.org/jira/browse/HUDI-5541
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap, hudi-utilities
> Reporter: Luning Wang
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)