rmahindra123 commented on code in PR #9913:
URL: https://github.com/apache/hudi/pull/9913#discussion_r1382227013
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##########
@@ -541,29 +579,37 @@ private Pair<SchemaProvider, Pair<String, JavaRDD<HoodieRecord>>> fetchFromSourc
checkpointStr = dataAndCheckpoint.getCheckpointForNextBatch();
boolean reconcileSchema = props.getBoolean(DataSourceWriteOptions.RECONCILE_SCHEMA().key());
if (this.userProvidedSchemaProvider != null && this.userProvidedSchemaProvider.getTargetSchema() != null) {
- // If the target schema is specified through Avro schema,
- // pass in the schema for the Row-to-Avro conversion
- // to avoid nullability mismatch between Avro schema and Row schema
- if (errorTableWriter.isPresent()
- && props.getBoolean(HoodieErrorTableConfig.ERROR_ENABLE_VALIDATE_TARGET_SCHEMA.key(),
- HoodieErrorTableConfig.ERROR_ENABLE_VALIDATE_TARGET_SCHEMA.defaultValue())) {
- // If the above conditions are met, trigger error events for the rows whose conversion to
- // avro records fails.
- avroRDDOptional = transformed.map(
- rowDataset -> {
- Tuple2<RDD<GenericRecord>, RDD<String>> safeCreateRDDs = HoodieSparkUtils.safeCreateRDD(rowDataset,
- HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, reconcileSchema,
- Option.of(this.userProvidedSchemaProvider.getTargetSchema()));
- errorTableWriter.get().addErrorEvents(safeCreateRDDs._2().toJavaRDD()
- .map(evStr -> new ErrorEvent<>(evStr,
- ErrorEvent.ErrorReason.AVRO_DESERIALIZATION_FAILURE)));
- return safeCreateRDDs._1.toJavaRDD();
- });
+ if (useRowWriter) {
+ if (errorTableWriter.isPresent()) {
+ throw new HoodieException("Error table is not yet supported with row writer");
Review Comment:
Why, though? Since we are not converting to Avro, we don't need to check for
Avro deserialization errors. But why can't the error table still be enabled
for bad records in the source and transformers?
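
To make the question concrete, here is a minimal sketch of the alternative, under stated assumptions. It is not Hudi code: `ErrorSink`, `shouldValidateAvro`, and `routeBadRecords` are hypothetical stand-ins, and "bad record" is reduced to an empty string purely for illustration. The idea is to skip only the Avro conversion check when the row writer is used, instead of rejecting the error table outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative only: none of these names exist in Hudi.
final class ErrorTableGuardSketch {

  /** Hypothetical stand-in for the error table writer; it only collects events. */
  static final class ErrorSink {
    final List<String> events = new ArrayList<>();

    void add(String event) {
      events.add(event);
    }
  }

  /**
   * Decides whether the Row-to-Avro conversion check should run.
   * With the row writer there is no Avro conversion, so only this check is skipped.
   */
  static boolean shouldValidateAvro(boolean useRowWriter,
                                    Optional<ErrorSink> errorSink,
                                    boolean validateTargetSchema) {
    return !useRowWriter && errorSink.isPresent() && validateTargetSchema;
  }

  /**
   * Sends records rejected by the source or a transformer to the sink, if one is
   * configured, and returns the remaining records. Independent of the writer path.
   */
  static List<String> routeBadRecords(List<String> records, Optional<ErrorSink> errorSink) {
    List<String> good = new ArrayList<>();
    for (String record : records) {
      if (record == null || record.isEmpty()) { // assumed definition of a bad record
        errorSink.ifPresent(sink -> sink.add("bad record from source/transformer"));
      } else {
        good.add(record);
      }
    }
    return good;
  }

  public static void main(String[] args) {
    ErrorSink sink = new ErrorSink();
    boolean useRowWriter = true;

    // The Avro check is skipped for the row writer, but the sink stays usable.
    System.out.println(shouldValidateAvro(useRowWriter, Optional.of(sink), true));
    System.out.println(routeBadRecords(Arrays.asList("a", "", "b"), Optional.of(sink)));
    System.out.println(sink.events.size());
  }
}
```

Whether the real source and transformer stages can surface bad records without the Avro path is exactly what I am asking about; the sketch only shows the shape of the guard.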
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.