yihua commented on code in PR #14115:
URL: https://github.com/apache/hudi/pull/14115#discussion_r2445625030
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableVersion.java:
##########
@@ -78,7 +78,16 @@ public static HoodieTableVersion current() {
public static HoodieTableVersion fromVersionCode(int versionCode) {
return Arrays.stream(HoodieTableVersion.values())
.filter(v -> v.versionCode == versionCode).findAny()
- .orElseThrow(() -> new HoodieException("Unknown table versionCode:" + versionCode));
+ .orElseThrow(() -> new HoodieException(
+ "Table version mismatch detected. "
+ + "The table was created with a Hudi version that has table version code " + versionCode
+ + " but you're using an older version.\n"
+ + "Likely you are using a lower hudi binary version to read a table written using higher hudi version which is not supported. "
+ + "To fix this issue:\n"
+ + " - Please upgrade your readers to use the same version as writers\n"
+ + " - Table version code: " + versionCode + "\n"
+ + " - Current supported versions: " + Arrays.toString(HoodieTableVersion.values()) + "\n"
+ + "See: https://hudi.apache.org/docs/migration_guide for more information"));
Review Comment:
The error message does not match the actual error. The exception is thrown
if the table version is not in the range of 0 to 9 (inclusive). There is no
version comparison in this method.
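A minimal sketch of a message that matches what the method actually checks — an unrecognized version code rather than a reader/writer comparison. The class and method here are hypothetical illustrations, not existing Hudi APIs; the `SUPPORTED` array stands in for the codes backing `HoodieTableVersion.values()`:

```java
import java.util.Arrays;

public class VersionCodes {
    // Hypothetical stand-in for the version codes of HoodieTableVersion.values()
    static final int[] SUPPORTED = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

    // Describes the actual failure mode: the code is simply not a known version.
    static String buildUnknownVersionMessage(int versionCode) {
        return "Unknown table version code: " + versionCode
            + ". Supported version codes are " + Arrays.toString(SUPPORTED)
            + ". If the table was written by a newer Hudi release, upgrade the reader "
            + "to a version that supports this table version.";
    }
}
```

The upgrade hint is kept, but only as a possibility, since the method itself cannot tell why the code is unknown.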
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/hadoop/HoodieBaseParquetWriter.java:
##########
@@ -146,8 +147,25 @@ public long getDataSize() {
}
public void write(R object) throws IOException {
- this.parquetWriter.write(object);
- writtenRecordCount.incrementAndGet();
+ try {
+ this.parquetWriter.write(object);
+ writtenRecordCount.incrementAndGet();
+ } catch (RuntimeException e) {
+ String errorMessage = e.getMessage() != null ? e.getMessage() : "";
+ if (isRequiredFieldNullError(errorMessage) && errorMessage.contains("_hoodie_is_deleted")) {
+ throw new HoodieException(
+ "'_hoodie_is_deleted' field is missing for some of the incoming records.\n\n"
+ + "The table schema requires '_hoodie_is_deleted' to be non-null, but some records lack this field.\n\n"
+ + "To fix:\n"
+ + " Ensure ALL records have '_hoodie_is_deleted' field set (true/false)\n\n"
+ + "Original error: " + errorMessage, e);
Review Comment:
Avoid newline character in the error message?
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java:
##########
@@ -323,7 +323,27 @@ public boolean validate(long totalRecords, long totalErroredRecords, Option<Hood
if (totalErroredRecords > 0) {
hasErrored.set(true);
ValidationUtils.checkArgument(writeStatusesOpt.isPresent(), "RDD<WriteStatus> expected to be present when there are errors");
- LOG.error("{} failed with errors", writeOperationType);
+ List<String> errorKeys = HoodieJavaRDD.getJavaRDD(writeStatusesOpt.get())
+ .filter(WriteStatus::hasErrors)
+ .flatMap(ws -> ws.getErrors().keySet().stream().iterator())
+ .take(10)
+ .stream()
+ .map(Object::toString)
+ .collect(Collectors.toList());
+
+ String errorSummary = String.format(
+ "%s operation failed with %d error(s).\n\nFailed records (first %d of %d):\n%s\n\n"
+ + "Check for error stacktraces in the driver logs which could give more information on the failure.",
+ writeOperationType,
+ totalErroredRecords,
+ Math.min(10, errorKeys.size()),
+ errorKeys.size(),
+ errorKeys.stream()
+ .map(k -> " - Record Key: " + k)
+ .collect(Collectors.joining("\n")));
+
+ LOG.error(errorSummary);
Review Comment:
We cannot log record keys, which can contain PII; doing so could violate data compliance requirements.
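One PII-safe alternative is to aggregate error counts by exception type instead of printing keys. This is only a sketch of that idea; `ErrorSummary` and `summarize` are hypothetical helpers, not Hudi APIs:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ErrorSummary {
    // Summarizes write errors without exposing record keys:
    // counts failures grouped by exception class name.
    static String summarize(List<Throwable> errors) {
        Map<String, Long> byClass = errors.stream()
            .collect(Collectors.groupingBy(t -> t.getClass().getSimpleName(),
                                           Collectors.counting()));
        return "Write failed with " + errors.size()
            + " error(s); counts by exception type: " + byClass
            + ". See driver logs for full stack traces (record keys withheld).";
    }
}
```

Operators still get enough to triage (how many failures, of what kind) while keys stay out of the logs.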
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/hadoop/HoodieBaseParquetWriter.java:
##########
@@ -146,8 +147,25 @@ public long getDataSize() {
}
public void write(R object) throws IOException {
- this.parquetWriter.write(object);
- writtenRecordCount.incrementAndGet();
+ try {
+ this.parquetWriter.write(object);
+ writtenRecordCount.incrementAndGet();
+ } catch (RuntimeException e) {
+ String errorMessage = e.getMessage() != null ? e.getMessage() : "";
+ if (isRequiredFieldNullError(errorMessage) && errorMessage.contains("_hoodie_is_deleted")) {
Review Comment:
Where do these exceptions come from? Could the `errorMessage.contains` check be avoided by having the source throw exceptions of specific classes that can be caught here?
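A minimal sketch of that suggestion: a dedicated exception type carrying the field name, so callers match on a class instead of message text. `MissingRequiredFieldException` is a hypothetical name, not an existing Hudi or Parquet class:

```java
// Hypothetical typed exception: thrown at the point where the null/missing
// required field is detected, instead of a generic RuntimeException.
public class MissingRequiredFieldException extends RuntimeException {
    private final String fieldName;

    public MissingRequiredFieldException(String fieldName, Throwable cause) {
        super("Required field '" + fieldName + "' is null or missing", cause);
        this.fieldName = fieldName;
    }

    public String getFieldName() {
        return fieldName;
    }
}
```

The writer could then do `catch (MissingRequiredFieldException e)` and check `"_hoodie_is_deleted".equals(e.getFieldName())`, with no string scraping of the message.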
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/IMetaStoreClientUtil.java:
##########
@@ -39,8 +40,25 @@ public static IMetaStoreClient getMSC(HiveConf hiveConf) throws HiveException, M
try {
metaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
} catch (NoSuchMethodException | IllegalAccessException | IllegalArgumentException
- | InvocationTargetException ex) {
+ | InvocationTargetException ex) {
+ try {
+ metaStoreClient = Hive.get(hiveConf).getMSC();
+ } catch (RuntimeException e) {
+ if (e.getMessage() != null && e.getMessage().contains("not org.apache.hudi.org.apache.hadoop")) {
+ throw new HoodieException(
+ "Hive Metastore compatibility issue detected. This usually happens due to:\n"
+ + " 1. Hive version mismatch\n"
+ + " 2. Conflicting Hive libraries in classpath\n"
+ + " 3. Incompatible hudi-spark-bundle version\n\n"
+ + "To resolve:\n"
+ + " - For Hive 2.x: Use hudi-spark-bundle with 'hive2' classifier\n"
+ + " - For Hive 3.x: Use hudi-spark-bundle with 'hive3' classifier\n"
Review Comment:
What does this mean?
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -121,10 +121,25 @@ public MessageType getStorageSchema(boolean includeMetadataField) {
try {
return tableSchemaResolver.getTableParquetSchema(includeMetadataField);
} catch (Exception e) {
- throw new HoodieSyncException("Failed to read schema from storage.", e);
+ throw new HoodieSyncException(buildSchemaReadErrorMessage(e), e);
}
}
+ private String buildSchemaReadErrorMessage(Exception e) {
+ String errorMessage = e.getMessage() != null ? e.getMessage() : e.getClass().getName();
+ if (e instanceof java.io.FileNotFoundException) {
+ return "Cannot read Hudi table schema - required data file is missing.\n\n"
+ + "This can happen due to:\n"
+ + " 1. Aggressive cleaner retention compared to query run times\n"
+ + " 2. Manual file deletions (timeline files or data files)\n"
+ + " 3. Concurrent writers without proper locking or configurations set\n\n"
+ + "Depending on the root cause, mitigation might differ.\n\n"
+ + "Original error: " + errorMessage;
Review Comment:
The error message does not make sense. The main cause is that the commit
metadata does not have the table schema and there is no data file to check the
schema either.
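A sketch of a message aligned with the cause described above — no schema in the commit metadata and no data file to infer it from. `SchemaErrors` and `noSchemaFound` are hypothetical names for illustration only:

```java
public class SchemaErrors {
    // Hypothetical message builder matching the actual failure: neither the
    // commit metadata nor any data file carries the table schema.
    static String noSchemaFound(String tablePath) {
        return "Failed to read table schema for " + tablePath
            + ": the latest commit metadata does not contain the table schema, "
            + "and no data file exists to infer the schema from.";
    }
}
```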
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]