Lucas61000 commented on code in PR #5983:
URL: https://github.com/apache/hive/pull/5983#discussion_r2215120041
##########
shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
##########
@@ -249,26 +270,49 @@ protected boolean initNextRecordReader(K key) throws IOException {
return false;
}
- // get a record reader for the idx-th chunk
- try {
- curReader = rrConstructor.newInstance(new Object[]
- {split, jc, reporter, Integer.valueOf(idx), preReader});
-
- // change the key if need be
- if (key != null) {
- K newKey = curReader.createKey();
- ((CombineHiveKey)key).setKey(newKey);
+ if (skipCorruptfile) {
+ // get a record reader for the idx-th chunk
+ try {
+ curReader = rrConstructor.newInstance(new Object[]
+ {split, jc, reporter, Integer.valueOf(idx), preReader});
+
+ // change the key if need be
+ if (key != null) {
+ K newKey = curReader.createKey();
+ ((CombineHiveKey) key).setKey(newKey);
+ }
+
+ // setup some helper config variables.
+ jc.set("map.input.file", split.getPath(idx).toString());
+ jc.setLong("map.input.start", split.getOffset(idx));
+ jc.setLong("map.input.length", split.getLength(idx));
+ } catch (InvocationTargetException ITe) {
+ return false;
+ } catch (Exception e) {
+        curReader = HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(e, jc);
Review Comment:
Thanks for the review. The issue I'm facing is that some users don't care
about data integrity: they just want their jobs to run smoothly and would
rather skip files that were corrupted during transmission.
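
To illustrate the pattern the patch relies on, here is a minimal, self-contained sketch. The `CorruptFileReader` class, `initReader` method, and file paths are hypothetical stand-ins, not Hive or Hadoop APIs; the point is only that a constructor failure during reflective instantiation surfaces as `InvocationTargetException`, and when skipping is enabled the caller returns `false` and moves on instead of failing the job:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.io.IOException;

// Hypothetical stand-in for a record reader whose constructor
// fails while opening a corrupt file.
class CorruptFileReader {
  CorruptFileReader(String path) throws IOException {
    if (path.contains("corrupt")) {
      throw new IOException("cannot read " + path);
    }
  }
}

public class SkipCorruptSketch {
  // Mirrors the PR's pattern: the reader is built reflectively, so a
  // constructor failure is wrapped in InvocationTargetException. With
  // skipping enabled, we return false (skip this chunk) rather than
  // propagating the error.
  static boolean initReader(String path, boolean skipCorruptFile) {
    try {
      Constructor<CorruptFileReader> ctor =
          CorruptFileReader.class.getDeclaredConstructor(String.class);
      try {
        ctor.newInstance(path);
      } catch (InvocationTargetException ite) {
        if (skipCorruptFile) {
          return false; // skip the corrupt chunk, keep the job running
        }
        throw new RuntimeException(ite.getCause());
      }
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(initReader("data/part-0", true));    // true
    System.out.println(initReader("data/corrupt-1", true)); // false
  }
}
```

With `skipCorruptFile` disabled, the same failure propagates and fails the task, which is the behavior integrity-sensitive users would want.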
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]