homatthew commented on code in PR #3818:
URL: https://github.com/apache/gobblin/pull/3818#discussion_r1380640152
##########
gobblin-modules/gobblin-orc/src/main/java/org/apache/gobblin/writer/GobblinBaseOrcWriter.java:
##########
@@ -258,7 +261,18 @@ public void close()
public void commit()
throws IOException {
closeInternal();
+ if(this.validateORCDuringCommit) {
Review Comment:
To be clear, the issues we currently see come from writing a bad ORC file and
then moving it to the taskoutput directory, where the file is effectively
committed.
We do NOT want to modify the behavior of the base data publisher, because it's
such a widely used class that any change would have very wide implications. Its
current behavior is to read all the files in the output dir and use runners to
move them all in parallel. It has nothing to do with who originally wrote the
file; it will blindly move all of them at that point.
The base data publisher is not a good place to do validation either, because
it does not care about the data being moved; it is agnostic to data formats.
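
To illustrate the point, here is a minimal sketch of validating in the writer's commit step, before a file can reach the directory the publisher reads from. It is not the code in this PR: `ValidateOnCommitSketch`, `looksLikeOrc`, and `commit` are hypothetical names, and the magic-bytes check (ORC files start with the ASCII bytes `ORC`) is only a cheap stand-in for real validation, which would presumably open the file with an ORC reader. The `validateDuringCommit` parameter mirrors the flag in the diff above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical sketch: validate in the writer, so the format-agnostic
// publisher never sees a bad file in the output directory.
public class ValidateOnCommitSketch {
  // ORC files begin with these three ASCII bytes.
  private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

  // Stand-in validation (assumption): only checks the leading magic bytes.
  public static boolean looksLikeOrc(Path file) throws IOException {
    byte[] header = new byte[ORC_MAGIC.length];
    int read = 0;
    try (InputStream in = Files.newInputStream(file)) {
      while (read < header.length) {
        int n = in.read(header, read, header.length - read);
        if (n < 0) {
          break; // file is shorter than the magic header
        }
        read += n;
      }
    }
    return read == header.length && Arrays.equals(header, ORC_MAGIC);
  }

  // Moves the staged file into the output dir only if validation passes.
  // On failure the file stays in staging and an IOException is thrown.
  public static void commit(Path stagingFile, Path outputDir, boolean validateDuringCommit)
      throws IOException {
    if (validateDuringCommit && !looksLikeOrc(stagingFile)) {
      throw new IOException("Validation failed, not committing " + stagingFile);
    }
    Files.createDirectories(outputDir);
    Files.move(stagingFile, outputDir.resolve(stagingFile.getFileName()),
        StandardCopyOption.REPLACE_EXISTING);
  }
}
```

With this shape the writer, which knows the data format, is the one rejecting the file, and the publisher can keep moving everything it finds in the output dir without knowing about ORC.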