pvary commented on pull request #1407: URL: https://github.com/apache/iceberg/pull/1407#issuecomment-687074236
> > Relies on listing of the target dir.
>
> Can we find out in job commit how many writer tasks there were? Then we could use well-known locations and make sure each one is read.

I suspect that the JobContext contains only the input information about the number of mappers/reducers. I have only debugged the LocalJobRunner code so far, but I did not see anything indicating that up-to-date information is available there.

The only solution I was able to come up with is creating a new JobClient to get the info from the server. I have not been able to make it work for the LocalJobRunner yet, and I think this would be too specific to MR.

How does this work for Spark writes? Do we have any other places where MR write is already implemented for Iceberg?

Updated the PR to commit the task only at IcebergOutputCommitter.commitTask, and not at IcebergRecordWriter.close.
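
To make the "well-known locations" idea from the quoted question concrete, here is a rough sketch of what the job-commit check could look like. It assumes the number of writer tasks is already known at job commit, which is exactly the open question in this thread. The class name, the method names, and the `task-<index>.committed` naming scheme are all hypothetical; none of them come from Iceberg or Hadoop, and the sketch uses only java.nio on a local directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg or Hadoop.
public class WellKnownTaskLocations {

  // Assumed naming scheme: one commit file per task, keyed by task index.
  public static Path taskFile(Path jobDir, int taskIndex) {
    return jobDir.resolve("task-" + taskIndex + ".committed");
  }

  // Returns the expected task files that do not exist.
  // An empty list means every task wrote its commit file.
  public static List<Path> missingTaskFiles(Path jobDir, int numTasks) {
    List<Path> missing = new ArrayList<>();
    for (int i = 0; i < numTasks; i++) {
      Path expected = taskFile(jobDir, i);
      if (!Files.exists(expected)) {
        missing.add(expected);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    Path jobDir = Files.createTempDirectory("job-commit-sketch");
    Files.createFile(taskFile(jobDir, 0));
    Files.createFile(taskFile(jobDir, 2));
    // Task 1 never wrote its file, so it is the only entry reported as missing.
    System.out.println(missingTaskFiles(jobDir, 3));
  }
}
```

Unlike a directory listing, this check would notice a task whose file is absent, but only if numTasks can be obtained reliably at job commit.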
