[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766054#comment-16766054 ]
Steve Loughran commented on MAPREDUCE-7184:
-------------------------------------------

* If we look at changes here, the openFile() command set things up to use a CompletableFuture<> for the opening, which, by default, is actually evaluated in the same thread as the caller (i.e. it's a blocking operation)
* But if the counter is not the same, it means that the getRawFilesystem.open("file.crc").readFully() isn't incrementing the thread local stats, which implies that it is somehow running in a different thread

Will that have adverse consequences? No, but it is a difference in behaviour, and that could be considered a regression. And I don't understand why it is happening, given that the open call (see {{FileSystem.openFileWithOptions()}}) is made in the same thread as normal.

Thoughts

* Although the s3 select stuff through the MR pipeline is going to have to go in later (MAPREDUCE-7182), I'd like to keep the openFile() code in as is because it lets us add custom options to files opened (specifically, I want to add an option to allow the seek format of a file to be declared).

But: we could pull those changes in the MR code as is, with a goal of MAPREDUCE-7182 to add that stuff, including tests comparing the byte count options?

Or: I can do something isolated just for here?

> TestJobCounters#getFileSize can ignore crc file
> -----------------------------------------------
>
>                 Key: MAPREDUCE-7184
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch,
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input
> files size with BYTES_READ by the job. The crc files are considered in
> getFileSize whereas the job FileInputFormat ignores them.
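
The thread-local point in the comment can be illustrated with a minimal JDK-only sketch. It does not use Hadoop's statistics classes; ThreadLocalStatsSketch, BYTES_READ, readFully and bytesSeenByCaller are hypothetical names invented for this illustration. It assumes only that the byte counter is per-thread, and shows that a read evaluated in the caller's thread is visible to the caller, while the same read evaluated on another thread is not.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalStatsSketch {
    // Hypothetical stand-in for per-thread filesystem statistics.
    private static final ThreadLocal<long[]> BYTES_READ =
        ThreadLocal.withInitial(() -> new long[1]);

    // Hypothetical "read": adds to the counter of whichever thread runs it.
    static byte[] readFully(int len) {
        BYTES_READ.get()[0] += len;
        return new byte[len];
    }

    // Returns the byte count the calling thread observes after one 4-byte read.
    static long bytesSeenByCaller(boolean async) throws Exception {
        BYTES_READ.get()[0] = 0; // reset the caller's counter
        if (async) {
            // The read runs on a pool thread, so that thread's counter moves.
            ExecutorService pool = Executors.newSingleThreadExecutor();
            try {
                CompletableFuture.supplyAsync(() -> readFully(4), pool).get();
            } finally {
                pool.shutdown();
            }
        } else {
            // The read runs in the caller; the future is already complete.
            CompletableFuture.completedFuture(readFully(4)).get();
        }
        return BYTES_READ.get()[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("same thread:  " + bytesSeenByCaller(false));
        System.out.println("other thread: " + bytesSeenByCaller(true));
    }
}
```

If the counters in the test behave like the asynchronous case here, that would be consistent with the read happening off the caller's thread, though this sketch says nothing about where that would occur in the Hadoop code path.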