[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766054#comment-16766054
 ] 

Steve Loughran commented on MAPREDUCE-7184:
-------------------------------------------

* If we look at the changes here, the openFile() command sets things up to use a 
CompletableFuture<> for the opening, which, by default, is actually evaluated 
in the same thread as the caller (i.e. it's a blocking operation)
* But if the counter is not the same, it means that the 
getRawFilesystem.open("file.crc").readFully() isn't incrementing the thread 
local stats, which implies that it is somehow running in a different thread
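
To illustrate the reasoning in the two points above, here is a minimal, self-contained sketch. It is not Hadoop code: the class {{ThreadLocalStatsDemo}}, the {{BYTES_READ}} counter and the {{read()}} helper are hypothetical stand-ins for the thread-local filesystem statistics. It only shows that work evaluated in the caller's thread and then wrapped in a CompletableFuture is charged to the caller's thread-local counter, while work submitted to another thread is not.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalStatsDemo {
    // Hypothetical stand-in for per-thread filesystem statistics.
    private static final ThreadLocal<long[]> BYTES_READ =
        ThreadLocal.withInitial(() -> new long[1]);

    // Pretend to read n bytes, charging them to the current thread's counter.
    static int read(int n) {
        BYTES_READ.get()[0] += n;
        return n;
    }

    static long bytesReadByThisThread() {
        return BYTES_READ.get()[0];
    }

    // The read is evaluated in the caller's thread; the future only wraps
    // the already-computed result, so the caller's counter moves.
    static long readInCallerThread(int n) {
        long before = bytesReadByThisThread();
        CompletableFuture<Integer> f = CompletableFuture.completedFuture(read(n));
        f.join();
        return bytesReadByThisThread() - before;
    }

    // The read is evaluated on a worker thread, so it updates the worker's
    // thread-local counter and the caller's counter stays unchanged.
    static long readInWorkerThread(int n) {
        long before = bytesReadByThisThread();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture.supplyAsync(() -> read(n), pool).join();
        } finally {
            pool.shutdown();
        }
        return bytesReadByThisThread() - before;
    }

    public static void main(String[] args) {
        System.out.println("caller-thread delta: " + readInCallerThread(4));
        System.out.println("worker-thread delta: " + readInWorkerThread(4));
    }
}
```

If the counters observed by the test behave like the second method, that would be consistent with the read happening on a different thread, though the sketch says nothing about where in the real code path that would occur.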

Will that have adverse consequences? No, but it is a difference in behaviour, 
and that could be considered a regression. And I don't understand why it is 
happening, given that the open call (see {{FileSystem.openFileWithOptions()}}) 
is executed in the same thread as normal.

Thoughts
* Although the S3 Select stuff through the MR pipeline is going to have to go in 
later (MAPREDUCE-7182), I'd like to keep the openFile() code in as is because 
it lets us add custom options to files opened (specifically, I want to add an 
option to allow the seek format of a file to be declared).

But: we could pull those changes in the MR code as is, with a goal of 
MAPREDUCE-7182 to add that stuff, including tests comparing the byte count 
options? Or: I can do something isolated just for here?

> TestJobCounters#getFileSize can ignore crc file
> -----------------------------------------------
>
>                 Key: MAPREDUCE-7184
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, 
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.


