[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463833#comment-13463833
 ] 

Harsh J commented on MAPREDUCE-4464:
------------------------------------

Hi Clint,

Sorry for the delay here!

I noticed that the line:

bq. String host = u.getHost();

which is the one that can end up carrying a null, is then used in the lookup as:

bq. List<MapOutputLocation> loc = mapLocations.get(host);

Hence, I think the ideal fix would be to throw an exception, because the code 
that follows relies heavily on host:

{code}
              URI u = URI.create(event.getTaskTrackerHttp());
              String host = u.getHost();
              TaskAttemptID taskId = event.getTaskAttemptId();
              URL mapOutputLocation = new URL(event.getTaskTrackerHttp() + 
                                      "/mapOutput?job=" + taskId.getJobID() +
                                      "&map=" + taskId + 
                                      "&reduce=" + getPartition());
              List<MapOutputLocation> loc = mapLocations.get(host);
              if (loc == null) {
                loc = Collections.synchronizedList
                  (new LinkedList<MapOutputLocation>());
                mapLocations.put(host, loc);
              }
              loc.add(new MapOutputLocation(taskId, host, mapOutputLocation));
              numNewMaps ++;
{code}

As its usage shows, if host cannot be determined and is consistently null, we 
cannot really work with it, so throwing an IOException makes sense.
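
To illustrate the idea, here is a minimal standalone sketch of such a guard. It 
is not the attached patch: the class name, the helper names (requireHost, 
addLocation), the exception message, and the example tracker addresses are all 
hypothetical, and the map values are simplified to plain strings instead of 
MapOutputLocation. It assumes only that java.net.URI.getHost() returns null for 
a hostname it cannot parse as a server-based authority (for example, one 
containing an underscore), and that ConcurrentHashMap.get(null) throws a 
NullPointerException.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HostGuardSketch {
    // Hypothetical stand-in for ReduceTask's mapLocations; values are
    // simplified to URL strings for the purpose of this sketch.
    static final Map<String, List<String>> mapLocations =
        new ConcurrentHashMap<String, List<String>>();

    // Fail fast with a descriptive IOException instead of letting a null
    // host reach ConcurrentHashMap.get(), which would throw a bare NPE.
    static String requireHost(String taskTrackerHttp) throws IOException {
        URI u = URI.create(taskTrackerHttp);
        String host = u.getHost();
        if (host == null) {
            // Hypothetical message; the point is to name the offending value.
            throw new IOException("Could not determine host from tracker "
                + "location '" + taskTrackerHttp + "'");
        }
        return host;
    }

    // Mirrors the shape of the lookup in the snippet above, with the guard
    // applied before the host is used as a map key.
    static void addLocation(String taskTrackerHttp) throws IOException {
        String host = requireHost(taskTrackerHttp);
        List<String> loc = mapLocations.get(host);
        if (loc == null) {
            loc = Collections.synchronizedList(new LinkedList<String>());
            mapLocations.put(host, loc);
        }
        loc.add(taskTrackerHttp + "/mapOutput");
    }

    public static void main(String[] args) {
        try {
            addLocation("http://tt1.example.com:50060"); // parses normally
            addLocation("http://bad_host:50060");        // getHost() is null
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a check of this kind, the failure surfaces at the point where the host is 
derived and names the offending tracker address, rather than appearing later as 
an NPE inside ConcurrentHashMap.get().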

I'm currently running test-patch on your patch for branch-1; depending on the 
results, I'll either commit it or post some further comments.

MR2 may be similarly affected on the netty side, but it may already be failing 
properly. I don't have the time to verify that at the moment (perhaps in another 
JIRA), so I'll just focus on the MR1 side for now.

Thanks for the patch!
                
> Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4464
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 1.0.0
>            Reporter: Clint Heath
>            Assignee: Clint Heath
>            Priority: Minor
>         Attachments: MAPREDUCE-4464_new.patch, MAPREDUCE-4464.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> If DNS does not resolve hostnames properly, reduce tasks can fail with a very 
> misleading exception.
> As per my peer Ahmed's diagnosis:
> In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed 
> URI, and so host from:
> {code}
> String host = u.getHost();
> {code}
> is evaluated to null and the NullPointerException is thrown afterwards in the 
> ConcurrentHashMap.
> I have written a patch to check for a null hostname condition when getHost is 
> called in the getMapCompletionEvents method and print an intelligible warning 
> message rather than suppressing it until later when it becomes confusing and 
> misleading.

