[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592564#comment-14592564
 ] 

Jason Lowe commented on MAPREDUCE-6376:
---------------------------------------

Sorry for the delay getting to this.

bq. I couldn't get access to a machine to generate more than 50k mapper jobs

It shouldn't be difficult to fake a .jhist file for such a scale with a single 
machine.  Just instantiate an EventWriter and send it 
TaskStarted/TaskAttemptStarted/MapAttemptFinished/ReduceAttemptFinished/TaskFinished
 events for as many tasks as desired.  Could use a "real" jhist file as a 
template for the fake events so it has similar sizing for hostnames, counters, 
etc.  Would be good to get some real performance numbers for sizeable jobs 
rather than guesses as to what it will do in practice.

Comments on the latest patch:

{code}
+    if (jhistFormat.equals("json")) {
+      jhistMode = EventWriter.WriteMode.JSON;
+    } else if (jhistFormat.equals("binary")) {
+      jhistMode = EventWriter.WriteMode.BINARY;
+    } else {
+      jhistMode = EventWriter.WriteMode.JSON;
+    }
{code}
This can be simplified to:
{code}
  jhistMode = jhistFormat.equals("binary") ? EventWriter.WriteMode.BINARY : 
EventWriter.WriteMode.JSON;
{code}
but arguably it shouldn't.  I'm not a fan of assuming we know the user's 
intentions when they explicitly overrode the default value to something we 
don't recognize.  IMO if they bothered to override the default to a value we 
don't recognize then we should throw an error.  Case in point: we add a third 
format later.  A user applies that new config with old code.  It will silently 
try to parse json and blow up in cryptic ways rather than tell them in a 
straightforward error message that we don't know how to do what they're asking 
and what values we do recognize.

EventReader should not assume json if it doesn't recognize the version after 
parsing the schema.  We just verified it can only be json or binary, so 
something's wrong if we get to the else.  We should throw here or just assume 
if it isn't one it must be the other since we just verified it a few lines 
above.  For the latter approach, bonus points for caching the comparison result 
in a boolean so we don't have to do it all over again.

Similar situation for the EventWriter constructor and EventWriter.write method. 
 If we don't recognize the WriteMode value then something is very, very wrong.  
I think we can just validate the WriteMode value in the constructor and then 
simplify all the other checks by just checking for one and assuming the other, 
since we already validated in the constructor there's only two possible inputs.

The writeMode member should be final, as I can't think of any valid reason to 
allow changing it after the writer is constructed.

The following code:
{code}
    if (!EventWriter.VERSION.equals(version) &&
        !EventWriter.VERSION_BINARY.equals(version)) {
       throw new IOException("Incompatible event log version: "+version);
    }
    [...]
    if (EventWriter.VERSION.equals(version)) {
      this.decoder = DecoderFactory.get().jsonDecoder(schema, in);
    } else if (EventWriter.VERSION_BINARY.equals(version)) {
      this.decoder = DecoderFactory.get().binaryDecoder(in, null);
    } else {
      this.decoder = DecoderFactory.get().jsonDecoder(schema, in);
    }
{code}
should be simplified to:
{code}
    if (EventWriter.VERSION.equals(version)) {
      this.decoder = DecoderFactory.get().jsonDecoder(schema, in);
    } else if (EventWriter.VERSION_BINARY.equals(version)) {
      this.decoder = DecoderFactory.get().binaryDecoder(in, null);
    } else {
       throw new IOException("Incompatible event log version: "+version);
    }
{code}

Nit: "File format generated by the .jhist file generated by the AM" has a bit 
too much generating.  Maybe "File format the AM will use when generating the 
.jhist file."?

> Fix long load times of .jhist file in JobHistoryServer
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-6376
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6376
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: jobhistoryserver
>    Affects Versions: 2.7.0
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6376.001.patch, MAPREDUCE-6376.002.patch, 
> MAPREDUCE-6376.003.patch
>
>
> When you click on a Job link in the JHS Web UI, it loads the .jhist file.  
> For jobs which have a large number of tasks, the load time can break UI 
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to