You can provide your own log directory, where the Spark event logs will be
saved and which you can replay afterwards.

Set `spark.eventLog.dir=s3://bucket/some/directory` in your job and run it.
Note! The path `s3://bucket/some/directory` must exist before you run your
job; it won't be created automatically.
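
For example, you can pass it on the command line at submit time. This is
just a sketch: `my_job.py` is a placeholder for your application, and
`spark.eventLog.enabled=true` is only needed if event logging isn't already
turned on in your cluster's `spark-defaults.conf` (on EMR it usually is):
```
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3://bucket/some/directory \
  my_job.py
```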

The Spark HistoryServer on EMR won't show you anything because it's looking
for logs in `hdfs:///var/log/spark/apps` by default.
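
Alternatively, you can point the History Server directly at the S3
directory instead of copying anything. A sketch, assuming the host running
the History Server has working S3 credentials and filesystem support:
```
# conf/spark-defaults.conf on the History Server host
spark.history.fs.logDirectory  s3://bucket/some/directory
```
(Restart the History Server after changing this.)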

Otherwise, you can either copy the log files from S3 to the HDFS path
above, or copy them locally to `/tmp/spark-events` (the default directory
for Spark event logs); example copy commands are sketched below. Then run
the History Server like:
```
cd /usr/local/src/spark-1.6.1-bin-hadoop2.6
sbin/start-history-server.sh   # serves logs from /tmp/spark-events by default
```
and then open http://localhost:18080
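
For the copy step mentioned above, something like this should work (a
sketch, assuming the AWS CLI and the Hadoop tools are available on the
machine):
```
# copy the event logs from S3 to the local default directory...
aws s3 sync s3://bucket/some/directory /tmp/spark-events
# ...or into the HDFS path the EMR History Server watches
hadoop distcp s3://bucket/some/directory hdfs:///var/log/spark/apps
```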

On Thu, Mar 30, 2017 at 8:45 PM, Paul Tremblay <paulhtremb...@gmail.com>
wrote:

> I am looking for tips on evaluating my Spark job after it has run.
>
> I know that right now I can look at the history of jobs through the web
> ui. I also know how to look at the current resources being used by a
> similar web ui.
>
> However, I would like to look at the logs after the job is finished to
> evaluate such things as how many tasks were completed, how many executors
> were used, etc. I currently save my logs to S3.
>
> Thanks!
>
> Henry
>
> --
> Paul Henry Tremblay
> Robert Half Technology
>
