[
https://issues.apache.org/jira/browse/SPARK-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940157#comment-14940157
]
Nithin Asokan commented on SPARK-10897:
---------------------------------------
{quote}
For example if groupBy results in 3 stages, which one gets the name? If 3
method calls result in 1 stage? I don't think it's impossible but not sure
about the details of the semantics.
{quote}
This is a good point; I had not considered this scenario.
{quote}
is the motivation really to just display something farther up the call stack?
{quote}
Yes. Crunch has a concept of a DoFn, which is similar to a Function in Spark.
These DoFns can take names, which are typically displayed on the job page in
MR. I should not be comparing MR to Spark, but in my use case we are migrating
from MR to Spark, and our engineers are familiar with how Crunch creates an MR
job with a descriptive name that includes every DoFn name; this gives the user
more context about what the job is processing. For example, in MR, Crunch can
create a job name like {{MyPipeline: Text("/input/path")+Filter valid
lines+Text("/output/path")}}. In Spark we are missing that information, I
believe partly because the Spark scheduler handles stage and job creation. A
Spark job/stage name may appear as
{code}
sortByKey at PGroupedTableImpl.java:123 (job name)
mapToPair at PGroupedTableImpl.java:108 (stage name)
{code}
While this gives an idea that it is processing/creating a PGroupedTable, it
does not give me the full context (at least through Crunch) of the DoFns
applied. If Spark allowed users to set stage names, I think we could pass some
DoFn information from Crunch. The next question I would ask myself is: if
Crunch does not know what stages will be created, how can it know which DoFn
name to pass to Spark? I'm not fully sure this can be supported, given my
limited knowledge of Spark, but if others feel it's possible, it could be
something helpful for Crunch.
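To make the idea concrete, here is a minimal, hypothetical sketch (not Crunch's actual API, and not an existing Spark API) of how a planner could compose a human-readable job name from the DoFn names in a pipeline, mirroring the MR job name quoted above. The class and method names are illustrative only:

{code}
import java.util.List;
import java.util.StringJoiner;

public class JobNameSketch {
    // Hypothetical helper: join DoFn display names into a single job name
    // of the form "Pipeline: DoFn1+DoFn2+...", as Crunch does for MR jobs.
    static String composeJobName(String pipeline, List<String> doFnNames) {
        StringJoiner parts = new StringJoiner("+");
        for (String name : doFnNames) {
            parts.add(name);
        }
        return pipeline + ": " + parts;
    }

    public static void main(String[] args) {
        String name = composeJobName("MyPipeline",
                List.of("Text(\"/input/path\")",
                        "Filter valid lines",
                        "Text(\"/output/path\")"));
        // Prints: MyPipeline: Text("/input/path")+Filter valid lines+Text("/output/path")
        System.out.println(name);
    }
}
{code}

If Spark exposed a setter for stage/job names, a string composed this way could be passed through before the action that triggers the job is invoked.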
> Custom job/stage names
> ----------------------
>
> Key: SPARK-10897
> URL: https://issues.apache.org/jira/browse/SPARK-10897
> Project: Spark
> Issue Type: Wish
> Components: Web UI
> Reporter: Nithin Asokan
> Priority: Minor
>
> Logging this JIRA to gather opinions on a discussion I started on the
> [user-list|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-Stage-names-tt24867.html].
> I would like some thoughts on supporting custom stage/job names.
> Currently, I believe stage names cannot be controlled by the user, but if
> they could be, libraries like Apache [Crunch|https://crunch.apache.org/]
> could dynamically set stage names based on the type of
> processing (action/transformation) being performed.
> Is it possible for Spark to support custom names? Would it make sense to
> allow users to set stage names?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)