[ https://issues.apache.org/jira/browse/SPARK-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230878#comment-14230878 ]

Shivaram Venkataraman commented on SPARK-3963:
----------------------------------------------

[~pwendell] This looks pretty useful. Was this postponed from 1.2? I have a 
use case that needs Hadoop file names and was wondering whether there is a 
workaround until this is implemented.

> Support getting task-scoped properties from TaskContext
> -------------------------------------------------------
>
>                 Key: SPARK-3963
>                 URL: https://issues.apache.org/jira/browse/SPARK-3963
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>
> This is a proposal for a minor feature. Now that the TaskContext API has 
> stabilized, it would be nice to have a mechanism for Spark jobs to access 
> task-scoped properties that are defined by Spark RDDs. I'd like to propose 
> adding a simple properties hash map with some standard Spark properties that 
> users can access. Later it would be nice to support users setting these 
> properties, but to keep it simple in 1.2, I'd prefer users not be able to 
> set them.
> The main use case is providing the file name from Hadoop RDDs, a very common 
> request, but I'd imagine us using this for other things later on. We could 
> also use this to expose some of the taskMetrics, e.g. the input bytes.
> {code}
> val data = sc.textFile("s3n://..2014/*/*/*.json")
> data.mapPartitions { iter =>
>   val tc = TaskContext.get
>   // Proposed API: getProperty and HADOOP_FILE_NAME do not exist yet.
>   val fileName = tc.getProperty(TaskContext.HADOOP_FILE_NAME)
>   val parts = fileName.split("/")
>   val (year, month, day) = (parts(3), parts(4), parts(5))
>   ...
> }
> {code}
> Internally we'd have a method called setProperty, but this wouldn't be 
> exposed initially. This is structured as a simple (String, String) hash map 
> for ease of porting to Python.
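
For illustration only, here is a minimal standalone sketch of the design described above: a (String, String) hash map with a public getter and a setter that is hidden from users. Everything in it is an assumption made for the example and not Spark's actual API: the names TaskPropertiesSketch, TaskProperties and forFile, the key string "hadoop.file.name", and the bucket name in the sample path. In Spark itself the setter would presumably be private[spark] rather than scoped to an enclosing object.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the proposed TaskContext additions; not Spark code.
object TaskPropertiesSketch {
  // Assumed key; the proposal names the constant but not its value.
  val HADOOP_FILE_NAME = "hadoop.file.name"

  final class TaskProperties {
    // Plain (String, String) map, as in the proposal, to keep a Python port easy.
    private val props = mutable.HashMap.empty[String, String]

    // The only method users would see initially.
    def getProperty(key: String): Option[String] = props.get(key)

    // Setter visible only inside this sketch object, mirroring the idea
    // that setProperty exists internally but is not exposed.
    private[TaskPropertiesSketch] def setProperty(key: String, value: String): Unit =
      props(key) = value
  }

  // Stand-in for what a Hadoop-backed RDD would do before running a task.
  def forFile(path: String): TaskProperties = {
    val tp = new TaskProperties
    tp.setProperty(HADOOP_FILE_NAME, path)
    tp
  }

  def main(args: Array[String]): Unit = {
    // Made-up path laid out like the glob in the description.
    val tp = forFile("s3n://example-bucket/2014/11/30/events.json")
    val parts = tp.getProperty(HADOOP_FILE_NAME).get.split("/")
    // Splitting on "/" yields "s3n:", "", "example-bucket", "2014", "11", "30", ...
    val (year, month, day) = (parts(3), parts(4), parts(5))
    println(s"$year-$month-$day")
  }
}
```

The getter returns an Option here so that a missing key is explicit; the description's example treats the result as a plain String, so that part is a choice made for the sketch.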
