[
https://issues.apache.org/jira/browse/SPARK-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-3963:
--------------------------------
Labels: bulk-closed (was: )
> Support getting task-scoped properties from TaskContext
> -------------------------------------------------------
>
> Key: SPARK-3963
> URL: https://issues.apache.org/jira/browse/SPARK-3963
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Patrick Wendell
> Priority: Major
> Labels: bulk-closed
>
> This is a proposal for a minor feature. Given the stabilization of the
> TaskContext API, it would be nice to have a mechanism for Spark jobs to
> access properties that are defined with task-level scope by Spark RDDs.
> I'd like to propose adding a simple properties hash map with some standard
> Spark properties that users can access. Later it would be nice to support
> users setting these properties, but to keep it simple in 1.2 I'd prefer
> that users not be able to set them initially.
> The main use case is providing the file name from Hadoop RDDs, a very
> common request. But I'd imagine us using this for other things later on.
> We could also use this to expose some of the taskMetrics, such as the
> input bytes.
> {code}
> val data = sc.textFile("s3n://..2014/*/*/*.json")
> data.mapPartitions { iter =>
>   val tc = TaskContext.get
>   val fileName = tc.getProperty(TaskContext.HADOOP_FILE_NAME)
>   val parts = fileName.split("/")
>   val (year, month, day) = (parts(3), parts(4), parts(5))
>   ...
> }
> {code}
> Internally we'd have a method called setProperty, but this wouldn't be
> exposed initially. This is structured as a simple (String, String) hash map
> for ease of porting to Python.
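To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the (String, String) properties map described above. This is hypothetical illustration only: `HADOOP_FILE_NAME`, `getProperty`, and `setProperty` are the names this proposal suggests, not existing Spark API, and the key string used below is made up.

```scala
import scala.collection.mutable

// Hypothetical sketch of the proposed task-scoped properties, modeled as a
// simple (String, String) map as the proposal describes. None of these names
// exist in Spark today; they mirror the proposal's suggested API.
object TaskContextSketch {
  // A standard key that Spark itself would populate (e.g. from a Hadoop RDD).
  // The key string is an assumption for illustration.
  val HADOOP_FILE_NAME = "spark.hadoopFileName"

  private val props = mutable.Map.empty[String, String]

  // Per the proposal, this setter would be internal to Spark and not
  // user-facing initially; it is public here only so the sketch is runnable.
  def setProperty(key: String, value: String): Unit = props(key) = value

  // Read-only accessor available to user code inside a task.
  def getProperty(key: String): String = props.getOrElse(key, null)
}
```

Keeping the map as plain (String, String) pairs, as the proposal notes, means the same structure ports directly to Python without any serialization of typed values.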
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]