Patrick Wendell updated SPARK-3963:
-----------------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

> Support getting task-scoped properties from TaskContext
> -------------------------------------------------------
>
>                 Key: SPARK-3963
>                 URL: https://issues.apache.org/jira/browse/SPARK-3963
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>
> This is a proposal for a minor feature. Given the stabilization of the
> TaskContext API, it would be nice to have a mechanism for Spark jobs to
> access properties that are defined with task-level scope by Spark RDDs.
>
> I'd like to propose adding a simple properties hash map with some standard
> Spark properties that users can access. Later it would be nice to support
> users setting these properties, but for now, to keep it simple in 1.2, I'd
> prefer that users not be able to set them.
>
> The main use case is providing the file name from Hadoop RDDs, a very common
> request. But I'd imagine us using this for other things later on. We could
> also use it to expose some of the task metrics, e.g. the input bytes.
>
> {code}
> val data = sc.textFile("s3n://..2014/*/*/*.json")
> data.mapPartitions { iter =>
>   val tc = TaskContext.get
>   val fileName = tc.getProperty(TaskContext.HADOOP_FILE_NAME)
>   val parts = fileName.split("/")
>   val (year, month, day) = (parts(3), parts(4), parts(5))
>   ...
> }
> {code}
>
> Internally we'd have a method called setProperty, but this wouldn't be
> exposed initially. This is structured as a simple (String, String) hash map
> for ease of porting to Python.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
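The proposed (String, String) properties map could be sketched as below. Note this is an illustrative mock only: `TaskContextSketch` and the `"spark.hadoopFileName"` key are assumed names for the sketch, not Spark's actual API, and the file path is made up.

```scala
// Hypothetical sketch of the proposed task-scoped properties map.
// The class name and property key are assumptions, not Spark's real API.
object TaskContextSketch {
  // A standard key Spark could populate for Hadoop-backed RDDs (assumed name)
  val HADOOP_FILE_NAME = "spark.hadoopFileName"
}

class TaskContextSketch {
  // Simple (String, String) map, as proposed, for ease of porting to Python
  private val props = scala.collection.mutable.Map.empty[String, String]

  // In Spark this setter would be internal (not user-visible initially);
  // it is public here only so the sketch is runnable
  def setProperty(key: String, value: String): Unit = props(key) = value

  def getProperty(key: String): String = props.getOrElse(key, null)
}

// Usage mirroring the quoted example, with a made-up file path:
val tc = new TaskContextSketch
tc.setProperty(TaskContextSketch.HADOOP_FILE_NAME,
  "s3n://bucket/2014/06/15/part-0.json")
val parts = tc.getProperty(TaskContextSketch.HADOOP_FILE_NAME).split("/")
val (year, month, day) = (parts(3), parts(4), parts(5))
// year = "2014", month = "06", day = "15"
```

Keeping the map as plain strings (rather than typed accessors) matches the proposal's goal of a minimal, language-portable surface.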