git commit: [SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to prevent from breaking binary-compatibility.

marmbrus Mon, 13 Oct 2014 13:44:16 -0700

Repository: spark
Updated Branches:
  refs/heads/master d3cdf9128 -> 73da9c26b



[SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to prevent 
from breaking binary-compatibility.

The original problem is 
[SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764).

`AppendingParquetOutputFormat` calls `context.getTaskAttemptID`, which is 
not binary-compatible between hadoop-1 and hadoop-2.
This makes Spark itself binary-incompatible: if Spark is built against 
hadoop-1, the artifact works only with hadoop-1, and vice versa.

Author: Takuya UESHIN <[email protected]>

Closes #2638 from ueshin/issues/SPARK-3771 and squashes the following commits:

efd3784 [Takuya UESHIN] Add a comment to explain the reason to use reflection.
ec213c1 [Takuya UESHIN] Use reflection to prevent breaking binary-compatibility.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73da9c26
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73da9c26
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73da9c26

Branch: refs/heads/master
Commit: 73da9c26b0e2e8bf0ab055906211727a7097c963
Parents: d3cdf91
Author: Takuya UESHIN <[email protected]>
Authored: Mon Oct 13 13:43:41 2014 -0700
Committer: Michael Armbrust <[email protected]>
Committed: Mon Oct 13 13:43:41 2014 -0700

----------------------------------------------------------------------
 .../apache/spark/sql/parquet/ParquetTableOperations.scala | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/73da9c26/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
index ffb7323..1f4237d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
@@ -331,13 +331,21 @@ private[parquet] class AppendingParquetOutputFormat(offset: Int)
 
   // override to choose output filename so not overwrite existing ones
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
-    val taskId: TaskID = context.getTaskAttemptID.getTaskID
+    val taskId: TaskID = getTaskAttemptID(context).getTaskID
     val partition: Int = taskId.getId
     val filename = s"part-r-${partition + offset}.parquet"
     val committer: FileOutputCommitter =
       getOutputCommitter(context).asInstanceOf[FileOutputCommitter]
     new Path(committer.getWorkPath, filename)
   }
+
+  // The TaskAttemptContext is a class in hadoop-1 but is an interface in hadoop-2.
+  // The signatures of the method TaskAttemptContext.getTaskAttemptID for the both versions
+  // are the same, so the method calls are source-compatible but NOT binary-compatible because
+  // the opcode of method call for class is INVOKEVIRTUAL and for interface is INVOKEINTERFACE.
+  private def getTaskAttemptID(context: TaskAttemptContext): TaskAttemptID = {
+    context.getClass.getMethod("getTaskAttemptID").invoke(context).asInstanceOf[TaskAttemptID]
+  }
 }
 
 /**
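
The reflective lookup used in the patch above can be sketched in isolation. This is a minimal illustration and not Spark or Hadoop code: `AttemptId`, `ClassStyleContext`, `InterfaceStyleContext`, `InterfaceStyleContextImpl`, and `ReflectiveLookup` are hypothetical stand-ins for the hadoop-1 (class) and hadoop-2 (interface) shapes of `TaskAttemptContext`. It shows that one call site resolving the method by name works against both shapes.

```scala
// Hypothetical stand-in for Hadoop's TaskAttemptID; not the real class.
final case class AttemptId(id: Int)

// hadoop-1 style (assumed shape): the context is a concrete class.
class ClassStyleContext(id: Int) {
  def getTaskAttemptID: AttemptId = AttemptId(id)
}

// hadoop-2 style (assumed shape): the context is an interface plus an implementation.
trait InterfaceStyleContext {
  def getTaskAttemptID: AttemptId
}

class InterfaceStyleContextImpl(id: Int) extends InterfaceStyleContext {
  def getTaskAttemptID: AttemptId = AttemptId(id)
}

object ReflectiveLookup {
  // The method is resolved by name at run time, so the compiled call site
  // contains neither INVOKEVIRTUAL nor INVOKEINTERFACE against the context type.
  def attemptIdOf(context: AnyRef): AttemptId =
    context.getClass
      .getMethod("getTaskAttemptID")
      .invoke(context)
      .asInstanceOf[AttemptId]
}
```

A call such as `ReflectiveLookup.attemptIdOf(new ClassStyleContext(3))` and the same call with `new InterfaceStyleContextImpl(3)` both go through the identical compiled code path. The trade-off is that a reflective call is slower than a direct one and a misspelled method name only fails at run time.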

