Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Solved!!
The solution is to use date_format with the “u” pattern letter.

Thank you very much.
Best,
Carlo
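
For readers wondering what the “u” pattern does: as far as I know, date_format in Spark 2.0 follows the pattern letters of java.text.SimpleDateFormat, where "u" is the day number of the week (1 = Monday, ..., 7 = Sunday). Below is a minimal sketch in plain Java, without Spark. The class and method names are made up for illustration, and the Spark call in the trailing comment is an assumption about how this would look in the code from this thread, not something posted in it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DayOfWeekPatternDemo {

    // Hypothetical helper: parses "yyyy-MM-dd HH:mm:ss" and returns
    // 1 (Monday) through 7 (Sunday) using the SimpleDateFormat "u" pattern.
    static int dayNumberOfWeek(String timestamp) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        in.setTimeZone(utc);
        SimpleDateFormat out = new SimpleDateFormat("u", Locale.ROOT);
        out.setTimeZone(utc);
        try {
            Date date = in.parse(timestamp);
            return Integer.parseInt(out.format(date));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + timestamp, e);
        }
    }

    public static void main(String[] args) {
        // 2016-07-28 was a Thursday, so this prints 4.
        System.out.println(dayNumberOfWeek("2016-07-28 18:59:00"));
        // Assumed Spark equivalent (not taken from the thread):
        // tmp5.withColumn("ORD_DAYOFWEEK",
        //     date_format(col, "u").cast(DataTypes.IntegerType));
    }
}
```

Note that this numbering differs from Calendar.DAY_OF_WEEK, which starts at 1 for Sunday, so the "u" pattern matches the "Monday -> 1" convention asked for in the thread.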


Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Mark,

Thanks for the suggestion.
I changed the Maven entries as follows:

<artifactId>spark-core_2.10</artifactId>
<version>2.0.0</version>

and

<artifactId>spark-sql_2.10</artifactId>
<version>2.0.0</version>

As a result, it worked once I removed the following lines of code, which compute
DAYOFWEEK (Monday -> 1, etc.):

Dataset tmp6 = tmp5.withColumn("ORD_DAYOFWEEK", callUDF("computeDayOfWeek",
    tmp5.col("ORD_time_window_per_hour#3").getItem("start").cast(DataTypes.StringType)));

this.spark.udf().register("computeDayOfWeek", new UDF1<String, Integer>() {
    @Override
    public Integer call(String myDate) throws Exception {
        // dateFormat is defined outside this snippet
        Date date = new SimpleDateFormat(dateFormat).parse(myDate);
        Calendar c = Calendar.getInstance();
        c.setTime(date);
        // Note: Calendar.DAY_OF_WEEK runs from 1 (Sunday) to 7 (Saturday)
        return c.get(Calendar.DAY_OF_WEEK);
    }
}, DataTypes.IntegerType);
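
A note on the "Task not serializable" error in the trace further down: one common cause is an anonymous inner class that reads a field of its enclosing object, because the anonymous instance then carries a hidden reference to that (possibly non-serializable) object. The thread does not confirm that this was the cause here, so the following is only a sketch of one plausible explanation. It uses plain Java serialization rather than Spark, and every name in it is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ClosureCaptureDemo {

    // Stand-in for a serializable function interface such as a UDF.
    interface StringToInt extends Serializable {
        Integer call(String s);
    }

    // Stand-in for the enclosing class; deliberately NOT Serializable.
    static class DatasetBuilder {
        private final String dateFormat;

        DatasetBuilder(String dateFormat) {
            this.dateFormat = dateFormat;
        }

        // Anonymous class reads the outer field, so it captures DatasetBuilder.this.
        StringToInt anonymousUdf() {
            return new StringToInt() {
                @Override
                public Integer call(String s) {
                    return dateFormat.length() + s.length();
                }
            };
        }

        // Static nested class receives a copy of the value; no outer reference.
        StringToInt standaloneUdf() {
            return new LengthUdf(dateFormat);
        }
    }

    static class LengthUdf implements StringToInt {
        private final String dateFormat;

        LengthUdf(String dateFormat) {
            this.dateFormat = dateFormat;
        }

        @Override
        public Integer call(String s) {
            return dateFormat.length() + s.length();
        }
    }

    // Returns true if Java serialization accepts the object.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DatasetBuilder builder = new DatasetBuilder("yyyy-MM-dd HH:mm:ss");
        System.out.println(isSerializable(builder.anonymousUdf()));  // false
        System.out.println(isSerializable(builder.standaloneUdf())); // true
    }
}
```

If this is what happened, moving the UDF into a static nested (or top-level) class and passing the format string in through its constructor would avoid dragging the enclosing object along.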



The full stack trace is reported below.

Is there another way to compute DAYOFWEEK from a timestamp in the format
"yyyy-MM-dd HH:mm:ss" using a built-in function? I have checked date_format,
but it does not seem to do it.

Any suggestions?

Many Thanks,
Carlo




Test set: org.mksmart.amaretto.ml.DatasetPerHourVerOneTest
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 32.658 sec <<< FAILURE!
testBuildDatasetNew(org.mksmart.amaretto.ml.DatasetPerHourVerOneTest)  Time elapsed: 32.581 sec  <<< ERROR!
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:798)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:797)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:797)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:364)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:128)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1925)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1924)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2562)
at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
at org.mksmart.amaretto.ml.DatasetPerHourVerOneTest.testBuildDatasetNew(DatasetPerHourVerOneTest.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at 

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Mark Hamstra
Don't use Spark 2.0.0-preview.  That was a preview release with known
issues, and was intended to be used only for early, pre-release testing
purposes.  Spark 2.0.0 is now released, and you should be using that.


Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
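And, of course, I am using

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0-preview</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0-preview</version>
    <type>jar</type>
</dependency>

Is the problem below related to the experimental (preview) version of Spark 2.0.0?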

Many Thanks for your help and support.

Best Regards,
carlo

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
I have also found the following two related links:

1) https://github.com/apache/spark/commit/947b9020b0d621bc97661a0a056297e6889936d3
2) https://github.com/apache/spark/pull/12433

Both explain why it happens, but neither says what to do to solve it.

Do you have any suggestions or recommendations?

Many thanks.
Carlo

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Rui,

Thanks for the prompt reply.
No, I am not using Mesos.

OK. I am writing code to build a suitable dataset for my needs, as follows:

== Session configuration:

SparkSession spark = SparkSession
    .builder()
    .master("local[6]")
    .appName("DatasetForCaseNew")
    .config("spark.executor.memory", "4g")
    .config("spark.shuffle.blockTransferService", "nio")
    .getOrCreate();


public Dataset buildDataset() {
    ...

    // STEP A
    // Join prdDS with cmpDS
    Dataset prdDS_Join_cmpDS = res1
        .join(res2, res1.col("PRD_asin#100").equalTo(res2.col("CMP_asin")), "inner");

    prdDS_Join_cmpDS.take(1);

    // STEP B
    // Join the result of STEP A with res3
    Dataset prdDS_Join_cmpDS_Join = prdDS_Join_cmpDS
        .join(res3, prdDS_Join_cmpDS.col("PRD_asin#100").equalTo(res3.col("ORD_asin")), "inner");
    prdDS_Join_cmpDS_Join.take(1);
    prdDS_Join_cmpDS_Join.show();

}


The exception is thrown when the computation reaches STEP B; everything up to
and including STEP A is fine.

Is there anything wrong or missing?

Thanks for your help in advance.

Best Regards,
Carlo





=== STACK TRACE

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 422.102 sec <<< FAILURE!
testBuildDataset(org.mksmart.amaretto.ml.DatasetPerHourVerOneTest)  Time elapsed: 421.994 sec  <<< ERROR!
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:102)
at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:35)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:565)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.produce(SortMergeJoinExec.scala:35)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:77)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:38)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:304)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:343)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Sun Rui
Are you using Mesos? If not, https://issues.apache.org/jira/browse/SPARK-16522
is not relevant.

Could you provide more information about your Spark environment, along with the
full stack trace?
> On Jul 28, 2016, at 17:44, Carlo.Allocca  wrote:
> 
> Hi All, 
> 
> I am running Spark locally, and when running d3=join(d1,d2) and d5=join(d3,d4)
> I am getting the following exception: "org.apache.spark.SparkException:
> Exception thrown in awaitResult".
> Googling for it, I found that the closest match is the issue reported at
> https://issues.apache.org/jira/browse/SPARK-16522
> which mentions that it is a bug in Spark 2.0.0.
> 
> Is it correct or am I missing anything? 
> 
> Many Thanks for your answer and help. 
> 
> Best Regards,
> Carlo
> 