[Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-09-01 Thread Aniket Bhatnagar
On my local (Windows) dev environment, I have been trying to get Spark
Streaming running to test my real-time(ish) jobs. I have set the checkpoint
directory to /tmp/spark and have installed the latest Cygwin. I keep getting
the following error:

org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access
`/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-1-attempt-215':
No such file or directory


Although nothing breaks, such errors are a bit annoying. Any clues on
how to fix the issue?
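
For context, a minimal sketch (an assumption, not part of the original post) of how a checkpoint directory like the one above is typically set on a StreamingContext; sparkConf and the 10-second batch interval are placeholders:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(10))
// Spark creates per-RDD subdirectories (such as rdd-57 in the error above) under this path.
ssc.checkpoint("/tmp/spark")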


Re: [Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-09-01 Thread Aniket Bhatnagar
Hi everyone

It turns out that I had Chef installed, and its chmod takes precedence over
Cygwin's chmod in the PATH. I fixed the environment variable and now it's
working fine.


On 1 September 2014 11:48, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:

 On my local (Windows) dev environment, I have been trying to get Spark
 Streaming running to test my real-time(ish) jobs. I have set the checkpoint
 directory to /tmp/spark and have installed the latest Cygwin. I keep getting
 the following error:

 org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access
 `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-1-attempt-215':
 No such file or directory


 Although nothing breaks, such errors are a bit annoying. Any clues on
 how to fix the issue?



operations on replicated RDD

2014-09-01 Thread rapelly kartheek
Hi,

An RDD replicated by an application is owned only by that application; no
other application can share it. What, then, is the motive behind providing the
RDD replication feature? What operations can be performed on the replicated
RDD?

Thank you!!!
-karthik
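
For reference, a minimal sketch (an assumption about what "replication" refers to here, not part of the question): replication is enabled through the _2 storage levels, which keep each cached partition on two nodes for fault tolerance, and the replicated RDD supports the same operations as any other RDD.

import org.apache.spark.storage.StorageLevel

val data = sc.parallelize(1 to 1000)
data.persist(StorageLevel.MEMORY_ONLY_2)  // each cached partition is stored on two executors
data.count()                              // any normal RDD operation can run on the replicated RDD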


Spark driver application can not connect to Spark-Master

2014-09-01 Thread moon soo Lee
Hi, I'm developing an application with Spark.

My Java application tries to create a Spark context like this:


 Creating spark context 

public SparkContext createSparkContext() {
  String execUri = System.getenv("SPARK_EXECUTOR_URI");
  String[] jars = SparkILoop.getAddedJars();
  SparkConf conf = new SparkConf().setMaster(getMaster())
      .setAppName("App name").setJars(jars)
      .set("spark.repl.class.uri", interpreter.intp().classServer().uri());
  if (execUri != null) {
    conf.set("spark.executor.uri", execUri);
  }
  if (System.getenv("SPARK_HOME") != null) {
    conf.setSparkHome(System.getenv("SPARK_HOME"));
  }
  SparkContext sparkContext = new SparkContext(conf);
  return sparkContext;
}

public String getMaster() {
  String envMaster = System.getenv().get("MASTER");
  if (envMaster != null) return envMaster;
  String propMaster = System.getProperty("spark.master");
  if (propMaster != null) return propMaster;
  return "local[*]";
}


But when I call createSparkContext() on the driver side, I get logs like:


-- My application's log -
 INFO [2014-09-01 17:28:37,092] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Changing view acls to: root
 INFO [2014-09-01 17:28:37,092] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(root)
 INFO [2014-09-01 17:28:37,093] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Starting HTTP Server
 INFO [2014-09-01 17:28:37,096] ({pool-1-thread-2}
Server.java[doStart]:272) - jetty-8.1.14.v20131031
 INFO [2014-09-01 17:28:37,099] ({pool-1-thread-2}
AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:46610
 INFO [2014-09-01 17:28:40,050] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Changing view acls to: root
 INFO [2014-09-01 17:28:40,050] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(root)
 INFO [2014-09-01 17:28:40,589] ({spark-akka.actor.default-dispatcher-2}
Slf4jLogger.scala[applyOrElse]:80) - Slf4jLogger started
 INFO [2014-09-01 17:28:40,626] ({spark-akka.actor.default-dispatcher-2}
Slf4jLogger.scala[apply$mcV$sp]:74) - Starting remoting
 INFO [2014-09-01 17:28:40,833] ({spark-akka.actor.default-dispatcher-3}
Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting started; listening on
addresses :[akka.tcp://spark@222.122.122.122:46833]
 INFO [2014-09-01 17:28:40,835] ({spark-akka.actor.default-dispatcher-4}
Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting now listens on addresses:
[akka.tcp://spark@222.122.122.122:46833]
 INFO [2014-09-01 17:28:40,858] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Registering MapOutputTracker
 INFO [2014-09-01 17:28:40,861] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Registering BlockManagerMaster
 INFO [2014-09-01 17:28:40,877] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Created local directory at
/tmp/spark-local-20140901172840-baf4
 INFO [2014-09-01 17:28:40,881] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - MemoryStore started with capacity 546.3 MB.
 INFO [2014-09-01 17:28:40,912] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Bound socket to port 42671 with id =
ConnectionManagerId(222.122.122.122,42671)
 INFO [2014-09-01 17:28:40,917] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Trying to register BlockManager
 INFO [2014-09-01 17:28:40,920] ({spark-akka.actor.default-dispatcher-4}
Logging.scala[logInfo]:58) - Registering block manager 222.122.122.122:42671
with 546.3 MB RAM
 INFO [2014-09-01 17:28:40,921] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Registered BlockManager
 INFO [2014-09-01 17:28:40,932] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Starting HTTP Server
 INFO [2014-09-01 17:28:40,933] ({pool-1-thread-2}
Server.java[doStart]:272) - jetty-8.1.14.v20131031
 INFO [2014-09-01 17:28:40,935] ({pool-1-thread-2}
AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:52020
 INFO [2014-09-01 17:28:40,936] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Broadcast server started at
http://222.122.122.122:52020
 INFO [2014-09-01 17:28:40,943] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - HTTP File server directory is
/tmp/spark-fc4cc226-c740-4cec-ad0f-6f88762d365c
 INFO [2014-09-01 17:28:40,943] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Starting HTTP Server
 INFO [2014-09-01 17:28:40,944] ({pool-1-thread-2}
Server.java[doStart]:272) - jetty-8.1.14.v20131031
 INFO [2014-09-01 17:28:40,946] ({pool-1-thread-2}
AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:59458
 INFO [2014-09-01 17:28:41,167] ({pool-1-thread-2}
Server.java[doStart]:272) - jetty-8.1.14.v20131031
 INFO [2014-09-01 17:28:41,177] ({pool-1-thread-2}
AbstractConnector.java[doStart]:338) - Started
SelectChannelConnector@0.0.0.0:4040
 INFO [2014-09-01 17:28:41,180] ({pool-1-thread-2}
Logging.scala[logInfo]:58) - Started SparkUI at http://222.122.122.122:4040
 INFO [2014-09-01 17:28:41,410] 

Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Zhanfeng Huo
Hi all,

Can a value in spark-defaults.conf reference system variables?

For example, mess = ${user.home}/${user.name}.

Best Regards



Zhanfeng Huo


Has anybody faced SPARK-2604 issue regarding Application hang state

2014-09-01 Thread twinkle sachdeva
Hi,

Has anyone else also experienced
https://issues.apache.org/jira/browse/SPARK-2604?

It is an edge-case scenario of misconfiguration, where the executor memory
requested is the same as the maximum memory allowed by YARN. In such a
situation, the application stays in a hung state, and the reason is not logged
verbosely enough to be debugged easily.

As per the fix, this condition is detected and the corresponding reasons are
logged before the application is failed.

I would prefer the fix to be in the open source version; please share your
thoughts.

Thanks,


Value of SHUFFLE_PARTITIONS

2014-09-01 Thread Chirag Aggarwal
Hi,

Currently the number of shuffle partitions is a config-driven parameter
(SHUFFLE_PARTITIONS). This means that anyone running a Spark SQL query first
has to work out which value of SHUFFLE_PARTITIONS gives the best performance
for that query.

Shouldn't there be logic in Spark SQL that figures out the best value itself,
while still giving preference to a user-specified value? I believe this could
be worked out from the number of partitions in the original data.

I ran some queries with the default value (200) of shuffle partitioning, and
when I changed this value to 5, the time taken by the query dropped by nearly
35%.

Thanks,
Chirag
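
For illustration, a minimal sketch (an assumption, not part of the original post) of overriding the setting per query; spark.sql.shuffle.partitions is the property behind SHUFFLE_PARTITIONS, and sqlContext and some_table are placeholders:

sqlContext.sql("SET spark.sql.shuffle.partitions=5")
// Subsequent aggregations/joins in this context use 5 shuffle partitions.
val result = sqlContext.sql("SELECT key, count(*) FROM some_table GROUP BY key")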


[Streaming] Triggering an action in absence of data

2014-09-01 Thread Aniket Bhatnagar
Hi all

I am struggling to implement a use case wherein I need to trigger an action
when no data has been received for X amount of time. I haven't been able to
figure out an easy way to do this. No state/foreach methods get called when no
data has arrived. I thought of generating a 'tick' DStream that emits an
arbitrary object and unioning/grouping the tick stream with the data stream to
detect that data hasn't arrived for X amount of time. However, since my data
DStream is paired (key-value tuples) and I use updateStateByKey to process the
data stream, I can't group/union it with tick stream(s) without knowing all
keys in advance.

My second idea was to push data from the DStream to an actor and let an actor
(per key) manage state and the no-data case. However, there is no way to run
an actor continuously for all data belonging to a key or a partition.

I am stuck now and can't think of anything else that would solve the use case.
Has anyone else run into a similar issue? Any thoughts on how the use case
could be implemented in Spark Streaming?

Thanks,
Aniket
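
For reference, a minimal sketch (an assumption, not part of the original post) of the 'tick' DStream idea described above, assuming an existing StreamingContext ssc:

import org.apache.spark.streaming.dstream.ConstantInputDStream

// Emits one "tick" element in every batch interval, regardless of real input.
val ticks = new ConstantInputDStream(ssc, ssc.sparkContext.parallelize(Seq("tick")))
// A batch of the union with the data stream that contains only the tick means
// no real data arrived in that interval.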


Re: Problem Accessing Hive Table from hiveContext

2014-09-01 Thread Yin Huai
Hello Igor,

Although decimal is supported, Hive 0.12 does not support user-definable
precision and scale (that was introduced in Hive 0.13).

Thanks,

Yin
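
For what it's worth, a minimal sketch of the implication (an assumption, not part of the reply): with Hive 0.12 the column can be declared as a plain decimal, without precision and scale, assuming the same hiveContext as in the quoted message below.

hiveContext.hql("create table test_datatypes(testbigint bigint, testdec decimal)")
hiveContext.hql("select * from test_datatypes")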


On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote:

 Hi All,
 New to spark and using Spark 1.0.2 and hive 0.12.

 If the Hive table is created as test_datatypes(testbigint bigint, ss bigint ),
  then select * from test_datatypes works fine from Spark.

 For create table test_datatypes(testbigint bigint, testdec decimal(5,2) )

 scala> val dataTypes = hiveContext.hql("select * from test_datatypes")
 14/08/28 21:18:44 INFO parse.ParseDriver: Parsing command: select * from
 test_datatypes
 14/08/28 21:18:44 INFO parse.ParseDriver: Parse Completed
 14/08/28 21:18:44 INFO analysis.Analyzer: Max iterations (2) reached for
 batch MultiInstanceRelations
 14/08/28 21:18:44 INFO analysis.Analyzer: Max iterations (2) reached for
 batch CaseInsensitiveAttributeReferences
 java.lang.IllegalArgumentException: Error: ',', ':', or ';' expected at
 position 14 from 'bigint:decimal(5,2)' [0:bigint, 6::, 7:decimal, 14:(,
 15:5, 16:,, 17:2, 18:)]
 at
 org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:312)
 at
 org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:716)
 at
 org.apache.hadoop.hive.serde2.lazy.LazyUtils.extractColumnInfo(LazyUtils.java:364)
 at
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(LazySimpleSerDe.java:288)
 at
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:187)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:218)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:272)
 at
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:175)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:991)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:924)
 at
 org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:58)
 at org.apache.spark.sql.hive.HiveContext$$anon$2.org
 $apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:143)
 at
 org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:122)
 at
 org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:122)
 at scala.Option.getOrElse(Option.scala:120)
 at
 org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:122)
 at
 org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:149)
 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$2.applyOrElse(Analyzer.scala:83)
 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$2.applyOrElse(Analyzer.scala:81)
 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
 at
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 at
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 at
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 at scala.collection.TraversableOnce$class.to
 (TraversableOnce.scala:273)
 at scala.collection.AbstractIterator.to(Iterator.scala:1157)
 at
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 at
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)


 The same exception happens when the table is created as create table
 test_datatypes(testbigint bigint, testdate date).

 Thanks, Igor.





Re: how to filter value in spark

2014-09-01 Thread Matthew Farrellee
you could join, it'll give you the intersection and a list of the labels
where the value was found.

 a.join(b).collect
Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))
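
A small follow-on sketch (an assumption, not part of the reply): dropping the second label from the join result recovers the (3,a) and (4,a) pairs asked about in the quoted message below.

val matched = a.join(b).map { case (k, (left, _)) => (k, left) }
matched.collect()  // Array((3,a), (4,a)), order may vary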

best,


matt

On 08/31/2014 09:23 PM, Liu, Raymond wrote:
 You could use cogroup to combine RDDs in one RDD for cross reference 
 processing.
 
 e.g.
 
 a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }
 
 Best Regards,
 Raymond Liu
 
 -Original Message-
 From: marylucy [mailto:qaz163wsx_...@hotmail.com]
 Sent: Friday, August 29, 2014 9:26 PM
 To: Matthew Farrellee
 Cc: user@spark.apache.org
 Subject: Re: how to filter value in spark
 
 i see it works well,thank you!!!
 
 But in follow situation how to do
 
 var a = sc.textFile("/sparktest/1/").map((_, "a"))
 var b = sc.textFile("/sparktest/2/").map((_, "b"))
 How to get (3,a) and (4,a)?
 
 
 在 Aug 28, 2014,19:54,Matthew Farrellee m...@redhat.com 写道:
 
 On 08/28/2014 07:20 AM, marylucy wrote:
 fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
 fileB = 3 4 5 6, one number per line, saved in /sparktest/2/. I want to get
 3 and 4.

 var a = sc.textFile("/sparktest/1/").map((_, 1))
 var b = sc.textFile("/sparktest/2/").map((_, 1))

 a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)

 Error thrown:
 Scala.MatchError:Null
 PairRDDFunctions.lookup...

 the issue is nesting of the b rdd inside a transformation of the a rdd

 consider using intersection, it's more idiomatic

 a.intersection(b).foreach(println)

 but note that intersection will remove duplicates

 best,


 matt

 



Re: transforming a Map object to RDD

2014-09-01 Thread Matthew Farrellee

and in python,

 map = {'a': 1, 'b': 2, 'c': 3}
 rdd = sc.parallelize(map.items())
 rdd.collect()
[('a', 1), ('c', 3), ('b', 2)]

best,


matt

On 08/28/2014 07:01 PM, Sean Owen wrote:

val map = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
val rdd = sc.parallelize(map.toSeq)

rdd is an RDD[(String,Int)] and you can do what you like from there.

On Thu, Aug 28, 2014 at 11:56 PM, SK skrishna...@gmail.com wrote:

Hi,

How do I convert a Map object to an RDD so that I can use the
saveAsTextFile() operation to output the Map object?

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/transforming-a-Map-object-to-RDD-tp13071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi,

I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from 
source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version “1.7.0_67”
protobuf: 2.5.0


I have tried the smoke test in Shark but got a
“java.util.NoSuchElementException” error. Can you please advise how to fix
this?

shark> create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: 
java.util.NoSuchElementException(null)
java.util.NoSuchElementException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
at java.util.HashMap$ValueIterator.next(HashMap.java:950)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
at 
shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
at shark.SharkCliDriver.main(SharkCliDriver.scala)


spark-env.sh
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export 
SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf
spark.executor.memory   2048m
spark.shuffle.spill.compress    false

shark-env.sh
#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export 
SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export 
SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar


Regards
Arthur



Re: Spark and Shark

2014-09-01 Thread Michael Armbrust
I don't believe that Shark works with Spark > 1.0.  Have you considered
trying Spark SQL?
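
For reference, a minimal sketch of the Spark SQL route (an assumption, not part of the reply), assuming Spark 1.0.2 built with Hive support and an existing SparkContext sc:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.hql("create table x1 (a INT)")   // the statement that failed under Shark
hiveContext.hql("select * from x1").collect()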


On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com 
arthur.hk.c...@gmail.com wrote:

 Hi,

 I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling
 from source).

 spark: 1.0.2
 shark: 0.9.2
 hadoop: 2.4.1
 java: java version “1.7.0_67”
 protobuf: 2.5.0


 I have tried the smoke test in shark but got
  “java.util.NoSuchElementException” error,  can you please advise how to
 fix this?

 shark create table x1 (a INT);
 FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
 14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal
 Error: java.util.NoSuchElementException(null)
 java.util.NoSuchElementException
 at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
 at java.util.HashMap$ValueIterator.next(HashMap.java:950)
  at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
 at
 shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
  at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
 at shark.SharkDriver.compile(SharkDriver.scala:215)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
  at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
 at shark.SharkCliDriver.main(SharkCliDriver.scala)


 spark-env.sh
 #!/usr/bin/env bash
 export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
 export
 CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
 export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
 export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
 export
 SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
 export SPARK_WORKER_MEMORY=2g
 export HADOOP_HEAPSIZE=2000

 spark-defaults.conf
 spark.executor.memory   2048m
 spark.shuffle.spill.compressfalse

 shark-env.sh
 #!/usr/bin/env bash
 export SPARK_MEM=2g
 export SHARK_MASTER_MEM=2g
 SPARK_JAVA_OPTS= -Dspark.local.dir=/tmp 
 SPARK_JAVA_OPTS+=-Dspark.kryoserializer.buffer.mb=10 
 SPARK_JAVA_OPTS+=-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps 
 export SPARK_JAVA_OPTS
 export SHARK_EXEC_MODE=yarn
 export
 SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
 export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
 export HIVE_CONF_DIR=$HIVE_HOME/conf
 export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
 export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
 export
 SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar


 Regards
 Arthur




RE: Spark and Shark

2014-09-01 Thread Paolo Platter
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC Server 
with Spark 1.1 RC2 and it works fine.



Best



Paolo



Paolo Platter
Agile Lab CTO

From: Michael Armbrust mich...@databricks.com
Sent: Monday, 1 September 2014 19:43
To: arthur.hk.c...@gmail.com
Cc: user@spark.apache.org
Subject: Re: Spark and Shark

I don't believe that Shark works with Spark > 1.0.  Have you considered trying 
Spark SQL?


On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com wrote:
Hi,

I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from 
source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version 1.7.0_67
protobuf: 2.5.0


I have tried the smoke test in shark but got  
java.util.NoSuchElementException error,  can you please advise how to fix 
this?

shark create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: 
java.util.NoSuchElementException(null)
java.util.NoSuchElementException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
at java.util.HashMap$ValueIterator.next(HashMap.java:950)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
at 
shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
at shark.SharkCliDriver.main(SharkCliDriver.scala)


spark-env.sh
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export 
SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf
spark.executor.memory   2048m
spark.shuffle.spill.compressfalse

shark-env.sh
#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS= -Dspark.local.dir=/tmp 
SPARK_JAVA_OPTS+=-Dspark.kryoserializer.buffer.mb=10 
SPARK_JAVA_OPTS+=-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps 
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export 
SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export 
SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar


Regards
Arthur




Re: Time series forecasting

2014-09-01 Thread filipus
I guess it is not a question about Spark but about the dataset you need to set
up.

Think about what you want to model and how you can shape the data in such a
way that Spark can use it.

Akima is a technique I know:

a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}

Spark can find the coefficients C1-C6 by regression, I guess.
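
As an illustration of the lagged model above, a minimal sketch (an assumption, not part of the reply) of fitting the coefficients with MLlib's linear regression; series (an Array[Double] holding the time series) and the SparkContext sc are placeholders.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val lag = 6
val points = series.sliding(lag + 1).map { window =>
  // label is a_{t+1}; features are the previous 6 values a_t .. a_{t-5}
  LabeledPoint(window.last, Vectors.dense(window.init.reverse))
}.toSeq

val model = LinearRegressionWithSGD.train(sc.parallelize(points), 100)
// model.weights now approximates the coefficients C1..C6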



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Time-series-forecasting-tp13236p13239.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Spark 1.0.2 Can GroupByTest example be run in Eclipse without change

2014-09-01 Thread Shing Hing Man
Hi, 

I have noticed that the GroupByTest example in
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala
has been changed to be run using spark-submit.
Previously, I set local as the first command-line parameter, and this enabled
me to run GroupByTest in Eclipse:
val sc = new SparkContext(args(0), "GroupBy Test",
  System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass).toSeq)


In the latest GroupByTest code, I cannot pass in local as the first
command-line parameter:
val sparkConf = new SparkConf().setAppName("GroupBy Test")
var numMappers = if (args.length > 0) args(0).toInt else 2
var numKVPairs = if (args.length > 1) args(1).toInt else 1000
var valSize = if (args.length > 2) args(2).toInt else 1000
var numReducers = if (args.length > 3) args(3).toInt else numMappers
val sc = new SparkContext(sparkConf)


Is there a way to specify master=local (maybe in an environment variable), so
that I can run the latest version of GroupByTest in Eclipse without changing
the code?

Thanks in advance for your assistance !

Shing 
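
One possible workaround (an assumption, not part of the original post): SparkConf also picks up spark.* Java system properties, so spark.master can be supplied as an Eclipse VM argument (-Dspark.master=local[*]) without touching GroupByTest itself. The equivalent mechanism in code:

import org.apache.spark.SparkConf

System.setProperty("spark.master", "local[*]")   // same effect as -Dspark.master=local[*]
val sparkConf = new SparkConf().setAppName("GroupBy Test")
// sparkConf now resolves spark.master to local[*] when the SparkContext is created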

Re: Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Andrew Or
No, not currently.


2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com:

 Hi,all:

 Can value in spark-defaults.conf support system variables?

 Such as mess = ${user.home}/${user.name}.

 Best Regards

 --
 Zhanfeng Huo
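
A possible workaround (an assumption, not part of the reply above): expand the variables in application code and set the value on the SparkConf directly, using the mess key from the question as an example.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("mess", s"${sys.props("user.home")}/${sys.props("user.name")}")
// Settings made on SparkConf take precedence over spark-defaults.conf.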



zip equal-length but unequally-partition

2014-09-01 Thread Kevin Jung
http://www.adamcrume.com/blog/archive/2014/02/19/fixing-sparks-rdd-zip

Please check this URL. I got the same problem in v1.0.1.
In some cases, an RDD loses several elements after zip, so the total count of
the ZippedRDD is less than that of the source RDD.
Will the 1.1 version of Spark fix it?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/zip-equal-length-but-unequally-partition-tp13246.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
