[Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory
On my local (Windows) dev environment, I have been trying to get Spark Streaming running to test my real-time(ish) jobs. I have set the checkpoint directory to /tmp/spark and have installed the latest Cygwin. I keep getting the following error: org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-1-attempt-215': No such file or directory. Nothing actually breaks, but such errors are a bit annoying. Any clues on how to fix the issue?
Re: [Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory
Hi everyone, It turns out that I had Chef installed and its chmod takes precedence over Cygwin's chmod in the PATH. I fixed the environment variable and now it is working fine. On 1 September 2014 11:48, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: On my local (Windows) dev environment, I have been trying to get Spark Streaming running to test my real-time(ish) jobs. I have set the checkpoint directory to /tmp/spark and have installed the latest Cygwin. I keep getting the following error: org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-1-attempt-215': No such file or directory. Nothing actually breaks, but such errors are a bit annoying. Any clues on how to fix the issue?
operations on replicated RDD
Hi, An RDD replicated by an application is owned only by that application; no other application can share it. So what is the motive behind providing the RDD replication feature, and what operations can be performed on a replicated RDD? Thank you!!! -karthik
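A minimal sketch of what the feature is for, assuming a SparkContext named sc and a hypothetical input path: the "_2" storage levels keep each cached partition on two executors, so if one executor is lost the application can keep reading its cached data without recomputing it. The replicas live entirely inside the same application, and any normal RDD operation works on the replicated RDD unchanged.

  import org.apache.spark.storage.StorageLevel

  val data = sc.textFile("hdfs:///some/input/path")  // hypothetical path
  data.persist(StorageLevel.MEMORY_AND_DISK_2)       // keep 2 copies of each cached partition
  val lineCount = data.count()                       // ordinary operations work as usual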
Spark driver application can not connect to Spark-Master
Hi, I'm developing an application with Spark. My Java application tries to create a Spark context like this:

Creating spark context

  public SparkContext createSparkContext() {
    String execUri = System.getenv("SPARK_EXECUTOR_URI");
    String[] jars = SparkILoop.getAddedJars();
    SparkConf conf = new SparkConf().setMaster(getMaster())
        .setAppName("App name").setJars(jars)
        .set("spark.repl.class.uri", interpreter.intp().classServer().uri());
    if (execUri != null) {
      conf.set("spark.executor.uri", execUri);
    }
    if (System.getenv("SPARK_HOME") != null) {
      conf.setSparkHome(System.getenv("SPARK_HOME"));
    }
    SparkContext sparkContext = new SparkContext(conf);
    return sparkContext;
  }

  public String getMaster() {
    String envMaster = System.getenv().get("MASTER");
    if (envMaster != null) return envMaster;
    String propMaster = System.getProperty("spark.master");
    if (propMaster != null) return propMaster;
    return "local[*]";
  }

But when I call createSparkContext() on the driver side, I get logs like these: -- My application's log - INFO [2014-09-01 17:28:37,092] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Changing view acls to: root INFO [2014-09-01 17:28:37,092] ({pool-1-thread-2} Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root) INFO [2014-09-01 17:28:37,093] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Starting HTTP Server INFO [2014-09-01 17:28:37,096] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.1.14.v20131031 INFO [2014-09-01 17:28:37,099] ({pool-1-thread-2} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:46610 INFO [2014-09-01 17:28:40,050] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Changing view acls to: root INFO [2014-09-01 17:28:40,050] ({pool-1-thread-2} Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root) INFO [2014-09-01 17:28:40,589] ({spark-akka.actor.default-dispatcher-2} Slf4jLogger.scala[applyOrElse]:80) - Slf4jLogger started INFO [2014-09-01 17:28:40,626] ({spark-akka.actor.default-dispatcher-2} Slf4jLogger.scala[apply$mcV$sp]:74) - Starting remoting INFO [2014-09-01 17:28:40,833] ({spark-akka.actor.default-dispatcher-3} Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting started; listening on addresses :[akka.tcp://spark@222.122.122.122:46833] INFO [2014-09-01 17:28:40,835] ({spark-akka.actor.default-dispatcher-4} Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting now listens on addresses: [akka.tcp://spark@222.122.122.122:46833] INFO [2014-09-01 17:28:40,858] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Registering MapOutputTracker INFO [2014-09-01 17:28:40,861] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Registering BlockManagerMaster INFO [2014-09-01 17:28:40,877] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Created local directory at /tmp/spark-local-20140901172840-baf4 INFO [2014-09-01 17:28:40,881] ({pool-1-thread-2} Logging.scala[logInfo]:58) - MemoryStore started with capacity 546.3 MB.
INFO [2014-09-01 17:28:40,912] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Bound socket to port 42671 with id = ConnectionManagerId(222.122.122.122,42671) INFO [2014-09-01 17:28:40,917] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Trying to register BlockManager INFO [2014-09-01 17:28:40,920] ({spark-akka.actor.default-dispatcher-4} Logging.scala[logInfo]:58) - Registering block manager 222.122.122.122:42671 with 546.3 MB RAM INFO [2014-09-01 17:28:40,921] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Registered BlockManager INFO [2014-09-01 17:28:40,932] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Starting HTTP Server INFO [2014-09-01 17:28:40,933] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.1.14.v20131031 INFO [2014-09-01 17:28:40,935] ({pool-1-thread-2} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:52020 INFO [2014-09-01 17:28:40,936] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Broadcast server started at http://222.122.122.122:52020 INFO [2014-09-01 17:28:40,943] ({pool-1-thread-2} Logging.scala[logInfo]:58) - HTTP File server directory is /tmp/spark-fc4cc226-c740-4cec-ad0f-6f88762d365c INFO [2014-09-01 17:28:40,943] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Starting HTTP Server INFO [2014-09-01 17:28:40,944] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.1.14.v20131031 INFO [2014-09-01 17:28:40,946] ({pool-1-thread-2} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:59458 INFO [2014-09-01 17:28:41,167] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.1.14.v20131031 INFO [2014-09-01 17:28:41,177] ({pool-1-thread-2} AbstractConnector.java[doStart]:338) - Started SelectChannelConnector@0.0.0.0:4040 INFO [2014-09-01 17:28:41,180] ({pool-1-thread-2} Logging.scala[logInfo]:58) - Started SparkUI at http://222.122.122.122:4040 INFO [2014-09-01 17:28:41,410]
Can value in spark-defaults.conf support system variables?
Hi all, Can values in spark-defaults.conf reference system variables, such as mess = ${user.home}/${user.name}? Best Regards Zhanfeng Huo
Has anybody faced SPARK-2604 issue regarding Application hang state
Hi, Has anyone else also experienced https://issues.apache.org/jira/browse/SPARK-2604? It is an edge-case misconfiguration scenario where the executor memory requested is the same as the maximum memory allowed by YARN. In that situation the application stays in a hung state, and the reason is not logged verbosely enough to be debugged easily. With the fix, the condition is detected and the corresponding reasons are logged before the application fails. I would prefer the fix to be in the open-source code version; please share your thoughts. Thanks,
Value of SHUFFLE_PARTITIONS
Hi, Currently the number of shuffle partitions is a config-driven parameter (SHUFFLE_PARTITIONS). This means that anyone running a Spark SQL query must first analyze which value of SHUFFLE_PARTITIONS would give the best performance for that query. Shouldn't there be logic in Spark SQL that can figure out the best value, while still providing a mechanism to give preference to a user-specified value? I believe this could be worked out on the basis of the number of partitions in the original data. I ran some queries with the default value (200) for shuffle partitioning, and when I changed this value to 5, the time taken by the query dropped by nearly 35%. Thanks, Chirag
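For reference, a hedged sketch of how the value can be set per session today, assuming a SQLContext/HiveContext named sqlContext; the query and the value 5 are only illustrative (the value comes from the experiment above, not a recommendation):

  sqlContext.sql("SET spark.sql.shuffle.partitions=5")
  val result = sqlContext.sql("SELECT dept, count(*) FROM employees GROUP BY dept")  // hypothetical query and table
  result.collect()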
[Streaming] Triggering an action in absence of data
Hi all, I am struggling to implement a use case wherein I need to trigger an action when no data has been received for X amount of time. I haven't been able to figure out an easy way to do this, as no state/foreach methods seem to get called when no data has arrived. I thought of generating a 'tick' DStream that emits an arbitrary object and unioning/grouping the tick stream with the data stream to detect that data hasn't arrived for X amount of time. However, since my data DStream is paired (key-value tuples) and I use updateStateByKey to process it, I can't group/union it with tick streams without knowing all keys in advance. My second idea was to push data from the DStream to an actor and let an actor (per key) manage state and the data-absent case. However, there is no way to run an actor continuously for all data belonging to a key or a partition. I am stuck now and can't think of anything else for this use case. Has anyone else run into a similar issue? Any thoughts on how this could be implemented in Spark Streaming? Thanks, Aniket
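One possible approach, sketched below under the assumption that foreachRDD is invoked on the driver every batch interval even when the batch is empty (worth verifying on your Spark version): keep a last-seen timestamp on the driver and fire the action yourself. The names pairStream, silenceMs, and onSilence are illustrative only, not from the original post.

  import java.util.concurrent.atomic.AtomicLong

  val lastSeen = new AtomicLong(System.currentTimeMillis())
  val silenceMs = 60 * 1000L  // "X amount of time", placeholder value

  pairStream.foreachRDD { rdd =>
    if (rdd.take(1).nonEmpty) {
      lastSeen.set(System.currentTimeMillis())                        // data arrived in this batch
    } else if (System.currentTimeMillis() - lastSeen.get() > silenceMs) {
      onSilence()                                                     // hypothetical callback for the no-data action
    }
  }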
Re: Problem Accessing Hive Table from hiveContext
Hello Igor, Although DECIMAL is supported, Hive 0.12 does not support user-definable precision and scale (they were introduced in Hive 0.13). Thanks, Yin On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote: Hi All, New to Spark; using Spark 1.0.2 and Hive 0.12. If the Hive table is created as test_datatypes(testbigint bigint, ss bigint), then select * from test_datatypes works fine from Spark. For create table test_datatypes(testbigint bigint, testdec decimal(5,2)): scala> val dataTypes = hiveContext.hql("select * from test_datatypes") 14/08/28 21:18:44 INFO parse.ParseDriver: Parsing command: select * from test_datatypes 14/08/28 21:18:44 INFO parse.ParseDriver: Parse Completed 14/08/28 21:18:44 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations 14/08/28 21:18:44 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences java.lang.IllegalArgumentException: Error: ',', ':', or ';' expected at position 14 from 'bigint:decimal(5,2)' [0:bigint, 6::, 7:decimal, 14:(, 15:5, 16:,, 17:2, 18:)] at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:312) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:716) at org.apache.hadoop.hive.serde2.lazy.LazyUtils.extractColumnInfo(LazyUtils.java:364) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(LazySimpleSerDe.java:288) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:187) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:218) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:272) at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:175) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:991) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:924) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:58) at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:143) at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:122) at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:122) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:122) at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:149) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$2.applyOrElse(Analyzer.scala:83) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$2.applyOrElse(Analyzer.scala:81) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156) The same exception happens when the table is created as create table test_datatypes(testbigint bigint, testdate date). Thanks, Igor.
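As a hedged workaround sketch under Hive 0.12, where DECIMAL cannot take a precision and scale, the column could be declared as plain DECIMAL (or the cluster moved to Hive 0.13+ to keep decimal(5,2)); the table name test_datatypes2 below is illustrative only:

  hiveContext.hql("create table test_datatypes2 (testbigint bigint, testdec decimal)")
  val dataTypes = hiveContext.hql("select * from test_datatypes2")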
Re: how to filter value in spark
You could join; it will give you the intersection and a list of the labels where the value was found:

  a.join(b).collect
  Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))

best, matt

On 08/31/2014 09:23 PM, Liu, Raymond wrote: You could use cogroup to combine RDDs into one RDD for cross-reference processing, e.g.:

  a.cogroup(b)
    .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
    .map { case (k, (l, r)) => (k, l) }

Best Regards, Raymond Liu

-Original Message- From: marylucy [mailto:qaz163wsx_...@hotmail.com] Sent: Friday, August 29, 2014 9:26 PM To: Matthew Farrellee Cc: user@spark.apache.org Subject: Re: how to filter value in spark

I see it works well, thank you!!! But in the following situation, how do I do it:

  var a = sc.textFile("/sparktest/1/").map((_, "a"))
  var b = sc.textFile("/sparktest/2/").map((_, "b"))

How to get (3,a) and (4,a)?

On Aug 28, 2014, 19:54, Matthew Farrellee m...@redhat.com wrote: On 08/28/2014 07:20 AM, marylucy wrote: fileA = 1 2 3 4, one number per line, saved in /sparktest/1/; fileB = 3 4 5 6, one number per line, saved in /sparktest/2/. I want to get 3 and 4.

  var a = sc.textFile("/sparktest/1/").map((_, 1))
  var b = sc.textFile("/sparktest/2/").map((_, 1))
  a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)

Error thrown: scala.MatchError: Null at PairRDDFunctions.lookup...

The issue is the nesting of the b RDD inside a transformation of the a RDD. Consider using intersection; it's more idiomatic:

  a.intersection(b).foreach(println)

but note that intersection will remove duplicates. best, matt
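Putting the cogroup suggestion together for the paired case in the question, a hedged sketch assuming a SparkContext named sc and the paths from the post; it keeps a key when it appears in both RDDs and emits it with a's label, so it should print (3,a) and (4,a):

  val a = sc.textFile("/sparktest/1/").map(x => (x, "a"))
  val b = sc.textFile("/sparktest/2/").map(x => (x, "b"))
  val inBoth = a.cogroup(b)
    .filter { case (_, (aVals, bVals)) => aVals.nonEmpty && bVals.nonEmpty }
    .flatMap { case (k, (aVals, _)) => aVals.map(v => (k, v)) }
  inBoth.foreach(println)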
Re: transforming a Map object to RDD
and in Python:

  map = {'a': 1, 'b': 2, 'c': 3}
  rdd = sc.parallelize(map.items())
  rdd.collect()
  [('a', 1), ('c', 3), ('b', 2)]

best, matt

On 08/28/2014 07:01 PM, Sean Owen wrote:

  val map = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
  val rdd = sc.parallelize(map.toSeq)

rdd is an RDD[(String, Int)] and you can do what you like from there. On Thu, Aug 28, 2014 at 11:56 PM, SK skrishna...@gmail.com wrote: Hi, How do I convert a Map object to an RDD so that I can use the saveAsTextFile() operation to output the Map object? thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/transforming-a-Map-object-to-RDD-tp13071.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Spark and Shark
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). spark: 1.0.2, shark: 0.9.2, hadoop: 2.4.1, java: java version "1.7.0_67", protobuf: 2.5.0. I have tried the smoke test in Shark but got a "java.util.NoSuchElementException" error; can you please advise how to fix this?

shark> create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
java.util.NoSuchElementException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925) at java.util.HashMap$ValueIterator.next(HashMap.java:950) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117) at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) at shark.SharkDriver.compile(SharkDriver.scala:215) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888) at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) at shark.SharkCliDriver$.main(SharkCliDriver.scala:237) at shark.SharkCliDriver.main(SharkCliDriver.scala)

spark-env.sh
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf
spark.executor.memory 2048m
spark.shuffle.spill.compress false

shark-env.sh
#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps"
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar

Regards Arthur
Re: Spark and Shark
I don't believe that Shark works with Spark 1.0. Have you considered trying Spark SQL? On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). [...] Regards Arthur
RE: Spark and Shark
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC server with Spark 1.1 RC2 and it works fine. Best Paolo Paolo Platter Agile Lab CTO From: Michael Armbrust mich...@databricks.com Sent: Monday, 1 September 2014 19:43 To: arthur.hk.c...@gmail.com Cc: user@spark.apache.org Subject: Re: Spark and Shark I don't believe that Shark works with Spark 1.0. Have you considered trying Spark SQL? On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). [...] Regards Arthur
Re: Time series forecasting
I guess it is not a question about Spark but a question about your dataset. You need to think about what you want to model and how you can shape the data in such a way that Spark can use it. Akima is one technique I know of; for example, a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}. Spark can find the coefficients C1-C6 by regression, I guess. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Time-series-forecasting-tp13236p13239.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
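A hedged sketch of fitting such a lag model with MLlib's linear regression, assuming a SparkContext named sc; the series values and the lag length are placeholders, not from the original thread:

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

  val series = Array(1.0, 1.2, 0.9, 1.4, 1.1, 1.3, 1.5, 1.6, 1.2, 1.8)  // placeholder time series, oldest first
  val lag = 6
  val examples = series.sliding(lag + 1).map { w =>
    LabeledPoint(w.last, Vectors.dense(w.init))  // predict a_{t+1} from the previous `lag` values
  }.toSeq
  val training = sc.parallelize(examples)
  val model = LinearRegressionWithSGD.train(training, 100)  // 100 iterations
  // model.weights holds the fitted coefficients C1..C6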
Spark 1.0.2 Can GroupByTest example be run in Eclipse without change
Hi, I have noticed that the GroupByTest example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala has been changed to be run using spark-submit. Previously, I set local as the first command-line parameter, and this enabled me to run GroupByTest in Eclipse:

  val sc = new SparkContext(args(0), "GroupBy Test",
    System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass).toSeq)

In the latest GroupByTest code, I cannot pass in local as the first command-line parameter:

  val sparkConf = new SparkConf().setAppName("GroupBy Test")
  var numMappers = if (args.length > 0) args(0).toInt else 2
  var numKVPairs = if (args.length > 1) args(1).toInt else 1000
  var valSize = if (args.length > 2) args(2).toInt else 1000
  var numReducers = if (args.length > 3) args(3).toInt else numMappers
  val sc = new SparkContext(sparkConf)

Is there a way to specify master=local (maybe in an environment variable) so that I can run the latest version of GroupByTest in Eclipse without changing the code? Thanks in advance for your assistance! Shing
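One way this is commonly handled, sketched here as a suggestion rather than a documented recipe: SparkConf picks up any spark.* Java system properties, so adding -Dspark.master=local[*] to the Eclipse run configuration's VM arguments should work without code changes. Alternatively, default the master in code only when none was supplied:

  import org.apache.spark.{SparkConf, SparkContext}

  val sparkConf = new SparkConf()
    .setAppName("GroupBy Test")
    .setIfMissing("spark.master", "local[*]")  // used only if no master was provided externally
  val sc = new SparkContext(sparkConf)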
Re: Can value in spark-defaults.conf support system variables?
No, not currently. 2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com: Hi all, Can values in spark-defaults.conf reference system variables, such as mess = ${user.home}/${user.name}? Best Regards -- Zhanfeng Huo
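A hedged workaround sketch, since spark-defaults.conf itself does not do the substitution: resolve the variables in the driver and set the value programmatically on the SparkConf (the property key spark.myapp.mess below is purely illustrative):

  import org.apache.spark.SparkConf

  val mess = sys.props("user.home") + "/" + sys.props("user.name")
  val conf = new SparkConf().set("spark.myapp.mess", mess)  // hypothetical key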
zip equal-length but unequally-partitioned RDDs
http://www.adamcrume.com/blog/archive/2014/02/19/fixing-sparks-rdd-zip Please check this URL. I got the same problem in v1.0.1: in some cases an RDD loses several elements after zip, so the total count of the ZippedRDD is less than that of the source RDD. Will Spark 1.1 fix this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/zip-equal-length-but-unequally-partition-tp13246.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
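Until that is resolved, a hedged workaround sketch (not from the blog post): join on explicit indices instead of relying on zip's partition alignment; a and b stand for the two equal-length RDDs. This shuffles, so it is slower than zip, but it does not depend on the two RDDs being partitioned identically:

  val aIndexed = a.zipWithIndex.map { case (v, i) => (i, v) }
  val bIndexed = b.zipWithIndex.map { case (v, i) => (i, v) }
  val zipped = aIndexed.join(bIndexed).values  // RDD[(A, B)], paired by position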