Cannot create parquet with snappy output for hive external table

2017-05-16 Thread Dhimant
Hi Group,

I am not able to load data into an external Hive table that is partitioned.

Trace :-

1. create external table test(id int, name string) stored as parquet
location 'hdfs://testcluster/user/abc/test' tblproperties
('PARQUET.COMPRESS'='SNAPPY');

2. Spark code:

   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .getOrCreate()
   spark.sql("use default").show
   val rdd = sc.parallelize(Seq((1, "one"), (2, "two")))
   val df = spark.createDataFrame(rdd).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")

3. I can see a few .snappy.parquet files.

4. create external table test(id int) partitioned by  (name string)  stored
as parquet location 'hdfs://testcluster/user/abc/test' tblproperties
('PARQUET.COMPRESS'='SNAPPY');

5. Spark code:

   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("hive.exec.dynamic.partition", "true")
     .config("hive.exec.dynamic.partition.mode", "nonstrict")
     .getOrCreate()
   spark.sql("use default").show
   val rdd = sc.parallelize(Seq((1, "one"), (2, "two")))
   val df = spark.createDataFrame(rdd).toDF("id", "name")
   df.write.mode(SaveMode.Overwrite).insertInto("test")

6. I see uncompressed files without the .snappy.parquet extension.
parquet-tools.jar also confirms that these are uncompressed Parquet files.

7. I tried the following option as well, but no luck:

   df.write.mode(SaveMode.Overwrite).format("parquet")
     .option("compression", "snappy").insertInto("test")


Thanks in advance.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-create-parquet-with-snappy-output-for-hive-external-table-tp28687.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: problems with checkpoint and spark sql

2016-09-21 Thread Dhimant
Hi David,

Did you find any solution for this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/problems-with-checkpoint-and-spark-sql-tp26080p27773.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Bad Digest error while doing aws s3 put

2016-02-06 Thread Dhimant
Hi, I am getting the following error while reading a large amount of data from
S3 and, after processing, writing data back to S3.

Did you find any solution for this?

16/02/07 07:41:59 WARN scheduler.TaskSetManager: Lost task 144.2 in stage
3.0 (TID 169, ip-172-31-7-26.us-west-2.compute.internal):
java.io.IOException: exception in uploadSinglePart
at
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:248)
at
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:469)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at
org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
at
org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1080)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: exception in putObject
at
com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:149)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy26.storeFile(Unknown Source)
at
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:245)
... 15 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The
Content-MD5 you specified did not match what we received. (Service: Amazon
S3; Status Code: 400; Error Code: BadDigest; Request ID: 5918216A5901FCC8),
S3 Extended Request ID:
QSxtYln/yXqHYpdr4BWosin/TAFsGlK1FlKfE5PcuJkNrgoblGzTNt74kEhuNcrJCRZ3mXq0oUo=
at
com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
at
com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
at
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
at
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3796)
at
com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1482)
at
com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:140)
... 22 more





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p26167.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Explanation streaming-cep-engine with example

2015-03-25 Thread Dhimant
Hi,
Can someone explain how the Spark streaming CEP engine works?
How can I use it, with a simple example?

http://spark-packages.org/package/Stratio/streaming-cep-engine



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Explanation-streaming-cep-engine-with-example-tp22218.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Error while Insert data into hive table via spark

2015-03-19 Thread Dhimant
Hi,

I have configured Apache Spark 1.3.0 with Hive 1.0.0 and Hadoop 2.6.0.
I am able to create tables and retrieve data from Hive tables via the following
commands, but I am not able to insert data into a table.

scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS newtable (key INT)")
scala> sqlContext.sql("select * from newtable").collect
15/03/19 02:10:20 INFO parse.ParseDriver: Parsing command: select * from
newtable
15/03/19 02:10:20 INFO parse.ParseDriver: Parse Completed

15/03/19 02:10:35 INFO scheduler.DAGScheduler: Job 0 finished: collect at
SparkPlan.scala:83, took 13.826402 s
res2: Array[org.apache.spark.sql.Row] = Array([1])


But I am not able to insert data into this table via the Spark shell. The same
command runs perfectly fine from the Hive shell.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@294fa094
// scala> sqlContext.sql("INSERT INTO TABLE newtable SELECT 1")
scala> sqlContext.sql("INSERT INTO TABLE newtable values(1)")
15/03/19 02:03:14 INFO metastore.HiveMetaStore: 0: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/03/19 02:03:14 INFO metastore.ObjectStore: ObjectStore, initialize called
15/03/19 02:03:14 INFO DataNucleus.Persistence: Property
datanucleus.cache.level2 unknown - will be ignored
15/03/19 02:03:14 INFO DataNucleus.Persistence: Property
hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/03/19 02:03:14 WARN DataNucleus.Connection: BoneCP specified but not
present in CLASSPATH (or one of dependencies)
15/03/19 02:03:15 WARN DataNucleus.Connection: BoneCP specified but not
present in CLASSPATH (or one of dependencies)
15/03/19 02:03:16 INFO metastore.ObjectStore: Setting MetaStore object pin
classes with
hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order
15/03/19 02:03:18 INFO DataNucleus.Datastore: The class
org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as
embedded-only so does not have its own datastore table.
15/03/19 02:03:18 INFO DataNucleus.Datastore: The class
org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only
so does not have its own datastore table.
15/03/19 02:03:18 INFO DataNucleus.Datastore: The class
org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as
embedded-only so does not have its own datastore table.
15/03/19 02:03:18 INFO DataNucleus.Datastore: The class
org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only
so does not have its own datastore table.
15/03/19 02:03:18 INFO DataNucleus.Query: Reading in results for query
org.datanucleus.store.rdbms.query.SQLQuery@0 since the connection used is
closing
15/03/19 02:03:18 INFO metastore.ObjectStore: Initialized ObjectStore
15/03/19 02:03:19 INFO metastore.HiveMetaStore: Added admin role in
metastore
15/03/19 02:03:19 INFO metastore.HiveMetaStore: Added public role in
metastore
15/03/19 02:03:19 INFO metastore.HiveMetaStore: No user is added in admin
role, since config is empty
15/03/19 02:03:20 INFO session.SessionState: No Tez session required at this
point. hive.execution.engine=mr.
15/03/19 02:03:20 INFO parse.ParseDriver: Parsing command: INSERT INTO TABLE
newtable values(1)
NoViableAltException(26@[])
at
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:742)
at
org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:40171)
at
org.apache.hadoop.hive.ql.parse.HiveParser.singleSelectStatement(HiveParser.java:38048)
at
org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:37754)
at
org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:37654)
at
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:36898)
at
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:36774)
at
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1338)
at
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
at
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.spark.sql.hive.HiveQl$.getAst(HiveQl.scala:227)
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:241)
at
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
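
For comparison, here is an untested sketch that inserts a row through a
DataFrame instead of the INSERT ... VALUES syntax rejected by the parser above
(Spark 1.3 API; the case class name Row1 is only illustrative):

   case class Row1(key: Int)

   val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
   import sqlContext.implicits._

   // Build a one-row DataFrame and append it to the existing Hive table.
   val df = sc.parallelize(Seq(Row1(1))).toDF()
   df.insertInto("newtable")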

No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
I am getting a "No suitable driver found" error while creating a table in Hive from Spark SQL.

I am trying to execute the following example from the Spark Git repository:
spark/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala

My setup: Hadoop 1.6, Spark 1.2, Hive 1.0, MySQL server (installed via yum
install mysql55w mysql55w-server)

I can create tables in Hive from the Hive command prompt:

hive> select * from person_parquet;
OK
Barack  Obama   M
Bill    Clinton M
Hillary Clinton F
Time taken: 1.945 seconds, Fetched: 3 row(s)

I am starting the Spark shell via the following command:

./spark-1.2.0-bin-hadoop2.4/bin/spark-shell --master
spark://sparkmaster.company.com:7077 --jars
/data/mysql-connector-java-5.1.14-bin.jar

scala> Class.forName("com.mysql.jdbc.Driver")
res0: Class[_] = class com.mysql.jdbc.Driver

scala> Class.forName("com.mysql.jdbc.Driver").newInstance
res1: Any = com.mysql.jdbc.Driver@2dec8e27

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@32ecf100

scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
15/02/18 22:23:01 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF
NOT EXISTS src (key INT, value STRING)
15/02/18 22:23:02 INFO parse.ParseDriver: Parse Completed
15/02/18 22:23:02 INFO metastore.HiveMetaStore: 0: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/02/18 22:23:02 INFO metastore.ObjectStore: ObjectStore, initialize called
15/02/18 22:23:02 INFO DataNucleus.Persistence: Property
datanucleus.cache.level2 unknown - will be ignored
15/02/18 22:23:02 INFO DataNucleus.Persistence: Property
hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/02/18 22:23:02 WARN DataNucleus.Connection: BoneCP specified but not
present in CLASSPATH (or one of dependencies)
15/02/18 22:23:02 WARN DataNucleus.Connection: BoneCP specified but not
present in CLASSPATH (or one of dependencies)
15/02/18 22:23:02 ERROR Datastore.Schema: Failed initialising database.
No suitable driver found for jdbc:mysql://sparkmaster.company.com:3306/hive
org.datanucleus.exceptions.NucleusDataStoreException: No suitable driver
found for jdbc:mysql://sparkmaster.company.com:3306/hive
at
org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.init(RDBMSStoreManager.java:298)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at
org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
at
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
at
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
Found a solution in one of the posts on the internet.
I updated spark/bin/compute-classpath.sh and added the database connector jar
to the classpath:
CLASSPATH=$CLASSPATH:/data/mysql-connector-java-5.1.14-bin.jar



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/No-suitable-driver-found-error-Create-table-in-hive-from-spark-sql-tp21714p21715.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Change number of workers and memory

2014-09-22 Thread Dhimant
I have a Spark cluster in which some nodes are high-performance machines and
others have commodity (lower) specs.
When I configure worker memory and instances in spark-env.sh, the setting
applies to all the nodes.
Can I change the SPARK_WORKER_MEMORY and SPARK_WORKER_INSTANCES properties on a
per-node/machine basis?
I am using Spark version 1.1.0.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Change-number-of-workers-and-memory-tp14866.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while Union

2014-09-08 Thread Dhimant
Thank you Aaron for pointing out the problem. This only happens when I run
this code in spark-shell, not when I submit the job.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13677.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while Union

2014-09-06 Thread Dhimant
I am using Spark version 1.0.2




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13618.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



error: type mismatch while assigning RDD to RDD val object

2014-09-04 Thread Dhimant
I am receiving the following error in the Spark shell while executing the following code.

class LogRecrod(logLine: String) extends Serializable {
  val splitvals = logLine.split(",")
  val strIp: String = splitvals(0)
  val hostname: String = splitvals(1)
  val server_name: String = splitvals(2)
}

var logRecordRdd: org.apache.spark.rdd.RDD[LogRecrod] = _

val sourceFile =
  sc.textFile("hdfs://192.168.1.30:9000/Data/Log_1406794333258.log", 2)
14/09/04 12:08:28 INFO storage.MemoryStore: ensureFreeSpace(179585) called
with curMem=0, maxMem=309225062
14/09/04 12:08:28 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 175.4 KB, free 294.7 MB)
sourceFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12


scala> logRecordRdd = sourceFile.map(line => new LogRecrod(line))
<console>:18: error: type mismatch;
 found   : LogRecrod
 required: LogRecrod
       logRecordRdd = sourceFile.map(line => new LogRecrod(line))

Any suggestions to resolve this problem?
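
In case the mismatch is specific to re-defining the class inside the shell
session, here is an untested sketch of the same code packaged as a standalone
application (the object name LogApp is purely illustrative):

   import org.apache.spark.{SparkConf, SparkContext}
   import org.apache.spark.rdd.RDD

   class LogRecrod(logLine: String) extends Serializable {
     val splitvals = logLine.split(",")
     val strIp: String = splitvals(0)
     val hostname: String = splitvals(1)
     val server_name: String = splitvals(2)
   }

   object LogApp {
     var logRecordRdd: RDD[LogRecrod] = _

     def main(args: Array[String]): Unit = {
       val sc = new SparkContext(new SparkConf().setAppName("LogApp"))
       val sourceFile =
         sc.textFile("hdfs://192.168.1.30:9000/Data/Log_1406794333258.log", 2)
       logRecordRdd = sourceFile.map(line => new LogRecrod(line))
       println(logRecordRdd.count())
       sc.stop()
     }
   }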




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-assigning-RDD-to-RDD-val-object-tp13429.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Multiple spark shell sessions

2014-09-04 Thread Dhimant
Hi,
I am receiving the following error while connecting to the Spark server via the
shell when one shell is already open.
How can I open multiple sessions?

Does anyone know of a workflow engine/job server for Spark, like Apache Oozie?

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.7.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
14/09/04 15:07:46 INFO spark.SecurityManager: Changing view acls to: root
14/09/04 15:07:46 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(root)
14/09/04 15:07:46 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/04 15:07:47 INFO Remoting: Starting remoting
14/09/04 15:07:47 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sp...@sparkmaster.guavus.com:42236]
14/09/04 15:07:47 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sp...@sparkmaster.guavus.com:42236]
14/09/04 15:07:47 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/04 15:07:47 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/04 15:07:47 INFO storage.DiskBlockManager: Created local directory at
/tmp/spark-local-20140904150747-4dcd
14/09/04 15:07:47 INFO storage.MemoryStore: MemoryStore started with
capacity 294.9 MB.
14/09/04 15:07:47 INFO network.ConnectionManager: Bound socket to port 54453
with id = ConnectionManagerId(sparkmaster.guavus.com,54453)
14/09/04 15:07:47 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/09/04 15:07:47 INFO storage.BlockManagerInfo: Registering block manager
sparkmaster.guavus.com:54453 with 294.9 MB RAM
14/09/04 15:07:47 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/04 15:07:47 INFO spark.HttpServer: Starting HTTP Server
14/09/04 15:07:47 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 15:07:47 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:48977
14/09/04 15:07:47 INFO broadcast.HttpBroadcast: Broadcast server started at
http://192.168.1.21:48977
14/09/04 15:07:47 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-0e45759a-2c58-439a-8e96-95b0bc1d6136
14/09/04 15:07:47 INFO spark.HttpServer: Starting HTTP Server
14/09/04 15:07:47 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 15:07:47 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:39962
14/09/04 15:07:48 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 15:07:48 WARN component.AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already
in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192)
at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191)
at
org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:99)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:223)
at
org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
at $line3.$read$$iwC$$iwC.<init>(<console>:8)
at $line3.$read$$iwC.<init>(<console>:14)
at $line3.$read.<init>(<console>:16)
at $line3.$read$.<init>(<console>:20)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at

Re: Multiple spark shell sessions

2014-09-04 Thread Dhimant
Thanks Yana,
I am able to execute applications and commands via another session; I also
received another port for the UI application.

Thanks,
Dhimant



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-spark-shell-sessions-tp13441p13459.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.io.NotSerializableException exception - custom Accumulator

2014-04-08 Thread Dhimant Jayswal
Hi,

I am getting a java.io.NotSerializableException while executing the following
program.

import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext
import org.apache.spark.AccumulatorParam

object App {
  class Vector(val data: Array[Double]) {}
  implicit object VectorAP extends AccumulatorParam[Vector] {
    def zero(v: Vector): Vector = new Vector(new Array(v.data.size))
    def addInPlace(v1: Vector, v2: Vector): Vector = {
      for (i <- 0 to v1.data.size - 1) v1.data(i) += v2.data(i)
      return v1
    }
  }
  def main(sc: SparkContext) {
    val vectorAcc = sc.accumulator(new Vector(Array(0, 0)))
    val accum = sc.accumulator(0)
    val file = sc.textFile("/user/root/data/SourceFiles/a.txt", 10)
    file.foreach(line => { println(line); accum += 1; vectorAcc.add(new Vector(Array(1, 1))) })
    println(accum.value)
    println(vectorAcc.value.data)
    println("= ")
  }
}

--

scala> App.main(sc)
14/04/09 01:02:05 INFO storage.MemoryStore: ensureFreeSpace(130760) called
with curMem=0, maxMem=308713881
14/04/09 01:02:05 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 127.7 KB, free 294.3 MB)
14/04/09 01:02:07 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/04/09 01:02:07 INFO spark.SparkContext: Starting job: foreach at
<console>:30
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Got job 0 (foreach at
<console>:30) with 11 output partitions (allowLocal=false)
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Final stage: Stage 0
(foreach at <console>:30)
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Parents of final stage:
List()
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Submitting Stage 0
(MappedRDD[1] at textFile at <console>:29), which has no missing parents
14/04/09 01:02:07 INFO scheduler.DAGScheduler: Failed to run foreach at
<console>:30
org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$App$
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:794)
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:737)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:569)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
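
For reference, here is an untested sketch of the same accumulator logic
packaged as a compiled, submitted application, with the helper class and the
AccumulatorParam defined at the top level rather than inside an object used
from the shell. This is only an assumption about avoiding the REPL wrapper
objects ($iwC...) named in the exception, not a confirmed fix; names such as
Vec, VecAP and VectorAccApp are purely illustrative.

   import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

   // Helper class and accumulator parameter at the top level,
   // outside any wrapping object.
   class Vec(val data: Array[Double]) extends Serializable

   object VecAP extends AccumulatorParam[Vec] {
     def zero(v: Vec): Vec = new Vec(new Array(v.data.size))
     def addInPlace(v1: Vec, v2: Vec): Vec = {
       for (i <- 0 until v1.data.size) v1.data(i) += v2.data(i)
       v1
     }
   }

   object VectorAccApp {
     def main(args: Array[String]): Unit = {
       val sc = new SparkContext(new SparkConf().setAppName("VectorAccApp"))
       val vectorAcc = sc.accumulator(new Vec(Array(0.0, 0.0)))(VecAP)
       val counter = sc.accumulator(0)
       val file = sc.textFile("/user/root/data/SourceFiles/a.txt", 10)
       file.foreach { line =>
         counter += 1
         vectorAcc.add(new Vec(Array(1.0, 1.0)))
       }
       println(counter.value)
       println(vectorAcc.value.data.mkString(","))
       sc.stop()
     }
   }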