subscribe

2016-01-08 Thread Jeetendra Gangele

Re: automatic start of streaming job on failure on YARN

2015-10-03 Thread Jeetendra Gangele
yes in yarn cluster mode. On 2 October 2015 at 22:10, Ashish Rangole <arang...@gmail.com> wrote: > Are you running the job in yarn cluster mode? > On Oct 1, 2015 6:30 AM, "Jeetendra Gangele" <gangele...@gmail.com> wrote: > >> We've a streaming applicat

automatic start of streaming job on failure on YARN

2015-10-01 Thread Jeetendra Gangele
We've a streaming application running on YARN and we would like to ensure that it is up and running 24/7. Is there a way to tell YARN to automatically restart a specific application on failure? There is a property, yarn.resourcemanager.am.max-attempts, which defaults to 2; setting it to a bigger value
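For reference, the restart behaviour asked about here is usually controlled in two places: a cluster-wide cap in yarn-site.xml and a per-application attempt count passed at submit time. A minimal sketch, assuming a yarn-cluster deployment (class name, jar and attempt values are placeholders, not taken from the original job):

    <!-- yarn-site.xml: cluster-wide upper bound on ApplicationMaster attempts -->
    <property>
      <name>yarn.resourcemanager.am.max-attempts</name>
      <value>4</value>
    </property>

    # per-application attempt count; it cannot exceed the cluster cap above
    spark-submit --master yarn-cluster \
      --conf spark.yarn.maxAppAttempts=4 \
      --class com.example.StreamingJob streaming-job.jar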

Re: Deploying spark-streaming application on production

2015-10-01 Thread Jeetendra Gangele
ld like to >>> make it reliable. >>> >>> Basically either MQTT supports persistence (which I don't know) or there >>> is Kafka for these use cases. >>> >>> Another option would be I think to place observable streams in between >>> MQTT an

Deploying spark-streaming application on production

2015-09-21 Thread Jeetendra Gangele
Hi All, I have a Spark streaming application with a batch interval (10 ms) which reads the MQTT channel and dumps the data from MQTT to HDFS. Suppose I have to deploy a new application jar (with changes in the Spark streaming application): what is the best way to deploy? Currently I am doing as below

bad substitution for [hdp.version] Error in spark on YARN job

2015-09-09 Thread Jeetendra Gangele
Hi, I am getting the below error when running the Spark job on YARN with an HDP cluster. I have installed Spark and YARN from Ambari and I am using Spark 1.3.1 with HDP version HDP-2.3.0.0-2557. My spark-defaults.conf has the correct entry: spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
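For readers hitting the same "bad substitution" error, the commonly documented workaround on HDP is to pass hdp.version to both the driver and the YARN application master JVMs. A minimal sketch of the spark-defaults.conf entries, reusing the version string quoted in the message:

    spark.driver.extraJavaOptions    -Dhdp.version=2.3.0.0-2557
    spark.yarn.am.extraJavaOptions   -Dhdp.version=2.3.0.0-2557

Some HDP guides also suggest placing the same -Dhdp.version flag in a java-opts file under the Spark conf directory; whether that is needed depends on the distribution.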

Re: bad substitution for [hdp.version] Error in spark on YARN job

2015-09-09 Thread Jeetendra Gangele
Finally it worked out; I solved it by modifying mapred-site.xml and removing the entry for the YARN application master (from this property I removed the HDP version pieces). On 9 September 2015 at 17:44, Jeetendra Gangele <gangele...@gmail.com> wrote: > Hi , > I am getting below error

Re: Sending yarn application logs to web socket

2015-09-08 Thread Jeetendra Gangele
t; countless hours due to typos in the file, for example. > > On Mon, Sep 7, 2015 at 11:47 AM, Jeetendra Gangele <gangele...@gmail.com> > wrote: > >> I also tried placing my customized log4j.properties file under >> src/main/resources still no luck. >>

Sending yarn application logs to web socket

2015-09-07 Thread Jeetendra Gangele
Hi All, I have been trying to send my application-related logs to a socket so that we can use Logstash and check the application logs. Here is my log4j.properties file: main.logger=RFA,SA log4j.appender.SA=org.apache.log4j.net.SocketAppender log4j.appender.SA.Port=4560
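To make the intent of the quoted snippet clearer, a minimal log4j.properties that routes all logging to a SocketAppender might look roughly like the following (the remote host is a placeholder; log4j's SocketAppender ships serialized logging events, which a log4j-aware receiver such as Logstash's log4j input can consume):

    log4j.rootLogger=INFO, SA
    log4j.appender.SA=org.apache.log4j.net.SocketAppender
    log4j.appender.SA.RemoteHost=logstash-host.example.com
    log4j.appender.SA.Port=4560
    log4j.appender.SA.ReconnectionDelay=10000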

Re: Sending yarn application logs to web socket

2015-09-07 Thread Jeetendra Gangele
I also tried placing my customized log4j.properties file under src/main/resources, still no luck. Won't the above step modify the default YARN and Spark log4j.properties? Anyhow, it is still taking log4j.properties from YARN. On 7 September 2015 at 19:25, Jeetendra Gangele <gangele...@gmail.
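A commonly used way to make the YARN containers pick up a custom file instead of the cluster default is to ship it with --files and point both JVMs at it through their Java options. A minimal sketch, assuming the file sits in the working directory at submit time (class and jar names are placeholders):

    spark-submit --master yarn-cluster \
      --files log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --class com.example.MyApp myapp.jar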

Re: Sending yarn application logs to web socket

2015-09-07 Thread Jeetendra Gangele
anybody here to help? On 7 September 2015 at 17:53, Jeetendra Gangele <gangele...@gmail.com> wrote: > Hi All I have been trying to send my application related logs to socket so > that we can write log stash and check the application logs. > > here is my log4j.property file &

Re: Loading already existing tables in spark shell

2015-08-25 Thread Jeetendra Gangele
. Please set the database by using use database command before executing the query. Regards, Ishwardeep -- *From:* Jeetendra Gangele gangele...@gmail.com *Sent:* Monday, August 24, 2015 5:47 PM *To:* user *Subject:* Loading already existing tables in spark shell Hi

Re: spark not launching in yarn-cluster mode

2015-08-25 Thread Jeetendra Gangele
run on your local machine rather than in a container of the YARN cluster. 2015-08-25 16:19 GMT+08:00 Jeetendra Gangele gangele...@gmail.com: Hi All, I am trying to launch the spark shell with --master yarn-cluster; it is giving the below error. Why is this not supported? bin/spark-sql --master yarn-cluster

creating data warehouse with Spark and running query with Hive

2015-08-19 Thread Jeetendra Gangele
Hi All, I have data in HDFS partitioned by year/month/date/event_type, and I am creating Hive tables with this data. The data is in JSON, so I am using the JSON SerDe and creating Hive tables. Below is the code: val jsonFile =
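Since the original code snippet is cut off, here is a minimal sketch of the general approach it describes: reading partitioned JSON from HDFS into a DataFrame and exposing it to SQL. It uses the Spark 1.4-era Java DataFrame API; the paths, table name and query are placeholders, not the original code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    public class JsonEventsTable {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("json-events"));
        HiveContext hive = new HiveContext(jsc.sc());

        // Read all partition directories; Spark infers the schema from the JSON records.
        DataFrame events = hive.read().json("hdfs:///warehouse/events/*/*/*/*");
        events.printSchema();

        // Register the data so it can be queried with SQL for this session.
        events.registerTempTable("events");
        hive.sql("SELECT event_type, COUNT(*) FROM events GROUP BY event_type").show();

        jsc.stop();
      }
    }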

Re: Data from PostgreSQL to Spark

2015-08-03 Thread Jeetendra Gangele
? Or some JDBC proxy? On Tue, 28 Jul 2015 at 19:34, Jeetendra Gangele gangele...@gmail.com wrote: can the source write to Kafka/Flume/Hbase in addition to Postgres? No, it can't write; this is due to the fact that there are many applications producing this PostgreSQL data. I can't

Re: Spark on YARN

2015-07-30 Thread Jeetendra Gangele
it. If you met similar problem, you could increase this configuration “yarn.nodemanager.vmem-pmem-ratio”. Thanks Jerry *From:* Jeff Zhang [mailto:zjf...@gmail.com] *Sent:* Thursday, July 30, 2015 4:36 PM *To:* Jeetendra Gangele *Cc:* user *Subject:* Re: Spark on YARN 15/07/30 12:13:35

Re: Spark on YARN

2015-07-30 Thread Jeetendra Gangele
I can't see the application logs here. All the logs are going into stderr. Can anybody help here? On 30 July 2015 at 12:21, Jeetendra Gangele gangele...@gmail.com wrote: I am running the below command (the default Spark Pi program) but this is not running; all the logs are going in stderr

Spark on YARN

2015-07-30 Thread Jeetendra Gangele
I am running the below command (the default Spark Pi program) but this is not running; all the logs are going to stderr, yet at the terminal the job is succeeding. I guess there is some issue and the job is not launching at all. /bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster

Re: Data from PostgreSQL to Spark

2015-07-28 Thread Jeetendra Gangele
: You can call dB connect once per partition. Please have a look at design patterns of for each construct in document. How big is your data in dB? How soon that data changes? You would be better off if data is in spark already On 28 Jul 2015 04:48, Jeetendra Gangele gangele...@gmail.com wrote

Re: Data from PostgreSQL to Spark

2015-07-28 Thread Jeetendra Gangele
...@gmail.com wrote: Why can't you bulk pre-fetch the data to HDFS (like using Sqoop) instead of hitting Postgres multiple times? Sent from Windows Mail *From:* ayan guha guha.a...@gmail.com *Sent:* Monday, July 27, 2015 4:41 PM *To:* Jeetendra Gangele gangele...@gmail.com *Cc

Re: Data from PostgreSQL to Spark

2015-07-28 Thread Jeetendra Gangele
a trigger in Postgres to send data to the big data cluster as soon as changes are made. Or as I was saying in another email, can the source write to Kafka/Flume/Hbase in addition to Postgres? Sent from Windows Mail *From:* Jeetendra Gangele gangele...@gmail.com *Sent:* Tuesday, July 28

Data from PostgreSQL to Spark

2015-07-27 Thread Jeetendra Gangele
Hi All, I have a use case where I am consuming events from RabbitMQ using Spark streaming. Each event has some fields on which I want to query PostgreSQL and bring the data, then do the join between the event data and the PostgreSQL data and put the aggregated data into HDFS, so that I run
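One way to avoid hitting PostgreSQL once per event, which several replies in this thread circle around, is to load the reference table as a DataFrame over JDBC and join it with each micro-batch. A minimal sketch using the Spark 1.4-era Java API; the JDBC URL, table, credentials and column names are placeholders:

    import java.util.Properties;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class PostgresReference {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("pg-reference"));
        SQLContext sql = new SQLContext(jsc.sc());

        Properties props = new Properties();
        props.setProperty("user", "spark");
        props.setProperty("password", "secret");
        props.setProperty("driver", "org.postgresql.Driver");

        // Load the lookup table once (or once per batch) instead of querying per record.
        DataFrame customers = sql.read()
            .jdbc("jdbc:postgresql://dbhost:5432/appdb", "public.customers", props);
        customers.registerTempTable("customers");

        // Event micro-batches registered as a temp table can then be joined in SQL, e.g.
        // sql.sql("SELECT e.*, c.segment FROM events e JOIN customers c ON e.customer_id = c.id");

        jsc.stop();
      }
    }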

Re: Data from PostgreSQL to Spark

2015-07-27 Thread Jeetendra Gangele
...@hotmail.com wrote: You can have Spark reading from PostgreSQL through the data access API. Do you have any concern with that approach since you mention copying that data into HBase. From: Jeetendra Gangele Sent: Monday, July 27, 6:00 AM Subject: Data from PostgreSQL to Spark To: user

getting Error while Running SparkPi program

2015-07-24 Thread Jeetendra Gangele
While running the below I am getting the error in the YARN log; has anybody hit this issue? ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.4.1-hadoop2.6.0.jar 10 2015-07-24 12:06:10,846 ERROR [RMCommunicator Allocator]

Re: Re: Need help in setting up spark cluster

2015-07-23 Thread Jeetendra Gangele
to deploy a spark standalone cluster to run some integration tests, and also you can consider running spark on yarn for the later development use cases. Best, Sun. -- fightf...@163.com *From:* Jeetendra Gangele gangele...@gmail.com *Date:* 2015-07-23 13:39

Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele
Hi All, I have data in MongoDB (a few TBs) which I want to migrate to HDFS to do complex query analysis on this data. The queries are AND queries involving multiple fields. So my question is: in which format should I store the data in HDFS so that processing will be fast for such kinds of queries?

Re: Need help in setting up spark cluster

2015-07-22 Thread Jeetendra Gangele
Can anybody help here? On 22 July 2015 at 10:38, Jeetendra Gangele gangele...@gmail.com wrote: Hi All, I am trying to capture the user activities for a real estate portal. I am using a RabbitMQ and Spark streaming combination where I push all the events to RabbitMQ and then a 1 sec micro

Re: Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele
jornfra...@gmail.com wrote: Can you provide an example of an and query ? If you do just look-up you should try Hbase/ phoenix, otherwise you can try orc with storage index and/or compression, but this depends on how your queries look like Le mer. 22 juil. 2015 à 14:48, Jeetendra Gangele gangele

Does Spark streaming support is there with RabbitMQ

2015-07-20 Thread Jeetendra Gangele
Does Apache Spark support RabbitMQ? I have messages on RabbitMQ and I want to process them using Apache Spark streaming; does it scale? Regards Jeetendra

Re: Does Spark streaming support is there with RabbitMQ

2015-07-20 Thread Jeetendra Gangele
/Stratio/RabbitMQ-Receiver The source is here: https://github.com/Stratio/RabbitMQ-Receiver Not sure that meets your needs or not. -Todd On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele gangele...@gmail.com wrote: Does Apache spark support RabbitMQ. I have messages on RabbitMQ and I want

Re: Create multiple rows from elements in array on a single row

2015-06-08 Thread Jeetendra Gangele
Use mapToPair; at that point you can break up the key. On 8 June 2015 at 23:27, Bill Q bill.q@gmail.com wrote: Hi, I have a rdd with the following structure: row1: key: Seq[a, b]; value: value 1 row2: key: seq[a, c, f]; value: value 2 Is there an efficient way to de-flat the rows into? row1: key:

Re: Error in using saveAsParquetFile

2015-06-08 Thread Jeetendra Gangele
Parquet file; when are you loading these files? Can you please share the code where you are passing the Parquet file to Spark? On 8 June 2015 at 16:39, Cheng Lian lian.cs@gmail.com wrote: Are you appending the joined DataFrame whose PolicyType is string to an existing Parquet file whose

Re: path to hdfs

2015-06-08 Thread Jeetendra Gangele
your HDFS path to spark job is incorrect. On 8 June 2015 at 16:24, Nirmal Fernando nir...@wso2.com wrote: HDFS path should be something like; hdfs:// 127.0.0.1:8020/user/cloudera/inputs/ On Mon, Jun 8, 2015 at 4:15 PM, Pa Rö paul.roewer1...@googlemail.com wrote: hello, i submit my spark

not getting any mail

2015-05-02 Thread Jeetendra Gangele
Hi All I am not getting any mail from this community?

Re: Enabling Event Log

2015-05-02 Thread Jeetendra Gangele
is it working now? On 1 May 2015 at 13:43, James King jakwebin...@gmail.com wrote: Oops! well spotted. Many thanks Shixiong. On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote: spark.history.fs.logDirectory is for the history server. For Spark applications, they should

Re: MLib KMeans on large dataset issues

2015-04-29 Thread Jeetendra Gangele
How are you passing the feature vectors to K-means? Are they in 2-D space or a 1-D array? Did you try using streaming K-means? Will you be able to paste the code here? On 29 April 2015 at 17:23, Sam Stoelinga sammiest...@gmail.com wrote: Hi Sparkers, I am trying to run MLib kmeans on a large dataset(50+Gb
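For readers wondering about the input shape referred to here: MLlib's KMeans expects an RDD of mllib.linalg.Vector, i.e. each sample flattened into a single 1-D vector. A minimal Java sketch, assuming space-separated numeric features per line (path, k and iteration count are placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class KMeansSketch {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("kmeans-sketch"));

        JavaRDD<Vector> features = jsc.textFile("hdfs:///data/features.txt").map(
            new Function<String, Vector>() {
              @Override
              public Vector call(String line) {
                String[] parts = line.split(" ");
                double[] values = new double[parts.length];
                for (int i = 0; i < parts.length; i++) {
                  values[i] = Double.parseDouble(parts[i]);
                }
                return Vectors.dense(values);   // one dense 1-D vector per sample
              }
            }).cache();                          // KMeans iterates, so caching helps

        KMeansModel model = KMeans.train(features.rdd(), 10, 20);  // k=10, 20 iterations
        System.out.println("Within-set cost: " + model.computeCost(features.rdd()));

        jsc.stop();
      }
    }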

solr in spark

2015-04-28 Thread Jeetendra Gangele
Has anyone tried using Solr inside Spark? Below is the project describing it: https://github.com/LucidWorks/spark-solr. I have a requirement in which I want to index 20 million company names and then search as and when new data comes in. The output should be a list of companies matching the

Re: solr in spark

2015-04-28 Thread Jeetendra Gangele
-search.html On Tue, Apr 28, 2015 at 6:27 PM, Jeetendra Gangele gangele...@gmail.com wrote: Does anyone tried using solr inside spark? below is the project describing it. https://github.com/LucidWorks/spark-solr. I have a requirement in which I want to index 20 millions companies name

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds\\ On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote: Hi Ayan can you try below line loc = D:\\Project\\Spark\\code\\news\\jsonfeeds On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi I am facing

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Hi Ayan can you try below line loc = D:\\Project\\Spark\\code\\news\\jsonfeeds On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi I am facing this weird issue. I am on Windows, and I am trying to load all files within a folder. Here is my code - loc =

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Extra forward slash at the end; sometimes I have seen this kind of issue. On 25 April 2015 at 20:50, Jeetendra Gangele gangele...@gmail.com wrote: loc = D:\\Project\\Spark\\code\\news\\jsonfeeds\\ On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote: Hi Ayan can you try

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Also, if this code is in Scala, why is there no val on newsY? Is it defined above? loc = D:\\Project\\Spark\\code\\news\\jsonfeeds newsY = sc.textFile(loc) print newsY.count() On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi I am facing this weird issue. I am on Windows, and I

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
zipWithIndex will preserve whatever order is there in your val lines. I am not sure about val lines = sc.textFile("hdfs://mytextFile"); if this line maintains the order, the next will maintain it for sure. On 24 April 2015 at 18:35, Spico Florin spicoflo...@gmail.com wrote: Hello! I know that

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
Thanks that's why I was worried and tested my application again :). On 24 April 2015 at 23:22, Michal Michalski michal.michal...@boxever.com wrote: Yes. Kind regards, Michał Michalski, michal.michal...@boxever.com On 24 April 2015 at 17:12, Jeetendra Gangele gangele...@gmail.com wrote

Re: regarding ZipWithIndex

2015-04-24 Thread Jeetendra Gangele
Can anyone guide me on how to reduce the size from Long to Int, since I don't need a Long index? I have huge data and this index takes 8 bytes; if I can reduce it to 4 bytes it will be a great help. On 22 April 2015 at 22:46, Jeetendra Gangele gangele...@gmail.com wrote: Sure thanks. if you can guide

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
you used ZipWithUniqueID? On 24 April 2015 at 21:28, Michal Michalski michal.michal...@boxever.com wrote: I somehow missed zipWithIndex (and Sean's email), thanks for hint. I mean - I saw it before, but I just thought it's not doing what I want. I've re-read the description now and it looks

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
I have an RDD<Object> which I get from an HBase scan using newAPIHadoopRDD. I am running zipWithIndex here and it is preserving the order: the first object got 1, the second got 2, the third got 3, and so on; the nth object got n. On 24 April 2015 at 20:56, Ganelin, Ilya ilya.gane...@capitalone.com wrote: To maintain

Re: Tasks run only on one machine

2015-04-23 Thread Jeetendra Gangele
Will you be able to paste code here? On 23 April 2015 at 22:21, Pat Ferrel p...@occamsmachete.com wrote: Using Spark streaming to create a large volume of small nano-batch input files, ~4k per file, thousands of 'part-x' files. When reading the nano-batch files and doing a distributed

Re: Distinct is very slow

2015-04-23 Thread Jeetendra Gangele
Does anyone have any thoughts on this? On 22 April 2015 at 22:49, Jeetendra Gangele gangele...@gmail.com wrote: I made 7000 tasks in mapToPair and in distinct I also made the same number of tasks. Still lots of shuffle read and write is happening, due to which the application is running for a much longer time. Any idea

Streaming Kmeans usage in java

2015-04-23 Thread Jeetendra Gangele
Does anyone have a sample example of how to use streaming k-means clustering with Java? I have seen some example usage in Scala. Can anybody point me to a Java example? regards jeetendra

Re: Clustering algorithms in Spark

2015-04-22 Thread Jeetendra Gangele
Does anybody have any thoughts on this? On 21 April 2015 at 20:57, Jeetendra Gangele gangele...@gmail.com wrote: The problem with k-means is we have to define the number of clusters, which I don't want in this case, so I am thinking of something like hierarchical clustering. Any ideas and suggestions

Re: regarding ZipWithIndex

2015-04-22 Thread Jeetendra Gangele
, Jeetendra Gangele gangele...@gmail.com wrote: Can you please guide me how can I extend RDD and convert into this way you are suggesting. On 16 April 2015 at 23:46, Jeetendra Gangele gangele...@gmail.com wrote: I type T i already have Object ... I have RDDObject and then I am calling ZipWithIndex

Re: ElasticSearch for Spark times out

2015-04-22 Thread Jeetendra Gangele
will you be able to paste the code? On 23 April 2015 at 00:19, Adrian Mocanu amoc...@verticalscope.com wrote: Hi I use the ElasticSearch package for Spark and very often it times out reading data from ES into an RDD. How can I keep the connection alive (why doesn't it? Bug?) Here's

Re: ElasticSearch for Spark times out

2015-04-22 Thread Jeetendra Gangele
Basically, read timeout means that no data arrived within the specified receive timeout period. A few things I would suggest: 1. Is your ES cluster up and running? 2. If 1 is yes, then reduce the size of the index, make it a few KB, and then test. On 23 April 2015 at 00:19, Adrian Mocanu

Re: Clustering algorithms in Spark

2015-04-21 Thread Jeetendra Gangele
The problem with k-means is we have to define the number of clusters, which I don't want in this case, so I am thinking of something like hierarchical clustering. Any ideas and suggestions? On 21 April 2015 at 20:51, Jeetendra Gangele gangele...@gmail.com wrote: I have a requirement in which I want

Spark SQL vs map reduce tableInputOutput

2015-04-20 Thread Jeetendra Gangele
Hi All, I am querying HBase, combining the result, and using it in my Spark job. I am querying HBase using the HBase client API inside my Spark job. Can anybody suggest whether Spark SQL will be fast enough and provide range queries? Regards Jeetendra

Re: Shuffle files not cleaned up (Spark 1.2.1)

2015-04-20 Thread Jeetendra Gangele
Write a cron job for this, like below: 12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+ 32 * * * * find /tmp -type d -cmin +1440 -name spark-*-*-* -prune -exec rm -rf {} \+ 52 * * * * find $SPARK_LOCAL_DIR -mindepth 1 -maxdepth 1 -type d -cmin +1440 -name spark-*-*-*

Re: Spark SQL vs map reduce tableInputOutput

2015-04-20 Thread Jeetendra Gangele
range scan capability against hbase. Cheers On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com wrote: HI All, I am Querying Hbase and combining result and using in my spake job. I am querying hbase using Hbase client api inside my spark job. can anybody suggest

Re: Custom partioner

2015-04-17 Thread Jeetendra Gangele
is shuffling anyway. Unless your raw data is such that the same key is on same node, you'll have to shuffle atleast once to make same key on same node. On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have a RDD which has 1 million keys and each key is repeated

Re: Distinct is very slow

2015-04-17 Thread Jeetendra Gangele
I am saying to partition with something like partitionBy(new HashPartitioner(16)); will this not work? On 17 April 2015 at 21:28, Jeetendra Gangele gangele...@gmail.com wrote: I have given 3000 tasks to mapToPair; now it is taking so much memory and shuffling and wasting time there. Here are the stats

Need Costom RDD

2015-04-17 Thread Jeetendra Gangele
Hi All, I have an RDD<Object>, then I convert it to RDD<Object,Long> with zipWithIndex; here the index is a Long and it takes 8 bytes. Is there any way to make it an Integer? There is no API available with an Int index. How can I create a custom RDD so that it takes only 4 bytes for the index part? Also, why the API is

Custom partioner

2015-04-16 Thread Jeetendra Gangele
Hi All, I have an RDD which has 1 million keys and each key is repeated for around 7000 values, so in total there will be around 1M*7K records in the RDD. Each key is created from zipWithIndex, so keys start from 0 to M-1. The problem with zipWithIndex is that it takes a Long for the key, which is 8 bytes. Can I
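Since the keys described here are dense sequential ids (0 to M-1) rather than arbitrary objects, a simple modulo partitioner is enough to spread them evenly and keep the same key on the same node across stages. A minimal sketch, not tied to the original code:

    import org.apache.spark.Partitioner;

    // Assumes keys are the sequential Long ids produced by zipWithIndex.
    public class SequentialIdPartitioner extends Partitioner {
      private final int numParts;

      public SequentialIdPartitioner(int numParts) {
        this.numParts = numParts;
      }

      @Override
      public int numPartitions() {
        return numParts;
      }

      @Override
      public int getPartition(Object key) {
        long id = ((Number) key).longValue();
        return (int) (id % numParts);
      }

      @Override
      public boolean equals(Object other) {
        return other instanceof SequentialIdPartitioner
            && ((SequentialIdPartitioner) other).numParts == numParts;
      }

      @Override
      public int hashCode() {
        return numParts;
      }
    }

It would then be applied with something like pairRdd.partitionBy(new SequentialIdPartitioner(16)) before the expensive wide operations.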

Distinct is very slow

2015-04-16 Thread Jeetendra Gangele
Hi All, I have the below code where distinct is running for a long time. blockingRdd is a combination of (Long, String) and it will have 400K records. JavaPairRDD<Long,Integer> completeDataToprocess = blockingRdd.flatMapValues( new Function<String, Iterable<Integer>>(){ @Override public Iterable<Integer>

Re: Distinct is very slow

2015-04-16 Thread Jeetendra Gangele
Regards On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have below code whether distinct is running for more time. blockingRdd is the combination of Long,String and it will have 400K records JavaPairRDDLong,Integer completeDataToprocess

Re: Distinct is very slow

2015-04-16 Thread Jeetendra Gangele
paste your complete code? Did you try repartioning/increasing level of parallelism to speed up the processing. Since you have 16 cores, and I'm assuming your 400k records isn't bigger than a 10G dataset. Thanks Best Regards On Thu, Apr 16, 2015 at 10:00 PM, Jeetendra Gangele gangele...@gmail.com

Re: Distinct is very slow

2015-04-16 Thread Jeetendra Gangele
Akhil, any thoughts on this? On 16 April 2015 at 23:07, Jeetendra Gangele gangele...@gmail.com wrote: No, I did not try the partitioning; below is the full code: public static void matchAndMerge(JavaRDD<VendorRecord> matchRdd, JavaSparkContext jsc) throws IOException{ long start

Re: How to join RDD keyValuePairs efficiently

2015-04-16 Thread Jeetendra Gangele
Does this same functionality exist with Java? On 17 April 2015 at 02:23, Evo Eftimov evo.efti...@isecc.com wrote: You can use def partitionBy(partitioner: Partitioner): RDD[(K, V)] Return a copy of the RDD partitioned using the specified partitioner The

Re: regarding ZipWithIndex

2015-04-16 Thread Jeetendra Gangele
Can you please guide me on how I can extend RDD and convert it in the way you are suggesting? On 16 April 2015 at 23:46, Jeetendra Gangele gangele...@gmail.com wrote: In type T I already have Object ... I have RDD<Object> and then I am calling zipWithIndex on this RDD and getting RDD<Object,Long>

Re: regarding ZipWithIndex

2015-04-16 Thread Jeetendra Gangele
something like JavaPairRDD<Object, Long>. The Long component of the pair fits your description of index. What other requirement does zipWithIndex not provide you? Cheers On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have an RDD (JavaRDD<Object>) and I want

Re: regarding ZipWithIndex

2015-04-16 Thread Jeetendra Gangele
yuzhih...@gmail.com wrote: The Long in RDD[(T, Long)] is type parameter. You can create RDD with Integer as the first type parameter. Cheers On Thu, Apr 16, 2015 at 11:07 AM, Jeetendra Gangele gangele...@gmail.com wrote: Hi Ted. This works for me. But since Long takes here 8 bytes. Can I

Re: Distinct is very slow

2015-04-16 Thread Jeetendra Gangele
At the distinct level I will have 7000 times more elements in my RDD. So should I repartition? Because its parent will definitely have fewer partitions. How do I see the number of partitions through Java code? On 16 April 2015 at 23:07, Jeetendra Gangele gangele...@gmail.com wrote: No I did not tried

Execption while using kryo with broadcast

2015-04-15 Thread Jeetendra Gangele
Hi All, I am getting the below exception while using Kryo serialization with a broadcast variable. I am broadcasting a HashMap with the lines below: Map<Long, MatcherReleventData> matchData = RddForMarch.collectAsMap(); final Broadcast<Map<Long, MatcherReleventData>> dataMatchGlobal = jsc.broadcast(matchData);
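For readers following this thread, the usual way to enable Kryo for a job like this is to switch the serializer and register every class that ends up inside the broadcast value. A minimal sketch; MatcherReleventData is the domain class named in the post and is assumed to be on the classpath:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class KryoSetup {
      public static JavaSparkContext createContext() {
        SparkConf conf = new SparkConf()
            .setAppName("kryo-broadcast")
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            // Register the value class and the concrete map type held by the broadcast.
            .registerKryoClasses(new Class<?>[]{
                MatcherReleventData.class,
                java.util.HashMap.class
            });
        return new JavaSparkContext(conf);
      }
    }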

Re: Execption while using kryo with broadcast

2015-04-15 Thread Jeetendra Gangele
Yes, without Kryo it did work out; when I removed the Kryo registration it worked. On 15 April 2015 at 19:24, Jeetendra Gangele gangele...@gmail.com wrote: it's not working with the combination of Broadcast. Without Kryo it is also not working. On 15 April 2015 at 19:20, Akhil Das ak

Re: Execption while using kryo with broadcast

2015-04-15 Thread Jeetendra Gangele
, Jeetendra Gangele gangele...@gmail.com wrote: Yes Without Kryo it did work out.when I remove kryo registration it did worked out On 15 April 2015 at 19:24, Jeetendra Gangele gangele...@gmail.com wrote: its not working with the combination of Broadcast. Without Kyro also not working. On 15

Re: Execption while using kryo with broadcast

2015-04-15 Thread Jeetendra Gangele
It's not working with the combination of Broadcast. Without Kryo it is also not working. On 15 April 2015 at 19:20, Akhil Das ak...@sigmoidanalytics.com wrote: Is it working without kryo? Thanks Best Regards On Wed, Apr 15, 2015 at 6:38 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All

Re: Execption while using kryo with broadcast

2015-04-15 Thread Jeetendra Gangele
it work w/ java serialization in the end? Or is this kryo only? * which Spark version you are using? (one of the relevant bugs was fixed in 1.2.1 and 1.3.0) On Wed, Apr 15, 2015 at 9:06 AM, Jeetendra Gangele gangele...@gmail.com wrote: This looks like known issue? check this out http

exception during foreach run

2015-04-15 Thread Jeetendra Gangele
Hi All, I am getting the below exception while running foreach after zipWithIndex, flatMapValues, flatMapValues. Inside the foreach I am doing a lookup in a broadcast variable. java.util.concurrent.RejectedExecutionException: Worker has already been shutdown at

Re: regarding ZipWithIndex

2015-04-13 Thread Jeetendra Gangele
: bq. will return something like JavaPairRDDObject, long The long component of the pair fits your description of index. What other requirement does ZipWithIndex not provide you ? Cheers On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have an RDD

Help in transforming the RDD

2015-04-13 Thread Jeetendra Gangele
Hi All, I have a JavaPairRDD<Long,String> where each Long key has 4 String values associated with it. I want to fire an HBase query to look up each String part of the RDD. This look-up will give a result of around 7K integers, so for each key I will have 7K values. Now my input RDD always

Re: function to convert to pair

2015-04-12 Thread Jeetendra Gangele
take a look at zipWithIndex() of RDD. Cheers On Wed, Apr 8, 2015 at 3:40 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have an RDD<SomeObject> and I want to convert it to an RDD of (sequenceNumber, SomeObject); this sequence number can be 1 for the first SomeObject, 2 for the second SomeObject. Regards

regarding ZipWithIndex

2015-04-12 Thread Jeetendra Gangele
Hi All, I have an RDD (JavaRDD<Object>) and I want to convert it to JavaPairRDD<Index,Object>. The index should be unique and it should maintain the order: for the first object it should have 1 and then for the second 2, and so on. I tried using zipWithIndex but it will return something like JavaPairRDD<Object, Long>
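For context on what zipWithIndex returns and how the Long can be narrowed afterwards (which also answers the later "Long to Int" question in this archive), here is a minimal self-contained Java sketch; note that zipWithIndex numbers elements from 0, not 1:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class ZipWithIndexExample {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("zip-with-index"));
        JavaRDD<String> records = jsc.parallelize(Arrays.asList("a", "b", "c"));

        // Keeps the RDD order and pairs each element with a Long index starting at 0.
        JavaPairRDD<String, Long> indexed = records.zipWithIndex();

        // If a 4-byte index is enough (fewer than ~2 billion rows), narrow it afterwards.
        JavaPairRDD<String, Integer> intIndexed = indexed.mapValues(
            new Function<Long, Integer>() {
              @Override
              public Integer call(Long idx) {
                return idx.intValue();
              }
            });

        System.out.println(intIndexed.collect());
        jsc.stop();
      }
    }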

Taks going into NODE_LOCAL at beginning of job

2015-04-11 Thread Jeetendra Gangele
I have 3 transformations and then I am running foreach; the job is going into NODE_LOCAL level, no executor is running anything, and it is waiting for a long time with no task running. Regards Jeetendra

foreach going in infinite loop

2015-04-10 Thread Jeetendra Gangele
Hi All, I am running the below code; before calling foreach I did 3 transformations using mapToPair. In my application there are 16 executors but no executor is running anything. rddWithscore.foreach(new VoidFunction<Tuple2<VendorRecord,Map<Integer,Double>>>() { @Override public void call(Tuple2<VendorRecord,

Need subscription process

2015-04-08 Thread Jeetendra Gangele
Hi All, how can I subscribe myself to this group so that every mail sent to this group comes to me as well? I already sent a request to user-subscr...@spark.apache.org, but still I am not getting the mail sent to this group by other people. Regards Jeetendra

Regarding GroupBy

2015-04-08 Thread Jeetendra Gangele
I wanted to run groupBy(partition) but this is not working. Here the first part in pairvendorData will be repeated across multiple second parts. Both are objects; do I need to override equals and hashCode? Is groupBy fast enough? JavaPairRDD<VendorRecord, VendorRecord> pairvendorData

function to convert to pair

2015-04-08 Thread Jeetendra Gangele
Hi All, I have an RDD<SomeObject> and I want to convert it to an RDD of (sequenceNumber, SomeObject); this sequence number can be 1 for the first SomeObject, 2 for the second SomeObject. Regards jeet

Re: task not serialize

2015-04-07 Thread Jeetendra Gangele
Let's say I follow the below approach and I get a pair RDD with a huge size .. which cannot fit into one machine ... then how do I run foreach on this RDD? On 7 April 2015 at 04:25, Jeetendra Gangele gangele...@gmail.com wrote: On 7 April 2015 at 04:03, Dean Wampler deanwamp...@gmail.com wrote: On Mon

Re: task not serialize

2015-04-07 Thread Jeetendra Gangele
://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Tue, Apr 7, 2015 at 3:50 AM, Jeetendra Gangele gangele...@gmail.com wrote: Lets say I follow below approach and I got RddPair with huge size .. which can not fit into one machine ... what to run foreach

FlatMapPair run for longer time

2015-04-07 Thread Jeetendra Gangele
Hi All, I am running the below code and it is running for a very long time, where the input to flatMapToPair is 50K records, and I am calling HBase 50K times with just a range scan query, which should not take time. Can anybody guide me on what is wrong here? JavaPairRDD<VendorRecord, Iterable<VendorRecord>>

task not serialize

2015-04-06 Thread Jeetendra Gangele
In this code, in the foreach, I am getting a task not serializable exception. @SuppressWarnings("serial") public static void matchAndMerge(JavaRDD<VendorRecord> matchRdd, final JavaSparkContext jsc) throws IOException{ log.info("Company matcher started"); //final JavaSparkContext jsc = getSparkContext();
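The usual cause of this exception is that the anonymous function passed to foreach captures the enclosing, non-serializable object (a context, a logger, an HBase connection, and so on). One common fix is to move the logic into a small named Serializable function that only holds what the executors actually need. A sketch under that assumption; VendorRecord is the class named in the post, and the lookup map and getter are hypothetical:

    import java.io.Serializable;
    import java.util.Map;
    import org.apache.spark.api.java.function.VoidFunction;
    import org.apache.spark.broadcast.Broadcast;

    public class MatchFunction implements VoidFunction<VendorRecord>, Serializable {
      // Broadcast handles are safe to capture; heavyweight clients should be created per task.
      private final Broadcast<Map<Long, String>> lookup;

      public MatchFunction(Broadcast<Map<Long, String>> lookup) {
        this.lookup = lookup;
      }

      @Override
      public void call(VendorRecord record) {
        String match = lookup.value().get(record.getId());  // hypothetical getter
        // ... matching/merging work goes here ...
      }
    }

    // usage: matchRdd.foreach(new MatchFunction(dataMatchGlobal));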

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-04-06 Thread Jeetendra Gangele
way to convert the thing into bytes. On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com wrote: When I am trying to get the result from HBase and running the mapToPair function of the RDD, it is giving the error java.io.NotSerializableException

Re: task not serialize

2015-04-06 Thread Jeetendra Gangele
(O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Mon, Apr 6, 2015 at 4:30 PM, Jeetendra Gangele gangele...@gmail.com wrote: In this code in foreach I am getting task not serialized exception @SuppressWarnings(serial) public

Re: task not serialize

2015-04-06 Thread Jeetendra Gangele
On 7 April 2015 at 04:03, Dean Wampler deanwamp...@gmail.com wrote: On Mon, Apr 6, 2015 at 6:20 PM, Jeetendra Gangele gangele...@gmail.com wrote: Thanks a lot. That means Spark does not support nested RDDs? If I pass the JavaSparkContext that also won't work, I mean passing the SparkContext

Re: newAPIHadoopRDD Mutiple scan result return from Hbase

2015-04-05 Thread Jeetendra Gangele
[] stopRow) { Cheers On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele gangele...@gmail.com wrote: I have a 2GB HBase table where the data is stored in the form of key and value (only one column per key) and the keys are also unique. What I am thinking is to load the complete HBase table into an RDD and then do

Re: newAPIHadoopRDD Mutiple scan result return from Hbase

2015-04-05 Thread Jeetendra Gangele
and firing the query using the native client)? Thanks On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele gangele...@gmail.com wrote: That's true, I checked the MultiRowRangeFilter and it serves my need. Do I need to apply the patch for this, since I am using HBase version 0.96? Also I have checked when

Diff between foreach and foreachsync

2015-04-05 Thread Jeetendra Gangele
Hi, can somebody explain to me the difference between the foreach and foreachAsync RDD actions? Which one will give the best result / maximum throughput? Does foreach run in a parallel way?
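For what it's worth, both actions execute their tasks in parallel across the cluster; the difference is only on the driver side: foreach blocks until the job finishes, while foreachAsync returns a future immediately. A minimal Java sketch:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaFutureAction;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;

    public class ForeachVsAsync {
      public static void main(String[] args) throws Exception {
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("foreach-async"));
        JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        VoidFunction<Integer> sideEffect = new VoidFunction<Integer>() {
          @Override
          public void call(Integer x) {
            System.out.println(x);   // runs on the executors
          }
        };

        rdd.foreach(sideEffect);     // blocks the driver until all tasks finish

        JavaFutureAction<Void> done = rdd.foreachAsync(sideEffect);
        // the driver is free to do other work (or submit other jobs) here
        done.get();                  // wait only when completion actually matters

        jsc.stop();
      }
    }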

Re: conversion from java collection type to scala JavaRDDObject

2015-04-04 Thread Jeetendra Gangele
://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Thu, Apr 2, 2015 at 11:33 AM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All, is there a way to make a JavaRDD<Object> from an existing

Re: newAPIHadoopRDD Mutiple scan result return from Hbase

2015-04-04 Thread Jeetendra Gangele
(false); scan.setCaching(10); scan.setBatch(1000); scan.setSmall(false); conf.set(TableInputFormat.SCAN, DatabaseUtils.convertScanToString(scan)); return conf; On 4 April 2015 at 20:54, Jeetendra Gangele gangele...@gmail.com wrote: Hi All, Can we get the result of the multiple scan from

newAPIHadoopRDD Mutiple scan result return from Hbase

2015-04-04 Thread Jeetendra Gangele
Hi All, can we get the result of multiple scans from JavaSparkContext.newAPIHadoopRDD from HBase? This method's first parameter takes a configuration object where I have added the filter, but how can I query multiple scans from the same table calling this API only once? regards jeetendra

Regarding MLLIB sparse and dense matrix

2015-04-03 Thread Jeetendra Gangele
Hi All, I am building a logistic regression for matching person data. Let's say two person objects are given with their attributes; we need to find the score. That means on one side you have 10 million records and on the other side we have 1 record, and we need to tell which one matches with the highest score among the 1
