Yes, in yarn-cluster mode.
On 2 October 2015 at 22:10, Ashish Rangole <arang...@gmail.com> wrote:
> Are you running the job in yarn cluster mode?
> On Oct 1, 2015 6:30 AM, "Jeetendra Gangele" <gangele...@gmail.com> wrote:
>
We have a streaming application running on YARN and we would like to ensure
that it is up and running 24/7.
Is there a way to tell YARN to automatically restart a specific application
on failure?
There is a property, yarn.resourcemanager.am.max-attempts, which defaults
to 2; setting it to a bigger value allows more restart attempts.
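(For illustration, a hedged sketch: yarn.resourcemanager.am.max-attempts is the cluster-wide cap set in yarn-site.xml, and recent Spark-on-YARN versions also honor a per-application spark.yarn.maxAppAttempts; the values below are only examples.)

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>

bin/spark-submit --master yarn-cluster --conf spark.yarn.maxAppAttempts=4 ...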
>>> ...would like to
>>> make it reliable.
>>>
>>> Basically either MQTT supports persistence (which I don't know) or there
>>> is Kafka for this use case.
>>>
>>> Another option, I think, would be to place observable streams between
>>> MQTT an
Hi All,
I have a Spark streaming application with a batch interval of 10 ms which is
reading the MQTT channel and dumping the data from MQTT to HDFS.
So suppose I have to deploy a new application jar (with changes in the Spark
streaming application): what is the best way to deploy? Currently I am doing
as below
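(One common pattern, sketched here as an assumption rather than anything stated in the thread: stop the running streaming context gracefully so in-flight batches finish, then submit the new jar. In Java, with jssc the running JavaStreamingContext:)

// finish processing queued/running batches before shutting down,
// then it is safe to start the new jar
jssc.stop(true, true); // stopSparkContext = true, stopGracefully = true

Newer releases can also trigger this on shutdown via
spark.streaming.stopGracefullyOnShutdown=true.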
Hi,
I am getting the below error when running a Spark job on YARN with an HDP
cluster.
I have installed Spark and YARN from Ambari and I am using Spark 1.3.1 with
HDP version HDP-2.3.0.0-2557.
My spark-defaults.conf has the correct entry:
spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
It finally worked out: I solved it by modifying mapred-site.xml, removing
the entry for the application YARN master (removed the HDP
version bits from that property).
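(For reference, a hedged sketch of the usual HDP fix: pass -Dhdp.version to both the driver and the YARN application master in spark-defaults.conf; the property names are standard Spark settings and the version value is the one from this thread.)

spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.0.0-2557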
On 9 September 2015 at 17:44, Jeetendra Gangele <gangele...@gmail.com>
wrote:
> Hi ,
> I am getting below error
> ... countless hours due to typos in the file, for example.
>
> On Mon, Sep 7, 2015 at 11:47 AM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> I also tried placing my customized log4j.properties file under
>> src/main/resources, still no luck.
>>
Hi All, I have been trying to send my application-related logs to a socket so
that we can feed Logstash and check the application logs.
Here is my log4j.properties file:
main.logger=RFA,SA
log4j.appender.SA=org.apache.log4j.net.SocketAppender
log4j.appender.SA.Port=4560
I also tried placing my customized log4j.properties file under
src/main/resources, still no luck.
Won't the above step modify the default YARN and Spark log4j.properties?
Anyhow, it is still taking log4j.properties from YARN.
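(A fuller sketch of the socket-appender setup, offered as an assumption rather than what the thread finally used; host and port are illustrative. log4j 1.x SocketAppender needs a RemoteHost as well as a Port:)

log4j.rootLogger=INFO, SA
log4j.appender.SA=org.apache.log4j.net.SocketAppender
log4j.appender.SA.RemoteHost=localhost
log4j.appender.SA.Port=4560
log4j.appender.SA.ReconnectionDelay=10000

To make this file win over YARN's copy, ship it with the job, e.g.:

bin/spark-submit --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  ...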
On 7 September 2015 at 19:25, Jeetendra Gangele <gangele...@gmail.
anybody here to help?
On 7 September 2015 at 17:53, Jeetendra Gangele <gangele...@gmail.com>
wrote:
> Hi All, I have been trying to send my application-related logs to a socket so
> that we can feed Logstash and check the application logs.
>
> Here is my log4j.properties file:
>
Please set the database by using the "USE <database>" command before
executing the query.
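For example, in the spark shell (a minimal sketch; sqlContext, mydb, and mytable are illustrative names, assuming a HiveContext):

sqlContext.sql("USE mydb")
sqlContext.sql("SELECT * FROM mytable LIMIT 10").show()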
Regards,
Ishwardeep
--
*From:* Jeetendra Gangele gangele...@gmail.com
*Sent:* Monday, August 24, 2015 5:47 PM
*To:* user
*Subject:* Loading already existing tables in spark shell
Hi
run on your local
machine rather than in a container of the YARN cluster.
2015-08-25 16:19 GMT+08:00 Jeetendra Gangele gangele...@gmail.com:
Hi All, I am trying to launch the spark-sql shell with --master yarn-cluster and
it is giving the below error.
Why is this not supported?
bin/spark-sql --master yarn-cluster
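(As context, my understanding of Spark 1.x rather than anything stated in the thread: interactive shells such as spark-shell and spark-sql need the driver on the local machine, while yarn-cluster mode puts the driver in a YARN container, so only yarn-client mode works for them:)

bin/spark-sql --master yarn-client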
Hi All,
I have data in HDFS partitioned by Year/month/date/event_type, and I am
creating Hive tables with this data. The data is in JSON, so I am using a
JSON SerDe and creating the Hive tables.
Below is the code:
val jsonFile =
? Or some JDBC
proxy?
On Tue, Jul 28, 2015 at 19:34, Jeetendra Gangele gangele...@gmail.com
wrote:
Can the source write to Kafka/Flume/HBase in addition to Postgres? No,
it can't write; this is due to the fact that there are many applications
producing this PostgreSQL data. I can't
it. If you hit a similar problem, you
could increase the configuration "yarn.nodemanager.vmem-pmem-ratio".
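For example, in yarn-site.xml (the property is a standard YARN setting whose default is 2.1; the value below is illustrative):

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>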
Thanks
Jerry
*From:* Jeff Zhang [mailto:zjf...@gmail.com]
*Sent:* Thursday, July 30, 2015 4:36 PM
*To:* Jeetendra Gangele
*Cc:* user
*Subject:* Re: Spark on YARN
15/07/30 12:13:35
I can't see the application logs here. All the logs are going into stderr.
can anybody help here?
On 30 July 2015 at 12:21, Jeetendra Gangele gangele...@gmail.com wrote:
I am running the below command; this is the default Spark Pi program, but it is
not running: all the logs are going to stderr.
I am running the below command; this is the default Spark Pi program, but it is not
running: all the logs are going to stderr, yet at the terminal the job is
succeeding. I guess there is a configuration issue and the job is not launching at all.
/bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster
You can call the DB connect once per partition. Please have a look at the
design patterns for the foreach construct in the documentation.
How big is your data in the DB? How often does that data change? You would be
better off if the data is in Spark already.
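(For reference, the once-per-partition connection pattern looks roughly like this; a minimal Java sketch under assumptions, with the Event type, JDBC URL, and the enrich step all illustrative:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Iterator;
import org.apache.spark.api.java.function.VoidFunction;

eventsRdd.foreachPartition(new VoidFunction<Iterator<Event>>() {
  @Override
  public void call(Iterator<Event> events) throws Exception {
    // one connection per partition, not one per record
    Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://host:5432/db", "user", "pass");
    try {
      while (events.hasNext()) {
        Event e = events.next();
        // query PostgreSQL for this event and join/enrich here
      }
    } finally {
      conn.close();
    }
  }
});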
On 28 Jul 2015 04:48, Jeetendra Gangele gangele...@gmail.com wrote
...@gmail.com wrote:
Why can't you bulk pre-fetch the data to HDFS (like using Sqoop) instead
of hitting Postgres multiple times?
Sent from Windows Mail
*From:* ayan guha guha.a...@gmail.com
*Sent:* Monday, July 27, 2015 4:41 PM
*To:* Jeetendra Gangele gangele...@gmail.com
*Cc
a trigger in Postgres to send data to the big data
cluster as soon as changes are made. Or as I was saying in another email,
can the source write to Kafka/Flume/HBase in addition to Postgres?
Sent from Windows Mail
*From:* Jeetendra Gangele gangele...@gmail.com
*Sent:* Tuesday, July 28
Hi All
I have a use case where I am consuming events from RabbitMQ using
Spark streaming. Each event has some fields on which I want to query
PostgreSQL, bring the data over, and then do a join between the event data and
the PostgreSQL data and put the aggregated data into HDFS, so that I run
...@hotmail.com wrote:
You can have Spark read from PostgreSQL through the data access API. Do
you have any concern with that approach, since you mention copying that data
into HBase?
From: Jeetendra Gangele
Sent: Monday, July 27, 6:00 AM
Subject: Data from PostgreSQL to Spark
To: user
While running the below, I am getting the error in the YARN log. Has anybody hit this issue?
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster lib/spark-examples-1.4.1-hadoop2.6.0.jar 10
2015-07-24 12:06:10,846 ERROR [RMCommunicator Allocator]
to deploy a Spark standalone cluster to run some integration
tests; you can also consider running Spark on YARN for
the later development use cases.
Best,
Sun.
--
fightf...@163.com
*From:* Jeetendra Gangele gangele...@gmail.com
*Date:* 2015-07-23 13:39
Hi All,
I have data in MongoDB (a few TBs) which I want to migrate to HDFS to do
complex query analysis on, e.g. AND queries involving multiple fields.
So my question is: in which format should I store the data in HDFS so
that processing will be fast for such queries?
Can anybody help here?
On 22 July 2015 at 10:38, Jeetendra Gangele gangele...@gmail.com wrote:
Hi All,
I am trying to capture the user activities for a real estate portal.
I am using a RabbitMQ and Spark streaming combination, where all the events
are pushed to RabbitMQ and then a 1-second micro
jornfra...@gmail.com wrote:
Can you provide an example of an AND query? If you do just look-ups you
should try HBase/Phoenix; otherwise you can try ORC with storage index
and/or compression, but this depends on what your queries look like.
On Wed, Jul 22, 2015 at 14:48, Jeetendra Gangele gangele
Does Apache Spark support RabbitMQ? I have messages on RabbitMQ and I want
to process them using Apache Spark streaming. Does it scale?
Regards
Jeetendra
The source is here:
https://github.com/Stratio/RabbitMQ-Receiver
Not sure that meets your needs or not.
-Todd
On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Does Apache Spark support RabbitMQ? I have messages on RabbitMQ and I
want
mapToPair; at that time you can break the key.
On 8 June 2015 at 23:27, Bill Q bill.q@gmail.com wrote:
Hi,
I have a rdd with the following structure:
row1: key: Seq[a, b]; value: value 1
row2: key: seq[a, c, f]; value: value 2
Is there an efficient way to de-flatten the rows into:
row1: key:
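(A minimal Java sketch of one way to do this, offered as an assumption; the keys here are modeled as List<String> and rows as a JavaPairRDD<List<String>, String>. flatMapToPair emits one (element, value) pair per key element:)

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFlatMapFunction;
import scala.Tuple2;

JavaPairRDD<String, String> deflated = rows.flatMapToPair(
    new PairFlatMapFunction<Tuple2<List<String>, String>, String, String>() {
      @Override
      public Iterable<Tuple2<String, String>> call(Tuple2<List<String>, String> row) {
        List<Tuple2<String, String>> out = new ArrayList<Tuple2<String, String>>();
        for (String k : row._1()) {
          // (a, value 1), (b, value 1), ...
          out.add(new Tuple2<String, String>(k, row._2()));
        }
        return out;
      }
    });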
Parquet files: when are you loading these files?
Can you please share the code where you are passing the Parquet file to Spark?
On 8 June 2015 at 16:39, Cheng Lian lian.cs@gmail.com wrote:
Are you appending the joined DataFrame whose PolicyType is string to an
existing Parquet file whose
Your HDFS path passed to the Spark job is incorrect.
On 8 June 2015 at 16:24, Nirmal Fernando nir...@wso2.com wrote:
HDFS path should be something like; hdfs://
127.0.0.1:8020/user/cloudera/inputs/
On Mon, Jun 8, 2015 at 4:15 PM, Pa Rö paul.roewer1...@googlemail.com
wrote:
Hello,
I submit my Spark
Hi All
I am not getting any mail from this community.
Is it working now?
On 1 May 2015 at 13:43, James King jakwebin...@gmail.com wrote:
Oops! well spotted. Many thanks Shixiong.
On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote:
spark.history.fs.logDirectory is for the history server. For Spark
applications, they should
How are you passing the feature vector to k-means?
Is it in 2-D space or a 1-D array?
Did you try using streaming k-means?
Will you be able to paste code here?
On 29 April 2015 at 17:23, Sam Stoelinga sammiest...@gmail.com wrote:
Hi Sparkers,
I am trying to run MLlib k-means on a large dataset (50+ GB
Has anyone tried using Solr inside Spark?
below is the project describing it.
https://github.com/LucidWorks/spark-solr.
I have a requirement in which I want to index 20 million company names
and then search as and when new data comes in. The output should be a list of
companies matching the
-search.html
On Tue, Apr 28, 2015 at 6:27 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Has anyone tried using Solr inside Spark?
below is the project describing it.
https://github.com/LucidWorks/spark-solr.
I have a requirement in which I want to index 20 million company names
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds\\
On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote:
Hi Ayan, can you try the below line:
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds
On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote:
Hi
I am facing
Hi Ayan, can you try the below line:
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds
On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote:
Hi
I am facing this weird issue.
I am on Windows, and I am trying to load all files within a folder. Here
is my code -
loc =
extra trailing slash at the end; sometimes I have seen this kind of issue.
On 25 April 2015 at 20:50, Jeetendra Gangele gangele...@gmail.com wrote:
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds\\
On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote:
Hi Ayan can you try
Also, if this code is in Scala, why no val for newsY? Is it defined above?
loc = D:\\Project\\Spark\\code\\news\\jsonfeeds
newsY = sc.textFile(loc)
print newsY.count()
On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote:
Hi
I am facing this weird issue.
I am on Windows, and I
zipWithIndex will preserve whatever order is in your val lines.
I am not sure whether val lines = sc.textFile("hdfs://mytextFile")
maintains the order, but the next step will maintain it for sure.
On 24 April 2015 at 18:35, Spico Florin spicoflo...@gmail.com wrote:
Hello!
I know that
Thanks, that's why I was worried and tested my application again :).
On 24 April 2015 at 23:22, Michal Michalski michal.michal...@boxever.com
wrote:
Yes.
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 17:12, Jeetendra Gangele gangele...@gmail.com wrote
Can anyone guide me on how to reduce the size from Long to Int, since I
don't need a Long index?
I have huge data and this index takes 8 bytes; if I can reduce it to 4
bytes it will be a great help.
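(One way, sketched as an assumption in Java: keep zipWithIndex but immediately narrow the Long to an int key with mapToPair; this is safe while the count stays under 2^31. Here rdd stands for the original JavaRDD<Object>:)

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaPairRDD<Object, Long> withIdx = rdd.zipWithIndex();
JavaPairRDD<Integer, Object> intIdx = withIdx.mapToPair(
    new PairFunction<Tuple2<Object, Long>, Integer, Object>() {
      @Override
      public Tuple2<Integer, Object> call(Tuple2<Object, Long> t) {
        // narrow the 8-byte Long index to a 4-byte int key
        return new Tuple2<Integer, Object>((int) t._2().longValue(), t._1());
      }
    });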
On 22 April 2015 at 22:46, Jeetendra Gangele gangele...@gmail.com wrote:
Sure thanks. if you can guide
Have you used zipWithUniqueId?
On 24 April 2015 at 21:28, Michal Michalski michal.michal...@boxever.com
wrote:
I somehow missed zipWithIndex (and Sean's email), thanks for hint. I mean
- I saw it before, but I just thought it's not doing what I want. I've
re-read the description now and it looks
I have an RDD<Object> which I get from an HBase scan using newAPIHadoopRDD. I
am running zipWithIndex here and it is preserving the order: the first object got
1, the second got 2, the third got 3, and so on; the nth object got n.
On 24 April 2015 at 20:56, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
To maintain
Will you be able to paste code here?
On 23 April 2015 at 22:21, Pat Ferrel p...@occamsmachete.com wrote:
Using Spark streaming to create a large volume of small nano-batch input
files, ~4k per file, thousands of 'part-x' files. When reading the
nano-batch files and doing a distributed
Anyone any thought on this?
On 22 April 2015 at 22:49, Jeetendra Gangele gangele...@gmail.com wrote:
I made 7000 tasks in mapToPair and in distinct I also made the same number of
tasks.
Still, lots of shuffle read and write is happening and the application
runs for a much longer time.
Any idea
Does anyone have a sample example of how to use streaming k-means
clustering with Java? I have seen some example usage in Scala. Can anybody
point me to a Java example?
regards
jeetendra
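(Not a worked example from the thread, just a minimal Java sketch under assumptions: trainingStream is a JavaDStream<Vector> you have already built from your source, and k, dimensions, and seed are illustrative:)

import org.apache.spark.mllib.clustering.StreamingKMeans;
import org.apache.spark.mllib.linalg.Vector;

StreamingKMeans model = new StreamingKMeans()
    .setK(3)                        // number of clusters
    .setDecayFactor(1.0)            // how quickly old data is forgotten
    .setRandomCenters(2, 0.0, 11L); // dim = 2, weight = 0.0, seed = 11

// train continuously on the incoming stream of vectors
model.trainOn(trainingStream.dstream());

// score a single point against the latest model, e.g.:
// int cluster = model.latestModel().predict(someVector);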
does anybody have any thought on this?
On 21 April 2015 at 20:57, Jeetendra Gangele gangele...@gmail.com wrote:
The problem with k-means is that we have to define the number of clusters, which
I don't want in this case.
So I am thinking of something like hierarchical clustering; any ideas and
suggestions?
, Jeetendra Gangele gangele...@gmail.com
wrote:
Can you please guide me on how I can extend RDD and convert it in the way
you are suggesting?
On 16 April 2015 at 23:46, Jeetendra Gangele gangele...@gmail.com
wrote:
In type T I already have Object ... I have an RDD<Object> and then I am
calling zipWithIndex
will you be able to paste the code?
On 23 April 2015 at 00:19, Adrian Mocanu amoc...@verticalscope.com wrote:
Hi
I use the ElasticSearch package for Spark and very often it times out
reading data from ES into an RDD.
How can I keep the connection alive (why doesn't it? Bug?)
Here's
Basically, read timeout means that no data arrived within the specified
receive timeout period.
A few things I would suggest:
1. Is your ES cluster up and running?
2. If 1 is yes, then reduce the size of the index, make it a few KB, and then
test.
On 23 April 2015 at 00:19, Adrian Mocanu
The problem with k-means is that we have to define the number of clusters, which
I don't want in this case.
So I am thinking of something like hierarchical clustering; any ideas and
suggestions?
On 21 April 2015 at 20:51, Jeetendra Gangele gangele...@gmail.com wrote:
I have a requirement in which I want
Hi All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase client API inside my Spark job.
Can anybody suggest: will Spark SQL be fast enough and provide range
queries?
Regards
Jeetendra
Write a cron job for this, like below:
12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+
32 * * * * find /tmp -type d -cmin +1440 -name spark-*-*-* -prune -exec
rm -rf {} \+
52 * * * * find $SPARK_LOCAL_DIR -mindepth 1 -maxdepth 1 -type d -cmin
+1440 -name spark-*-*-* -prune -exec rm -rf {} \+
range scan capability
against HBase.
Cheers
On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase client API inside my Spark job.
Can anybody suggest
is shuffling anyway. Unless your raw data is such that the same key
is on the same node, you'll have to shuffle at least once to get the same key
onto the same node.
On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All
I have a RDD which has 1 million keys and each key is repeated
I am proposing to partition with something like partitionBy(new HashPartitioner(16));
will this not work?
On 17 April 2015 at 21:28, Jeetendra Gangele gangele...@gmail.com wrote:
I have given 3000 tasks to mapToPair; now it is taking so much memory and
shuffling and wasting time there. Here are the stats:
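(Yes, that is the standard call; a minimal Java sketch, with pairs standing for a JavaPairRDD<Long, String>:)

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;

JavaPairRDD<Long, String> partitioned = pairs.partitionBy(new HashPartitioner(16));
partitioned.cache(); // partition once up front, then reuse across later stages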
Hi All,
I have an RDD<Object>, then I convert it to RDD<(Object, Long)> with
zipWithIndex.
Here the index is a Long and it takes 8 bytes. Is there any way to make it an
Integer?
There is no API available with an int index.
How can I create a custom RDD so that it takes only 4 bytes for the index part?
Also, why is the API
Hi All,
I have an RDD which has 1 million keys, and each key is repeated for around
7000 values, so in total there will be around 1M*7K records in the RDD,
and each key is created from zipWithIndex, so keys run from 0 to M-1.
The problem with zipWithIndex is that it takes a Long for the key, which is 8
bytes. Can I
Hi All, I have the below code where distinct is running for a long time.
blockingRdd is a combination of (Long, String) and it will have 400K
records.
JavaPairRDD<Long, Integer> completeDataToprocess = blockingRdd.flatMapValues(
new Function<String, Iterable<Integer>>() {
@Override
public Iterable<Integer>
Regards
On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All, I have the below code where distinct is running for a long time.
blockingRdd is a combination of (Long, String) and it will have 400K
records.
JavaPairRDD<Long, Integer>
completeDataToprocess
Can you paste your complete code? Did you try repartitioning/increasing the
level of parallelism to speed up the processing? Since you have 16 cores,
I'm assuming your 400K records isn't bigger than a 10G dataset.
Thanks
Best Regards
On Thu, Apr 16, 2015 at 10:00 PM, Jeetendra Gangele gangele...@gmail.com
Akhil, any thought on this?
On 16 April 2015 at 23:07, Jeetendra Gangele gangele...@gmail.com wrote:
No, I did not try the partitioning. Below is the full code:
public static void matchAndMerge(JavaRDD<VendorRecord>
matchRdd, JavaSparkContext jsc) throws IOException {
long start
Does this same functionality exist with Java?
On 17 April 2015 at 02:23, Evo Eftimov evo.efti...@isecc.com wrote:
You can use
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
Return a copy of the RDD partitioned using the specified partitioner
The
Can you please guide me on how I can extend RDD and convert it in the way you
are suggesting?
On 16 April 2015 at 23:46, Jeetendra Gangele gangele...@gmail.com wrote:
In type T I already have Object ... I have an RDD<Object> and then I am
calling zipWithIndex on this RDD and getting RDD<(Object, Long)>,
something like JavaPairRDD<Object, Long>.
The Long component of the pair fits your description of an index. What other
requirement does zipWithIndex not provide you?
Cheers
On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All I have an RDD JavaRDDObject and I want
yuzhih...@gmail.com wrote:
The Long in RDD[(T, Long)] is a type parameter. You can create an RDD with
Integer as the first type parameter.
Cheers
On Thu, Apr 16, 2015 at 11:07 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi Ted,
This works for me, but since Long here takes 8 bytes, can I
At the distinct level I will have 7000 times more elements in my RDD, so should
I repartition? Its parent will definitely have fewer partitions. How can I
see the number of partitions through Java code?
On 16 April 2015 at 23:07, Jeetendra Gangele gangele...@gmail.com wrote:
No, I did not try
Hi All, I am getting the below exception while using Kryo serialization with a
broadcast variable. I am broadcasting a HashMap with the lines below:
Map<Long, MatcherReleventData> matchData = RddForMarch.collectAsMap();
final Broadcast<Map<Long, MatcherReleventData>> dataMatchGlobal =
jsc.broadcast(matchData);
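(If the problem is missing registration, one standard setup, sketched as an assumption with the class names taken from this thread, is to register the custom classes on the SparkConf:)

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(new Class<?>[] {
        MatcherReleventData.class, VendorRecord.class });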
Yes, without Kryo it did work out; when I removed the Kryo registration it
worked.
On 15 April 2015 at 19:24, Jeetendra Gangele gangele...@gmail.com wrote:
It's not working with the combination of Broadcast.
Without Kryo it's also not working.
On 15 April 2015 at 19:20, Akhil Das ak
, Jeetendra Gangele gangele...@gmail.com
wrote:
Yes, without Kryo it did work out; when I removed the Kryo registration it
worked.
On 15 April 2015 at 19:24, Jeetendra Gangele gangele...@gmail.com
wrote:
It's not working with the combination of Broadcast.
Without Kryo it's also not working.
On 15
It's not working with the combination of Broadcast.
Without Kryo it's also not working.
On 15 April 2015 at 19:20, Akhil Das ak...@sigmoidanalytics.com wrote:
Is it working without kryo?
Thanks
Best Regards
On Wed, Apr 15, 2015 at 6:38 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All
Did it work with Java serialization in the end? Or is this Kryo-only?
* Which Spark version are you using? (one of the relevant bugs was fixed
in 1.2.1 and 1.3.0)
On Wed, Apr 15, 2015 at 9:06 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
This looks like a known issue; check this out:
http
Hi All,
I am getting the below exception while running foreach after zipWithIndex,
flatMapValues, flatMapValues.
Inside the foreach I am doing a lookup in a broadcast variable.
java.util.concurrent.RejectedExecutionException: Worker has already been
shutdown
at
> will return something like JavaPairRDD<Object, Long>
The Long component of the pair fits your description of an index. What other
requirement does zipWithIndex not provide you?
Cheers
On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All I have an RDD
Hi All, I have a JavaPairRDD<Long, String> where each Long key has 4
String values associated with it. I want to fire an HBase query to look
up each String part of the RDD.
This look-up will give a result of around 7K integers, so for each key I will
have 7K values. Now my input RDD always
take a look at zipWithIndex() of RDD.
Cheers
On Wed, Apr 8, 2015 at 3:40 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All, I have an RDD<SomeObject> and I want to convert it to
RDD<(sequenceNumber, SomeObject)>; this sequence number can be 1 for the first
SomeObject, 2 for the second SomeObject.
Regards
Hi All, I have a JavaRDD<Object> and I want to convert it to
JavaPairRDD<Index, Object>. The index should be unique and it should maintain
the order: for the first object it should have 1 and then for the second 2, like
that.
I tried using zipWithIndex but it returns something like
JavaPairRDD<Object, Long>.
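(A minimal Java sketch of one way to get there, assuming the 1-based ordering requirement; zipWithIndex numbers from 0, so add 1 and put the index first:)

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaPairRDD<Long, Object> indexed = rdd.zipWithIndex().mapToPair(
    new PairFunction<Tuple2<Object, Long>, Long, Object>() {
      @Override
      public Tuple2<Long, Object> call(Tuple2<Object, Long> t) {
        return new Tuple2<Long, Object>(t._2() + 1, t._1()); // 1-based index first
      }
    });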
I have 3 transformations and then I am running foreach; the job is going on.
The process is at NODE_LOCAL level and no executor is doing anything; for a long
time
no task is running.
Regards,
Jeetendra
Hi All, I am running the below code. Before calling foreach I did 3
transformations using mapToPair. In my application there are 16 executors but
no executor is running anything.
rddWithscore.foreach(new
VoidFunction<Tuple2<VendorRecord, Map<Integer, Double>>>() {
@Override
public void call(Tuple2<VendorRecord,
Hi All, how can I subscribe myself to this group so that every mail sent to
this group comes to me as well?
I already sent a request to user-subscr...@spark.apache.org; still I am not
getting the mail sent to this group by other persons.
Regards
Jeetendra
I wanted to run groupBy(partition) but this is not working.
Here the first part in pairvendorData will be repeated across multiple second parts.
Both are objects; do I need to override equals and hashCode?
Is groupBy fast enough?
JavaPairRDD<VendorRecord, VendorRecord> pairvendorData
Hi All, I have an RDD<SomeObject> and I want to convert it to
RDD<(sequenceNumber, SomeObject)>; this sequence number can be 1 for the first
SomeObject, 2 for the second SomeObject.
Regards
jeet
Let's say I follow the below approach and I get an RddPair with a huge size, which
cannot fit into one machine: how do I run foreach on this RDD?
On 7 April 2015 at 04:25, Jeetendra Gangele gangele...@gmail.com wrote:
On 7 April 2015 at 04:03, Dean Wampler deanwamp...@gmail.com wrote:
On Mon
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
On Tue, Apr 7, 2015 at 3:50 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Let's say I follow the below approach and I get an RddPair with a huge size,
which cannot fit into one machine: how do I run foreach
Hi All, I am running the below code and it is running for a very long time. The
input to flatMapToPair is 50K records, and I am calling HBase 50K
times with just a range scan query, which should not take time. Can anybody guide
me on what is wrong here?
JavaPairRDD<VendorRecord, Iterable<VendorRecord>>
In this code, in foreach I am getting a task-not-serializable exception.
@SuppressWarnings("serial")
public static void matchAndMerge(JavaRDD<VendorRecord> matchRdd, final
JavaSparkContext jsc) throws IOException {
log.info("Company matcher started");
// final JavaSparkContext jsc = getSparkContext();
way to convert the thing into bytes.
On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
When I am trying to get the result from HBase and running the mapToPair
function
on the RDD, it's giving the error
java.io.NotSerializableException
(O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
On Mon, Apr 6, 2015 at 4:30 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
In this code, in foreach I am getting a task-not-serializable exception.
@SuppressWarnings("serial")
public
On 7 April 2015 at 04:03, Dean Wampler deanwamp...@gmail.com wrote:
On Mon, Apr 6, 2015 at 6:20 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Thanks a lot. That means Spark does not support nested RDDs?
If I pass the JavaSparkContext that also won't work; I mean passing the
SparkContext
[] stopRow) {
Cheers
On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
I have a 2GB HBase table where the data is stored in the form of key and
value (only one column per key) and the keys are also unique.
What I am thinking is to load the complete HBase table into an RDD and then do
and firing a
query using the native client)?
Thanks
On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
That's true, I checked the MultiRowRangeFilter and it is serving my need.
Do I need to apply the patch for this, since I am using HBase version
0.96?
Also I have checked when
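(For reference, a hedged Java sketch of wiring MultiRowRangeFilter into the scan handed to newAPIHadoopRDD; the row keys are illustrative, conf is the Hadoop Configuration from earlier in the thread, DatabaseUtils.convertScanToString is the thread's own helper, and depending on the HBase version the filter constructor may also declare IOException:)

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowRange;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;

List<RowRange> ranges = new ArrayList<RowRange>();
ranges.add(new RowRange(Bytes.toBytes("row-0010"), true, Bytes.toBytes("row-0020"), false));
ranges.add(new RowRange(Bytes.toBytes("row-0500"), true, Bytes.toBytes("row-0600"), false));

Scan scan = new Scan();
scan.setFilter(new MultiRowRangeFilter(ranges)); // several ranges, one scan, one API call
conf.set(TableInputFormat.SCAN, DatabaseUtils.convertScanToString(scan));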
Hi,
can somebody explain to me the difference between foreach and
foreachAsync over an RDD action? Which one will give the maximum
throughput?
Does foreach run in a parallel way?
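(Both run the function in parallel across partitions; the difference is on the driver side: foreach blocks the driver until all tasks finish, while foreachAsync returns a future immediately. A minimal Java sketch, with rdd an illustrative JavaRDD<Integer>; note that get() throws the usual Future checked exceptions:)

import org.apache.spark.api.java.JavaFutureAction;
import org.apache.spark.api.java.function.VoidFunction;

JavaFutureAction<Void> f = rdd.foreachAsync(new VoidFunction<Integer>() {
  @Override
  public void call(Integer x) {
    // per-element side effect; runs on executors, in parallel
  }
});
// the driver is free to do other work here
f.get(); // block only when completion is actually needed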
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
On Thu, Apr 2, 2015 at 11:33 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi All
Is there a way to make a JavaRDD<Object> from existing
(false);
scan.setCaching(10);
scan.setBatch(1000);
scan.setSmall(false);
conf.set(TableInputFormat.SCAN, DatabaseUtils.convertScanToString(scan));
return conf;
On 4 April 2015 at 20:54, Jeetendra Gangele gangele...@gmail.com wrote:
Hi All,
Can we get the results of multiple scans
from
Hi All,
Can we get the results of multiple scans
from JavaSparkContext.newAPIHadoopRDD from HBase?
This method's first parameter takes a configuration object where I have added a
filter, but how can I query multiple scans from the same table calling this API
only once?
regards
jeetendra
Hi All,
I am building a logistic regression for matching person data: let's say
two person objects are given with their attributes and we need to find the score.
That means on one side you have 10 million records and on the other side we have 1
record, and we need to tell which one matches with the highest score among 1