Phrase Search using Apache Spark in huge amount of text in files

2019-05-28 Thread Sandeep Giri
on this. This is at a very early stage, is hacky, and would probably require more testing. Regards, Sandeep Giri, www.CloudxLab.com <http://www.cloudxlab.com/>

Re: Maintaining overall cumulative data in Spark Streaming

2015-10-30 Thread Sandeep Giri
How do we reset the aggregated statistics to null? Regards, Sandeep Giri, +1 347 781 4573 (US) +91-953-899-8962 (IN) www.KnowBigData.com. <http://KnowBigData.com.> Phone: +1-253-397-1945 (Office)

Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
a StreamRDD with the aggregated count and keep doing a fullOuterJoin, but it didn't work. It seems the StreamRDD gets reset. Kindly help. Regards, Sandeep Giri

RE: Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
Yes, updateStateByKey worked, though there are some more complications. On Oct 30, 2015 8:27 AM, "skaarthik oss" <skaarthik@gmail.com> wrote: > Did you consider UpdateStateByKey operation? > *From:* Sandeep Giri [mailto:sand...@knowbigdata.com] > *S
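A minimal sketch of the kind of update function that updateStateByKey takes, written in Python as an illustration rather than as the code used in this thread. The names `update_count` and `apply_batch` are hypothetical; `apply_batch` is only a plain-Python stand-in for one micro-batch so the semantics can be seen without a cluster. In PySpark the same function would be passed as `pairs.updateStateByKey(update_count)` on a DStream of (key, value) pairs, with checkpointing enabled on the StreamingContext.

```python
from typing import Dict, List, Optional


def update_count(new_values: List[int], running: Optional[int]) -> Optional[int]:
    """Fold this batch's values for one key into the running total.

    `running` is None the first time a key is seen.
    """
    return sum(new_values) + (running or 0)


def apply_batch(state: Dict[str, int], batch: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical stand-in for one micro-batch (not a Spark API)."""
    new_state = {}
    # Keys with existing state are updated even when the batch has no new values.
    for key in set(state) | set(batch):
        updated = update_count(batch.get(key, []), state.get(key))
        if updated is not None:  # a None result drops the key from the state
            new_state[key] = updated
    return new_state


state: Dict[str, int] = {}
for batch in [{"a": [1, 1], "b": [1]}, {"a": [1]}]:
    state = apply_batch(state, batch)

print(sorted(state.items()))  # [('a', 3), ('b', 1)]
```

On the reset question raised in this thread: in PySpark, returning None from the update function removes that key's state, so one possible approach is to have the function return None when a reset condition of your choosing is met.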

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek wrote: > Hello, > Is there any way to query multiple collections from MongoDB using Spark and Java? I want to create only one Configuration object. Please help if anyone has something

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
I think it should be possible by loading the collections as RDDs and then doing a union on them. Regards, Sandeep Giri
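A rough sketch of the union step suggested above, in Python. It assumes each collection has already been loaded as an RDD; the loading itself depends on the MongoDB connector in use and is left out. `ListDataset` is a hypothetical stand-in that only mimics `union()` so the snippet runs without Spark, and `union_all` is an illustrative helper name. With real RDDs, `rdd.union(other)` concatenates without removing duplicates, and `SparkContext.union(rdds)` does the same for a list.

```python
from functools import reduce


class ListDataset:
    """Hypothetical stand-in for an RDD: it only mimics union()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # Like RDD.union, this concatenates and keeps duplicates.
        return ListDataset(self.rows + other.rows)


def union_all(datasets):
    """Combine any number of datasets that expose a union() method."""
    return reduce(lambda left, right: left.union(right), datasets)


users = ListDataset([{"_id": 1}])
orders = ListDataset([{"_id": 10}, {"_id": 11}])
combined = union_all([users, orders])

print(len(combined.rows))  # 3
```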

Re: Spark Interview Questions

2015-08-19 Thread Sandeep Giri
Thank you all. I have updated it to a slightly better version. Regards, Sandeep Giri

Re: Spark Interview Questions

2015-08-17 Thread Sandeep Giri
This statement is from Spark's website itself. Regards, Sandeep Giri

Re: Spark Interview Questions

2015-07-30 Thread Sandeep Giri
I have prepared some interview questions: http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-1 http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2 Please provide your feedback. On Wed, Jul 29, 2015, 23:43 Pedro Rodriguez ski.rodrig...@gmail.com wrote: You

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Sandeep Giri
Even for 2 lakh (200,000) records, MySQL will be better. Regards, Sandeep Giri